I’ll skip the “we’re 10× faster” marketing and just show the numbers, the methodology, and the cases where the gap doesn’t matter.
TL;DR
| Metric | LangGraph | Agentmatic | Ratio |
|---|---|---|---|
| Graph traversal (ops/ms) | 8.4 | 100.0 | 11.9× |
| Channel throughput (msgs/ms) | 1.3 | 100.0 | 76.9× |
| Memory (MB, 1000-node graph) | 184 | 71 | 0.39× (61% less) |
| Cold start (ms) | 410 | 90 | 0.22× (78% less) |
| ReAct p95 (LLM mocked @ 50 ms) | 285 ms | 62 ms | 4.6× |
| Supervisor (1+4) p95 | 1.42 s | 0.18 s | 7.9× |
| RAG (3-node + vector lookup) p95 | 320 ms | 71 ms | 4.5× |
| Map fan-out (1 → 100) p95 | 8.4 s | 0.71 s | 11.8× |
Methodology
- Hardware: AWS c7i.4xlarge (16 vCPU, 32 GB), Ubuntu 24.04. All numbers below are from this machine. We also ran on an Apple M3 Max with 64 GB; the Mac numbers are within 8% of the c7i.
- Python: 3.12.5, uvloop event loop.
- Versions: LangGraph 0.2.40 (latest stable at time of writing), Agentmatic 0.1.0.
- LLM: mocked at a fixed 50 ms latency. This lets us measure framework overhead, not OpenAI’s API.
- Iterations: 1,000 runs per configuration, p50 / p95 / p99 reported. Warmup of 50 runs discarded.
- Source: `agentmatic/benchmarks/bench.py` in the repo. Run `python bench.py --all` to reproduce.
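To make the measurement loop concrete, here is a minimal sketch of the method described above: a mocked LLM with fixed latency, discarded warmup runs, then percentiles over the timed runs. This is not the code in `bench.py`; the names `mock_llm` and `measure` are hypothetical, and it uses the default asyncio loop rather than uvloop.

```python
import asyncio
import statistics
import time


async def mock_llm(latency_s: float = 0.050) -> str:
    """Stand-in for an LLM call: sleeps for a fixed latency, returns a canned reply."""
    await asyncio.sleep(latency_s)
    return "ok"


async def measure(workload, runs: int = 1000, warmup: int = 50) -> dict:
    """Time `workload` (an async callable) and report p50/p95/p99 in milliseconds."""
    for _ in range(warmup):  # warmup runs are executed but not recorded
        await workload()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        await workload()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # Small run counts so the demo finishes quickly; the post uses 1,000 / 50.
    print(asyncio.run(measure(mock_llm, runs=20, warmup=2)))
```

Timing a bare `mock_llm` gives you the floor for your machine. Whatever a framework adds on top of that floor for the same workload is the overhead these benchmarks are about.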
Why the framework overhead matters
In a single-LLM-call agent, the LLM accounts for >95% of wall time. Framework speed barely matters.
In a multi-agent supervisor with retry and HITL — 30 LLM calls, 100+ graph transitions, 5+ checkpoint writes — the framework accounts for 30–60% of wall time. There, framework speed dominates.
This is why the Supervisor and Map fan-out numbers are the most dramatic: those workloads stress the runtime harder than the LLM.
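The argument above is Amdahl's law applied to agents: a faster framework only helps in proportion to the share of wall time the framework occupies. A rough sketch of that arithmetic follows. The function name and the 10× framework-only speedup are illustrative assumptions, not measured values.

```python
def end_to_end_speedup(framework_share: float, framework_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the framework portion gets faster.

    framework_share   -- fraction of wall time spent in the framework (0..1)
    framework_speedup -- how many times faster the framework portion becomes
    """
    if not 0.0 <= framework_share <= 1.0:
        raise ValueError("framework_share must be between 0 and 1")
    if framework_speedup <= 0:
        raise ValueError("framework_speedup must be positive")
    llm_share = 1.0 - framework_share  # this part is unchanged by the switch
    return 1.0 / (llm_share + framework_share / framework_speedup)


if __name__ == "__main__":
    # Illustrative only: assume the framework portion alone gets 10x faster.
    for share in (0.05, 0.30, 0.60):
        overall = end_to_end_speedup(share, 10.0)
        print(f"framework share {share:.0%}: {overall:.2f}x end to end")
```

Under that assumption, a 5% framework share yields roughly 1.05× end to end, while a 60% share yields a bit over 2×. The framework change is identical in both cases; only the workload differs.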
Where the speed comes from
Three sources:
1. Lock-free Rust scheduler. The graph executor is a work-stealing Tokio runtime. Node execution doesn’t queue through Python’s event loop; it queues in Rust. The Python GIL only matters at the tool call boundary.
2. Pregel-style supersteps. Messages between nodes batch at each barrier. Fewer FFI crossings; better cache locality.
3. Zero-copy state. State snapshots are Arc<Frame>. Cloning is cheap (pointer copy). Mutation is copy-on-write. LangGraph deep-copies on each step.
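If Pregel-style supersteps are unfamiliar, the idea in item 2 fits in a few lines of plain Python. This is a toy model, not Agentmatic's Rust executor: `run_supersteps` and the node names are hypothetical, and real runtimes run the nodes within a step in parallel.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A node receives its batch of messages and returns (target, payload) pairs.
Node = Callable[[List[int]], List[Tuple[str, int]]]


def run_supersteps(nodes: Dict[str, Node], initial: Dict[str, List[int]], max_steps: int = 10):
    """Run nodes in supersteps: everything sent in step k is delivered in step k+1."""
    inbox = {name: list(msgs) for name, msgs in initial.items()}
    steps = 0
    while inbox and steps < max_steps:
        outbox = defaultdict(list)
        for name, messages in inbox.items():  # each active node sees its whole batch
            for target, payload in nodes[name](messages):
                outbox[target].append(payload)
        inbox = dict(outbox)  # barrier: buffers are swapped once per superstep
        steps += 1
    return steps, inbox


results: List[int] = []
nodes: Dict[str, Node] = {
    "source": lambda msgs: [("double", m) for m in msgs],
    "double": lambda msgs: [("collect", 2 * m) for m in msgs],
    # append() returns None, so `or []` makes this node emit no messages
    "collect": lambda msgs: results.append(sum(msgs)) or [],
}
steps, leftover = run_supersteps(nodes, {"source": [1, 2, 3]})
print(steps, results)
```

The point of the barrier is that each node is invoked once per step with a batch, rather than once per message. In a runtime that crosses a language boundary, that is what reduces the number of crossings.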
You can verify all three by profiling either framework with py-spy (for example, py-spy record --rate 5000). LangGraph spends ~40% in copy.deepcopy and asyncio machinery. Agentmatic spends ~3% in PyO3 boundary crossings.
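To get a feel for why per-step deep copies show up in a profile, here is a Python analogy comparing a deep copy with structural sharing. It is only an analogy: the shared version mimics the pointer-copy idea, not the actual `Arc`-based implementation, and the state shape (100 channels of 1,000 integers) is a made-up example.

```python
import copy
import time


def step_deepcopy(state: dict, key: str, value) -> dict:
    """Produce the next state by copying every nested object, then updating one key."""
    new_state = copy.deepcopy(state)
    new_state[key] = value
    return new_state


def step_shared(state: dict, key: str, value) -> dict:
    """Produce the next state by copying only the top-level mapping.

    Untouched values are shared by reference, which is safe only if nodes
    treat existing values as immutable and replace rather than mutate them.
    """
    new_state = dict(state)
    new_state[key] = value
    return new_state


def time_ms(step, state: dict, repeats: int = 20) -> float:
    """Average wall time in milliseconds for one call of `step`."""
    start = time.perf_counter()
    for i in range(repeats):
        step(state, "counter", i)
    return (time.perf_counter() - start) * 1000.0 / repeats


if __name__ == "__main__":
    # Hypothetical state: 100 channels, each holding 1,000 integers.
    state = {f"channel_{i}": list(range(1_000)) for i in range(100)}
    state["counter"] = 0
    print(f"deepcopy step: {time_ms(step_deepcopy, state):.3f} ms")
    print(f"shared step:   {time_ms(step_shared, state):.3f} ms")
```

The deep-copy cost grows with the total size of the state, while the shared version grows only with the number of top-level keys. That difference compounds when a graph takes 100+ transitions per turn.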
Where the gap is small
Single-LLM-call ReAct. One LLM call, no retry, ~3 graph nodes. We measured 1.18× — barely worth mentioning. Use the framework you prefer.
Long-context RAG. When 95% of latency is the LLM processing 100k tokens, framework overhead is irrelevant. ~1.05× win. Pick by feature set.
Tool-bound agents with slow tools. If your dominant tool is a database query that takes 800 ms, the framework’s 50 ms vs 250 ms barely matters.
Where the gap is large
Multi-agent supervisors. 30+ graph transitions per turn. The c7i numbers show 7.9× on p95.
Map fan-out. Parallel evaluation of N branches. The Rust scheduler is dramatically better at this — 11.8× on p95 for 1 → 100 fan-out.
Retry loops. Each retry is a graph re-entry. High graph density = high framework cost.
Streaming agents. The astream event loop in LangGraph is event-driven Python; ours is a Tokio channel. ~6× p95 in our streaming benchmark.
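For the map fan-out case, it helps to know the floor you are comparing against. The sketch below runs N mocked branches concurrently with plain `asyncio.gather` and no framework at all; `branch` and `fan_out` are hypothetical names, and the 50 ms latency mirrors the mocked LLM from the methodology.

```python
import asyncio
import time


async def branch(i: int, latency_s: float) -> int:
    """One fan-out branch: a mocked LLM call followed by a trivial result."""
    await asyncio.sleep(latency_s)
    return i * i


async def fan_out(n: int, latency_s: float = 0.050):
    """Run n branches concurrently and return (results, elapsed milliseconds)."""
    start = time.perf_counter()
    # gather() returns results in the order the awaitables were passed in
    results = await asyncio.gather(*(branch(i, latency_s) for i in range(n)))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return results, elapsed_ms


if __name__ == "__main__":
    results, elapsed_ms = asyncio.run(fan_out(100))
    print(f"100 branches finished in {elapsed_ms:.1f} ms")
```

With 100 branches at 50 ms each, a serial loop would take about 5 s, while truly concurrent branches should finish in a little over 50 ms. Whatever a framework reports above that floor for a 1 → 100 fan-out is scheduling, state handling, and merge cost.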
What about memory
| Configuration | LangGraph | Agentmatic |
|---|---|---|
| 100-node graph, idle | 64 MB | 28 MB |
| 1,000-node graph, idle | 184 MB | 71 MB |
| 1,000 concurrent sessions | 4.2 GB | 1.1 GB |
For high-concurrency workloads (a single process serving many simultaneous agent runs), the memory savings translate to fewer pods and lower infra cost.
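A back-of-the-envelope version of the pod math, using the 1,000-concurrent-session row from the table. The function name, the 8 GB pod size, and the 10,000-session target are assumptions for illustration. The sketch also assumes memory scales linearly with sessions and ignores headroom, neither of which we measured.

```python
import math


def pods_needed(sessions: int, gb_per_1000_sessions: float, pod_memory_gb: float) -> int:
    """Pods required if memory scales linearly with concurrent sessions."""
    required_gb = sessions / 1000 * gb_per_1000_sessions
    return math.ceil(required_gb / pod_memory_gb)


if __name__ == "__main__":
    # Hypothetical deployment: 10,000 concurrent sessions on 8 GB pods.
    for name, gb in (("LangGraph", 4.2), ("Agentmatic", 1.1)):
        print(f"{name}: {pods_needed(10_000, gb, 8.0)} pods")
```

Under those assumptions the difference is 6 pods versus 2. Plug in your own session counts and pod sizes before drawing conclusions.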
Cold start
| | LangGraph | Agentmatic |
|---|---|---|
| Import time | 340 ms | 65 ms |
| First-graph compile | 70 ms | 25 ms |
| Total time-to-first-token | 410 ms | 90 ms |
This matters for serverless deploys (Lambda, Cloud Run). 320 ms saved per cold-start at scale adds up.
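If you want to check import cost on your own machine without the harness, a rough approach is to time a fresh interpreter per import. This is a sketch, not how `bench.py` measures it: `cold_import_ms` is a hypothetical helper, and each timing includes interpreter startup, so it subtracts a baseline import of `sys`.

```python
import subprocess
import sys
import time


def cold_import_ms(module: str, repeats: int = 5) -> float:
    """Median wall time (ms) for a fresh interpreter to start and import `module`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process per run so nothing is cached in sys.modules
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]


if __name__ == "__main__":
    # `sys` is already loaded at startup, so this approximates bare interpreter cost.
    baseline = cold_import_ms("sys")
    # Swap "json" for the framework you want to measure, if it is installed.
    total = cold_import_ms("json")
    print(f"import overhead: {total - baseline:.1f} ms (baseline {baseline:.1f} ms)")
```

For a per-module breakdown instead of a single number, CPython's `-X importtime` flag prints the cumulative import time of every module it loads.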
Reproducibility
git clone https://github.com/neul-labs/agentmatic
cd agentmatic/benchmarks
# Both frameworks installed in the same venv
pip install agentmatic langgraph
python bench.py --all --output bench.json
python bench.py --report bench.json
The harness prints a table like the TL;DR above for whatever hardware you’re on.
Caveat
These numbers measure framework overhead. They don’t measure prompt quality, tool design, or product fit. Switching frameworks won’t save a bad agent. But if your agent is good and your bottleneck is the runtime, this is the cleanest single-step win available.