I’ll skip the “we’re 10× faster” marketing and just show the numbers, the methodology, and the cases where the gap doesn’t matter.

TL;DR

| Metric | LangGraph | Agentmatic | Ratio |
|---|---|---|---|
| Graph traversal (ops/ms) | 8.4 | 100.0 | 11.9× |
| Channel throughput (msgs/ms) | 1.3 | 100.0 | 76.9× |
| Memory (MB, 1000-node graph) | 184 | 71 | 0.39× (61% less) |
| Cold start (ms) | 410 | 90 | 0.22× (78% faster) |
| ReAct p95 (LLM mocked @ 50 ms) | 285 ms | 62 ms | 4.6× |
| Supervisor (1+4) p95 | 1.42 s | 0.18 s | 7.9× |
| RAG (3-node + vector lookup) p95 | 320 ms | 71 ms | 4.5× |
| Map fan-out (1 → 100) p95 | 8.4 s | 0.71 s | 11.8× |
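The Ratio column mixes two conventions, which is easy to misread: throughput and latency rows are speedups (bigger means Agentmatic is faster), while the memory and cold-start rows give Agentmatic’s value as a fraction of LangGraph’s. A small sketch that recomputes a few rows from the table values (the function names are ours, purely for illustration):

```python
def throughput_speedup(langgraph: float, agentmatic: float) -> float:
    """Higher-is-better metric (ops/ms, msgs/ms): Agentmatic / LangGraph."""
    return agentmatic / langgraph


def latency_speedup(langgraph: float, agentmatic: float) -> float:
    """Lower-is-better metric shown as a speedup: LangGraph / Agentmatic."""
    return langgraph / agentmatic


def fraction_of_baseline(langgraph: float, agentmatic: float) -> float:
    """Lower-is-better metric shown as a fraction: Agentmatic / LangGraph."""
    return agentmatic / langgraph


print(round(throughput_speedup(8.4, 100.0), 1))   # graph traversal -> 11.9
print(round(latency_speedup(285, 62), 1))         # ReAct p95       -> 4.6
print(round(fraction_of_baseline(184, 71), 2))    # memory          -> 0.39
print(round(fraction_of_baseline(410, 90), 2))    # cold start      -> 0.22

# "61% less" and "78% faster" are one minus the fraction.
print(round((1 - fraction_of_baseline(184, 71)) * 100))  # -> 61
print(round((1 - fraction_of_baseline(410, 90)) * 100))  # -> 78
```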

Methodology

  • Hardware: AWS c7i.4xlarge (16 vCPU, 32 GB), Ubuntu 24.04. All numbers in this post are from that machine. We also ran on an Apple M3 Max with 64 GB; the Mac numbers are within 8% of the c7i.
  • Python: 3.12.5, uvloop event loop.
  • Versions: LangGraph 0.2.40 (latest stable at time of writing), Agentmatic 0.1.0.
  • LLM: mocked at a fixed 50 ms latency. This lets us measure framework overhead, not OpenAI’s API.
  • Iterations: 1,000 runs per configuration, p50 / p95 / p99 reported. Warmup of 50 runs discarded.
  • Source: agentmatic/benchmarks/bench.py in the repo. Run python bench.py --all to reproduce.
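To make the methodology concrete, here is a rough sketch of the measurement loop described above: a mocked LLM with fixed latency, a discarded warmup, and percentiles over the remaining runs. This is not the contents of bench.py; the names mock_llm, run_once, and benchmark are hypothetical, it uses the default asyncio loop rather than uvloop, and the "agent" is a single mocked call rather than a real graph.

```python
import asyncio
import statistics
import time


async def mock_llm(latency_s: float = 0.050) -> str:
    """Stand-in for an LLM call: wait a fixed latency, return a canned reply."""
    await asyncio.sleep(latency_s)
    return "ok"


async def run_once(latency_s: float) -> float:
    """Time one run in milliseconds. A real harness would invoke a graph here."""
    start = time.perf_counter()
    await mock_llm(latency_s)
    return (time.perf_counter() - start) * 1000.0


async def benchmark(runs: int = 1000, warmup: int = 50, latency_s: float = 0.050) -> dict:
    """Discard `warmup` runs, then report p50 / p95 / p99 over `runs` samples."""
    for _ in range(warmup):
        await run_once(latency_s)
    samples = [await run_once(latency_s) for _ in range(runs)]
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[k-1] is pk
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # Small numbers so the sketch finishes quickly; the post uses 1,000 / 50 / 50 ms.
    print(asyncio.run(benchmark(runs=20, warmup=5, latency_s=0.005)))
```

Because the mocked latency is constant, anything a framework adds on top of it shows up directly in the percentiles.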

Why the framework overhead matters

In a single-LLM-call agent, the LLM accounts for >95% of wall time. Framework speed barely matters.

In a multi-agent supervisor with retry and HITL — 30 LLM calls, 100+ graph transitions, 5+ checkpoint writes — the framework accounts for 30–60% of wall time. There, framework speed dominates.

This is why the Supervisor and Map fan-out numbers are the most dramatic: those workloads stress the runtime harder than the LLM.
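The argument in this section is Amdahl’s law: speeding up the framework only helps in proportion to the share of wall time the framework occupies. A minimal sketch, where the 8× framework speedup and the three shares are illustrative inputs and not measurements from the benchmark:

```python
def end_to_end_speedup(framework_share: float, framework_speedup: float) -> float:
    """Amdahl's law: only the framework's share of wall time gets faster."""
    if not 0.0 <= framework_share <= 1.0:
        raise ValueError("framework_share must be between 0 and 1")
    if framework_speedup <= 0:
        raise ValueError("framework_speedup must be positive")
    llm_share = 1.0 - framework_share
    return 1.0 / (llm_share + framework_share / framework_speedup)


# Hypothetical: the framework itself becomes 8x faster.
for share in (0.05, 0.30, 0.60):
    overall = end_to_end_speedup(share, 8.0)
    print(f"{share:.0%} framework share -> {overall:.2f}x end to end")
# 5% framework share -> 1.05x end to end
# 30% framework share -> 1.36x end to end
# 60% framework share -> 2.11x end to end
```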

Where the speed comes from

Three sources:

1. Lock-free Rust scheduler. The graph executor is a work-stealing Tokio runtime. Node execution doesn’t queue through Python’s event loop; it queues in Rust. The Python GIL only matters at the tool call boundary.

2. Pregel-style supersteps. Messages between nodes batch at each barrier. Fewer FFI crossings; better cache locality.

3. Zero-copy state. State snapshots are Arc<Frame>. Cloning is cheap (pointer copy). Mutation is copy-on-write. LangGraph deep-copies on each step.
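For readers unfamiliar with item 2, here is a toy, pure-Python sketch of Pregel-style supersteps. It is not Agentmatic’s scheduler (which, per the list above, lives in Rust); run_supersteps and the three-node pipeline are hypothetical names chosen for illustration. The point is that each node is invoked once per superstep with its whole batch, and messages only become visible at the barrier.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A node receives every message addressed to it in the previous superstep
# and returns (target_node, message) pairs for the next one.
Node = Callable[[List[int]], List[Tuple[str, int]]]


def run_supersteps(nodes: Dict[str, Node], inboxes: Dict[str, List[int]],
                   max_steps: int = 100) -> int:
    """Run until no messages are in flight; return the number of supersteps."""
    steps = 0
    while inboxes and steps < max_steps:
        outbox: Dict[str, List[int]] = defaultdict(list)
        # Compute phase: each active node sees its whole batch in one call.
        for name, batch in inboxes.items():
            for target, message in nodes[name](batch):
                outbox[target].append(message)
        # Barrier: new messages are delivered only after every node has run.
        inboxes = dict(outbox)
        steps += 1
    return steps


collected: List[int] = []


def sink(batch: List[int]) -> List[Tuple[str, int]]:
    """Terminal node: record the batch and emit nothing."""
    collected.extend(batch)
    return []


pipeline: Dict[str, Node] = {
    "a": lambda batch: [("b", m + 1) for m in batch],
    "b": lambda batch: [("c", m * 2) for m in batch],
    "c": sink,
}

steps = run_supersteps(pipeline, {"a": [1, 2, 3]})
print(steps, collected)  # 3 [4, 6, 8]
```

Three messages flow through three nodes with three node invocations instead of nine, which is the batching effect that reduces boundary crossings.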

You can verify all three by profiling either framework with py-spy at a high sampling rate (for example, py-spy record --rate 5000). In our profiles, LangGraph spends ~40% of its time in copy.deepcopy and asyncio machinery, while Agentmatic spends ~3% in PyO3 boundary crossings.
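The zero-copy point (item 3) has a rough Python analogy: copying only the top-level mapping and sharing untouched values by reference, versus deep-copying the whole state on every step. This is an illustration of the idea only, not either framework’s code; step_deepcopy and step_shared are hypothetical names, and sharing is safe here only because the shared values are never mutated in place.

```python
import copy


def step_deepcopy(state: dict, key: str, value) -> dict:
    """Snapshot by copying everything, then update the copy."""
    new_state = copy.deepcopy(state)  # cost grows with total state size
    new_state[key] = value
    return new_state


def step_shared(state: dict, key: str, value) -> dict:
    """Snapshot by copying only the top-level dict; other values are shared."""
    new_state = dict(state)  # cost grows with the number of keys only
    new_state[key] = value
    return new_state


state = {"messages": list(range(10_000)), "step": 0}

deep = step_deepcopy(state, "step", 1)
shared = step_shared(state, "step", 1)

print(deep["messages"] is state["messages"])    # False: the list was duplicated
print(shared["messages"] is state["messages"])  # True: the list is shared
print(state["step"], deep["step"], shared["step"])  # 0 1 1 (old snapshot intact)
```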

Where the gap is small

Single-LLM-call ReAct. One LLM call, no retry, ~3 graph nodes. We measured 1.18× — barely worth mentioning. Use the framework you prefer.

Long-context RAG. When 95% of latency is the LLM processing 100k tokens, framework overhead is irrelevant. ~1.05× win. Pick by feature set.

Tool-bound agents with slow tools. If your dominant tool is a database query that takes 800 ms, the difference between 50 ms and 250 ms of framework overhead barely matters.

Where the gap is large

Multi-agent supervisors. 30+ graph transitions per turn. The c7i numbers show 7.9× on p95.

Map fan-out. Parallel evaluation of N branches. The Rust scheduler is dramatically better at this — 11.8× on p95 for 1 → 100 fan-out.

Retry loops. Each retry is a graph re-entry. High graph density = high framework cost.

Streaming agents. The astream event loop in LangGraph is event-driven Python; ours is a Tokio channel. ~6× p95 in our streaming benchmark.
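To see why fan-out is such a sensitive test of the runtime, consider what the ideal looks like: if all N branches run in parallel and each waits on a fixed-latency call, wall time should be about one branch latency, and everything above that is scheduling overhead. A minimal sketch using plain asyncio (neither framework is involved; branch and fan_out are hypothetical names, and the fixed sleep stands in for a mocked LLM call):

```python
import asyncio
import time
from typing import List, Tuple


async def branch(i: int, latency_s: float) -> int:
    """One fan-out branch: wait on a mocked call, then return a result."""
    await asyncio.sleep(latency_s)
    return i * i


async def fan_out(n: int, latency_s: float = 0.050) -> Tuple[List[int], float, float]:
    """Run n branches concurrently; return (results, wall_ms, overhead_ms)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(branch(i, latency_s) for i in range(n)))
    wall_ms = (time.perf_counter() - start) * 1000.0
    # Ideal wall time is a single branch latency; the rest is runtime overhead.
    overhead_ms = wall_ms - latency_s * 1000.0
    return list(results), wall_ms, overhead_ms


if __name__ == "__main__":
    results, wall_ms, overhead_ms = asyncio.run(fan_out(100))
    print(f"{len(results)} branches, wall {wall_ms:.1f} ms, overhead {overhead_ms:.1f} ms")
```

Running the same shape of workload through a framework and comparing its wall time against the single-branch latency gives a direct read on how much the scheduler costs.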

What about memory

| Configuration | LangGraph | Agentmatic |
|---|---|---|
| 100-node graph, idle | 64 MB | 28 MB |
| 1,000-node graph, idle | 184 MB | 71 MB |
| 1,000 concurrent sessions | 4.2 GB | 1.1 GB |

For high-concurrency workloads (a single process serving many simultaneous agent runs), the memory savings translate to fewer pods and lower infra cost.
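A back-of-the-envelope version of the pod math, using the 1,000-session row above. The assumptions are ours and deliberately crude: memory scales linearly with session count, pods have 8 GB of memory, 80% of that is usable for sessions, and the target is 10,000 concurrent sessions. None of these figures come from the benchmark.

```python
import math


def per_session_mb(total_gb: float, sessions: int) -> float:
    """Average memory per session, assuming linear scaling (an assumption)."""
    return total_gb * 1024.0 / sessions


def pods_needed(sessions: int, session_mb: float, pod_memory_gb: float,
                headroom: float = 0.8) -> int:
    """Pods required if each pod may fill `headroom` of its memory with sessions."""
    usable_mb = pod_memory_gb * 1024.0 * headroom
    sessions_per_pod = int(usable_mb // session_mb)
    if sessions_per_pod < 1:
        raise ValueError("a single session does not fit in one pod")
    return math.ceil(sessions / sessions_per_pod)


langgraph_mb = per_session_mb(4.2, 1000)   # about 4.3 MB per session
agentmatic_mb = per_session_mb(1.1, 1000)  # about 1.1 MB per session

# Hypothetical target: 10,000 concurrent sessions on 8 GB pods.
print(pods_needed(10_000, langgraph_mb, pod_memory_gb=8))   # 7
print(pods_needed(10_000, agentmatic_mb, pod_memory_gb=8))  # 2
```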

Cold start

| Stage | LangGraph | Agentmatic |
|---|---|---|
| Import time | 340 ms | 65 ms |
| First-graph compile | 70 ms | 25 ms |
| Total time-to-first-token | 410 ms | 90 ms |

This matters for serverless deploys (Lambda, Cloud Run). Saving 320 ms on every cold start adds up at scale.
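If you want to check the import-time row on your own machine, one simple approach is to time the import in a fresh interpreter so nothing is already cached in sys.modules. A minimal sketch: cold_import_ms is a hypothetical helper, the example uses the standard-library json module so it runs anywhere, and we are assuming the importable module names match the pip package names if you swap in the frameworks.

```python
import subprocess
import sys


def cold_import_ms(module: str) -> float:
    """Import `module` in a fresh interpreter and return elapsed milliseconds."""
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"import {module}\n"
        "print((time.perf_counter() - t0) * 1000.0)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    # Use the last line in case the module prints something at import time.
    return float(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Replace "json" with the framework module you want to measure.
    print(f"json: {cold_import_ms('json'):.1f} ms")
```

The first run after a reboot also pays for disk reads, so repeat the measurement a few times. For a per-module breakdown, CPython’s built-in python -X importtime prints the cumulative import cost of every module loaded.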

Reproducibility

git clone https://github.com/neul-labs/agentmatic
cd agentmatic/benchmarks

# Both frameworks installed in the same venv
pip install agentmatic langgraph

python bench.py --all --output bench.json
python bench.py --report bench.json

The harness prints a table like the TL;DR above for whatever hardware you’re on.

Caveat

These numbers measure framework overhead. They don’t measure prompt quality, tool design, or product fit. Switching frameworks won’t save a bad agent. But if your agent is good and your bottleneck is the runtime, this is the cleanest single-step win available.