Performance

Numbers, not claims.

We ran Agentmatic against LangGraph on the same graphs, same LLM mocks, same hardware. Here's what came out.

Summary. Across ReAct, Supervisor, RAG, and Map fan-out workloads at 100–10,000 nodes:

10–15× faster graph traversal
70–80× faster channel message throughput
50–80% lower memory footprint
78% shorter cold start (the Python SDK includes the prebuilt PyO3 wheel)

Graph traversal (ops/ms): agentmatic 100.0, LangGraph 8.0
Channel throughput (msgs/ms): agentmatic 100.0, LangGraph 1.3
Memory footprint (% of LangGraph): agentmatic 38.0, LangGraph 100.0
Cold start (% of LangGraph): agentmatic 22.0, LangGraph 100.0

Numbers are from the internal benchmark suite (graphs from 100 to 10,000 nodes, mixed ReAct and Supervisor workloads). Higher is better for the first two metrics; lower is better for the last two.
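The summary multipliers follow from the charted values by simple ratios: agentmatic divided by LangGraph for the higher-is-better metrics, and the percentage reduction for the lower-is-better ones. A small sketch of that arithmetic, using only the four chart values above; `CHARTS`, `speedup` and `reduction_pct` are illustrative names, not part of agentmatic.

```python
# Values read off the charts (agentmatic, LangGraph).
CHARTS = {
    "graph_traversal_ops_per_ms": (100.0, 8.0),      # higher is better
    "channel_throughput_msgs_per_ms": (100.0, 1.3),  # higher is better
    "memory_pct_of_langgraph": (38.0, 100.0),        # lower is better
    "cold_start_pct_of_langgraph": (22.0, 100.0),    # lower is better
}


def speedup(agentmatic: float, langgraph: float) -> float:
    """Ratio for higher-is-better metrics (throughput)."""
    return agentmatic / langgraph


def reduction_pct(agentmatic: float, langgraph: float) -> float:
    """Percentage reduction for lower-is-better metrics (memory, cold start)."""
    return (langgraph - agentmatic) / langgraph * 100.0


if __name__ == "__main__":
    for name in ("graph_traversal_ops_per_ms", "channel_throughput_msgs_per_ms"):
        print(f"{name}: {speedup(*CHARTS[name]):.1f}x")
    for name in ("memory_pct_of_langgraph", "cold_start_pct_of_langgraph"):
        print(f"{name}: {reduction_pct(*CHARTS[name]):.0f}% lower")
```

With these inputs the ratios come out to 12.5× for traversal, about 76.9× for channel throughput, and 62% and 78% reductions for memory and cold start, each inside the corresponding range in the summary.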

Methodology

Hardware: Apple M3 Max with 64 GB RAM and AWS c7i.4xlarge (Linux x86_64); reported numbers are from the AWS instance.
LLM: mocked at a fixed 50 ms latency, so we measure framework overhead rather than the model.
Graphs: synthetic ReAct (4 nodes), Supervisor (1 + 4 specialists), RAG (3 nodes + vector lookup), and Map fan-out (1 → 100).
Iterations: 1,000 runs per configuration; p50 / p95 / p99 are reported under /benchmarks-data in the agentmatic GitHub repository.
LangGraph version: 0.2.40 (latest stable at the time of writing).
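To make the methodology concrete, here is a minimal timing harness in the same spirit: a mocked LLM with fixed latency, repeated runs, and p50 / p95 / p99 from the samples. It is a sketch under stated assumptions, not the repo's bench.py; `mock_llm`, `run_graph` and `benchmark` are hypothetical names, and the toy graph stands in for real framework work.

```python
import statistics
import time
from typing import Callable, Dict


def mock_llm(latency_s: float = 0.050) -> str:
    """Hypothetical stand-in for a model call: fixed latency, constant reply."""
    time.sleep(latency_s)
    return "ok"


def run_graph(steps: int, llm: Callable[[], str]) -> list:
    """Hypothetical toy graph: `steps` trivial nodes, then one LLM call."""
    messages = []
    for i in range(steps):
        messages.append(f"node-{i}")  # placeholder for per-node framework work
    messages.append(llm())
    return messages


def benchmark(fn: Callable[[], object], iterations: int = 1000) -> Dict[str, float]:
    """Time `fn` repeatedly and report p50/p95/p99 wall-clock latency in ms."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    latency_s = 0.050
    # The methodology uses 1,000 iterations; 20 keeps this demo quick.
    result = benchmark(lambda: run_graph(4, lambda: mock_llm(latency_s)), iterations=20)
    # Subtracting the fixed mock latency leaves an estimate of framework overhead.
    overhead_ms = {k: v - latency_s * 1000.0 for k, v in result.items()}
    print(result, overhead_ms)
```

Because the mock latency is constant, whatever remains after subtracting it is attributable to the graph runtime, which is the quantity the benchmarks above compare.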

Why the gap

Three sources, in order of contribution:

Rust scheduler. Lock-free SPSC channels, a work-stealing scheduler, and zero-copy state diffs. The runtime runs outside the Python GIL; only the actual tool call enters Python.
Pregel-style supersteps. Channel messages are batched up to each superstep barrier, so fewer calls cross the FFI boundary.
Memory layout. State snapshots are Arc<Frame> with copy-on-write semantics; LangGraph deep-copies state on each step.
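The memory-layout point is easiest to see side by side. The following is a rough Python analogue of snapshots with structural sharing versus a per-step deep copy; `Frame` and `deep_copy_step` are hypothetical names, and this is neither agentmatic's Rust `Arc<Frame>` nor LangGraph's code. It assumes stored values are treated as immutable, which Python does not enforce.

```python
import copy
from types import MappingProxyType
from typing import Any, Dict, Mapping


class Frame:
    """Hypothetical immutable state snapshot with structural sharing."""

    __slots__ = ("_data",)

    def __init__(self, data: Mapping[str, Any]) -> None:
        # Read-only view over a private dict, so the snapshot cannot be mutated in place.
        self._data = MappingProxyType(dict(data))

    def get(self, key: str) -> Any:
        return self._data[key]

    def update(self, **changes: Any) -> "Frame":
        # Copy only the top-level key table; unchanged values are shared by reference.
        return Frame({**self._data, **changes})


def deep_copy_step(state: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Baseline: deep-copy the whole state tree before applying a step's writes."""
    new_state = copy.deepcopy(state)
    new_state.update(changes)
    return new_state


if __name__ == "__main__":
    docs = list(range(100_000))  # stand-in for a large field such as retrieved documents

    f0 = Frame({"docs": docs, "step": 0})
    f1 = f0.update(step=1)
    print(f1.get("docs") is f0.get("docs"))  # True: shared, not copied

    d0 = {"docs": docs, "step": 0}
    d1 = deep_copy_step(d0, step=1)
    print(d1["docs"] is d0["docs"])  # False: the list was copied
```

The sharing version pays for the number of top-level keys per step, while the deep copy pays for the size of everything in the state. In Rust, `Arc::make_mut` goes further and clones a frame only when it is actually shared.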

What it doesn't measure

These benchmarks measure framework overhead. The dominant cost in any real agent is still the LLM call. Your end-to-end latency improvement depends on how many graph steps you take per LLM call; high-graph-density workloads (Supervisor patterns, Map fan-out, retry loops) see the biggest wins — typically 8–12× in production.
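Why steps per LLM call matter is essentially Amdahl's law: only the framework-overhead share of each request gets faster. With LLM time L, s graph steps per LLM call, per-step overhead o and an overhead speedup k, the end-to-end speedup is (L + s·o) / (L + s·o/k). A back-of-the-envelope sketch; every input in the demo is a hypothetical placeholder, not a measured value.

```python
def end_to_end_speedup(
    llm_ms: float,
    steps_per_llm_call: int,
    overhead_ms_per_step: float,
    framework_speedup: float,
) -> float:
    """Amdahl-style estimate: only the graph-overhead share gets faster."""
    baseline_ms = llm_ms + steps_per_llm_call * overhead_ms_per_step
    improved_ms = llm_ms + steps_per_llm_call * overhead_ms_per_step / framework_speedup
    return baseline_ms / improved_ms


if __name__ == "__main__":
    # Illustrative inputs only: a 50 ms LLM call (as in the mocked benchmark),
    # a hypothetical 1 ms of framework overhead per step, and a 12x overhead speedup.
    for steps in (1, 10, 100, 1000):
        print(steps, round(end_to_end_speedup(50.0, steps, 1.0, 12.0), 2))
```

The estimate approaches the framework speedup only as graph overhead comes to dominate the LLM time, which is why step-dense patterns benefit most.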

Reproducing the numbers

git clone https://github.com/neul-labs/agentmatic
cd agentmatic/benchmarks
python bench.py --all --output bench.json
python bench.py --report bench.json

Ship your next agent in minutes, not weeks.

MIT licensed. Drop-in for LangGraph. Native SDKs in 5 languages. Battle-tested resilience primitives in the box.
