pregel pattern
Doc pipeline — custom Pregel graph with map fan-out
A high-throughput document processing pipeline that fans out to N parallel workers, with no LLM. Shows raw Pregel API in Rust — extract, dedup, embed, persist.
Highlights
- Custom StateGraph with a Map fan-out node that processes 100s of docs in parallel.
- No LLM — proves the Pregel runtime is useful beyond agents.
- Tokio-async, channel-driven, runs at ~5,000 docs/sec on an M3 Max.
- Demonstrates SPSC channels, conditional edges, and Pregel supersteps.
What this shows
Not every graph needs an LLM. Sometimes you just want a typed, parallel, checkpointable pipeline. This example uses the raw Pregel API to build a document-processing pipeline that fans out to N workers, dedupes by content hash, embeds, and persists — without ever calling a model.
Architecture
docs (stream)
▼
extract ──┐
├──▶ map (1 → N parallel) ──▶ join ──▶ dedup ──▶ persist
stream ──┘
Why Pregel
Pregel (the bulk-synchronous parallel model) gives you:
- Deterministic parallelism — every superstep is a barrier; the next step sees all messages from the previous one.
- Free checkpointing — every superstep boundary is a checkpoint.
- Composable channels — typed message passing between nodes, with reducers for aggregation.
If you’ve ever wanted Apache Beam’s semantics in a single Rust binary, this is that.
Key snippet
use agentmatic::prelude::*;
let graph = StateGraph::<DocState>::new()
.add_node("extract", extract_node)
.add_node("map", map_node.with_parallelism(16))
.add_node("dedup", dedup_node)
.add_node("persist", persist_node)
.add_edge("extract", "map")
.add_edge("map", "dedup")
.add_edge("dedup", "persist")
.compile()?;
let result = graph.invoke(initial_state).await?;
Benchmarks
On a single M3 Max (12 cores):
| Metric | Value |
|---|---|
| Throughput | ~5,000 docs/sec |
| Memory | ~120 MB steady-state |
| Cold start | 42 ms |
What to read next
- Architecture — how the Pregel runtime works.
- Source on GitHub