Industrial manufacturing · 2,400 engineers
Distributed RAG across 12 worker nodes, all in their VPC — no SaaS dependency.
A Fortune 500 manufacturer needed RAG over 4M technical documents with strict data-residency requirements. They deployed Agentmatic's distributed runtime across 12 worker nodes in their VPC.
Internal RAG over 4M technical documents · Published May 17, 2026
Outcomes
- 12-node coordinator/worker cluster serving 6,800 queries/day across business units.
- Zero data-residency incidents — no documents or queries leave their VPC.
- Median query latency 1.8 s; p95 4.2 s including retrieval + reranking + generation.
- Replaced a SaaS RAG vendor at one-fifth the cost.
The setup
A Fortune 500 manufacturer with 4 million technical documents (CAD specs, maintenance logs, compliance records) wanted internal RAG. Constraints:
- Data residency — documents and queries cannot leave their VPC. SaaS RAG vendors were ruled out.
- Concurrent users — engineers and maintenance staff across 14 sites.
- Multi-business-unit — each BU has its own corpus and access control.
- Cost ceiling — annual budget capped at $400k including infra.
What they built
A coordinator/worker cluster on their AWS GovCloud VPC:
- 3 coordinator pods behind an internal ALB.
- 12 worker nodes (m7i.4xlarge) each running
agentmatic-worker. - Qdrant cluster for the vector store (also in-VPC).
- PostgresSaver for checkpoints (RDS in-VPC).
- OpenTelemetry → Datadog for tracing (existing pipeline).
- Bedrock Claude for the LLM (in-region, in-account).
from agentmatic.cluster import ClusterConfig
from agentmatic.prebuilt import create_rag_agent
from agentmatic import Bedrock
config = ClusterConfig(
topology="coordinator-worker",
transport="grpc",
workers=[f"worker-{i}.internal:9090" for i in range(12)],
load_balancing="least-loaded",
)
rag = create_rag_agent(
llm=Bedrock("anthropic.claude-3-5-sonnet"),
vectorstore=Qdrant(url="qdrant.internal:6333"),
top_k=12,
rerank=True,
cite=True,
).with_cluster(config)
What they hit
- 6,800 queries/day across 14 sites and 6 business units.
- Median latency 1.8 s; p95 4.2 s; p99 8.1 s.
- Zero data-residency incidents (audited by their internal security team).
- Annual infra cost: $76k (vs ~$400k SaaS quote).
What they liked
- Self-hosted from day one. No vendor risk, no compliance review of a third-party processor.
- Distributed primitives in the open-source core. No paid tier for the cluster runtime.
- OTel-first. Tracing wired into their existing Datadog pipeline; no new observability stack.
- Multi-tenant. Per-BU access control implemented via a custom node, not a vendor feature.
Their take
“We’ve evaluated four other agent platforms in the last year. Agentmatic is the only one where the answer to ‘can we run this entirely in our VPC with no telemetry leaving’ was ‘yes, and here’s the OTel config.’ Everything else wanted a control-plane dependency.”
— Director of AI platform
This story is anonymized at the customer's request. Industry, scale, and workflow details are accurate; identifying details have been changed.