Almost every production AI agent is some combination of three patterns: ReAct (think-then-act), Supervisor (route-to-specialist), and RAG (retrieve-then-answer). This guide shows how to choose among them.
Quick guide: One question + a few tools → ReAct. Multiple distinct domains → Supervisor over specialists. Q&A over your private docs → RAG. The patterns compose: a Supervisor often routes to ReAct or RAG specialists.
ReAct: reason + act
The simplest useful agent loop: the LLM thinks, calls a tool, observes the result, and then decides whether to keep going or stop.
from agentmatic.prebuilt import create_react_agent
from agentmatic import OpenAI, tool

@tool
def search(query: str) -> str: ...

@tool
def calculator(expression: str) -> str: ...

agent = create_react_agent(
    llm=OpenAI("gpt-4o"),
    tools=[search, calculator],
    max_iterations=12,  # hard cap on think/act cycles
)
Use when. Single-purpose task. Few tools (1–6). Linear flow (“look this up, then compute on it, then format”).
Skip when. Multiple distinct domains (use Supervisor). Pure retrieval (use RAG). Parallel fan-out (use Map, a custom StateGraph).
Tuning. max_iterations prevents runaway loops. System prompt should describe when to stop. Tools should have clear, distinct names.
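To make the loop and the role of max_iterations concrete, here is a minimal sketch in plain Python. It does not use agentmatic: `run_react`, `scripted_policy`, and the `"finish"` convention are hypothetical names invented for illustration, and the scripted policy stands in for the LLM's "think" step.

```python
from typing import Callable, Dict, List, Tuple

# (tool_name, tool_input); the hypothetical name "finish" means "stop and answer".
Action = Tuple[str, str]


def calculator(expression: str) -> str:
    """Toy tool: add integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))


def scripted_policy(question: str, observations: List[str]) -> Action:
    """Stand-in for the LLM: call the calculator once, then finish."""
    if not observations:
        return ("calculator", question)
    return ("finish", observations[-1])


def run_react(
    question: str,
    policy: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 12,
) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        tool_name, tool_input = policy(question, observations)  # think
        if tool_name == "finish":
            return tool_input  # the policy decided to stop
        if tool_name not in tools:
            # Feed the mistake back as an observation instead of crashing.
            observations.append(f"error: unknown tool {tool_name!r}")
            continue
        observations.append(tools[tool_name](tool_input))  # act, then observe
    return "stopped: iteration budget exhausted"


answer = run_react("2+3+4", scripted_policy, {"calculator": calculator})
print(answer)  # 9
```

A policy that never returns `"finish"` exits through the last line after `max_iterations` cycles, which is the runaway-loop case the cap exists for.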
Supervisor: route to specialists
A supervisor reads the user message, picks a specialist sub-agent, and forwards the call. The supervisor doesn’t do the work; it only routes.
from agentmatic import OpenAI
from agentmatic.prebuilt import create_supervisor, create_react_agent, create_rag_agent
from agentmatic.vectorstore import Qdrant

# get_invoice, refund, and send_to_human are assumed to be @tool functions defined elsewhere.
docs = create_rag_agent(llm=OpenAI(), vectorstore=Qdrant.from_env())
billing = create_react_agent(llm=OpenAI(), tools=[get_invoice, refund])
escalation = create_react_agent(llm=OpenAI(), tools=[send_to_human])

supervisor = create_supervisor(
    llm=OpenAI("gpt-4o"),
    agents={"docs": docs, "billing": billing, "escalation": escalation},
    system_prompt="""
Route to docs for product questions.
Route to billing for invoice/refund questions.
Route to escalation for anything off-policy.
""",
)
Use when. Multiple distinct domains. Each domain has its own tools, prompt, persona. Customer-facing.
Skip when. Single domain (you’re over-engineering). All decisions depend on shared state (sub-agents can’t share state cleanly — that’s a graph problem).
Tuning. Each specialist should be testable in isolation. Routing prompts should be specific (“route to billing for invoice questions” not “route appropriately”). Bound recursion depth so that handoffs between agents cannot cycle indefinitely.
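The routing step can be sketched without any framework. In the sketch below, keyword matching stands in for the supervisor LLM, and every name (`classify`, `supervise`, `ROUTING_KEYWORDS`, the three specialist functions) is hypothetical. It illustrates two of the points above: each specialist is an ordinary callable that can be tested alone, and an unknown routing decision falls back to escalation.

```python
from typing import Callable, Dict

Specialist = Callable[[str], str]


# Hypothetical specialists; in practice each would be a full agent.
def docs_agent(message: str) -> str:
    return f"[docs] answering: {message}"


def billing_agent(message: str) -> str:
    return f"[billing] handling: {message}"


def escalation_agent(message: str) -> str:
    return f"[escalation] handing off: {message}"


SPECIALISTS: Dict[str, Specialist] = {
    "docs": docs_agent,
    "billing": billing_agent,
    "escalation": escalation_agent,
}

# Specific routing rules, mirroring a specific routing prompt.
ROUTING_KEYWORDS = {
    "billing": ("invoice", "refund", "charge"),
    "docs": ("how do i", "configure", "install"),
}


def classify(message: str) -> str:
    """Stand-in for the supervisor LLM: return the name of a specialist."""
    text = message.lower()
    for name, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return name
    return "escalation"


def supervise(message: str, classify_fn: Callable[[str], str] = classify) -> str:
    choice = classify_fn(message)
    # If the router names a specialist that does not exist, escalate.
    specialist = SPECIALISTS.get(choice, SPECIALISTS["escalation"])
    return specialist(message)


print(supervise("I need a refund for invoice 42"))
print(supervise("How do I install the CLI?"))
print(supervise("Tell me a joke"))
```

Because `classify_fn` is a parameter, the router can be swapped for a stub in tests, which keeps routing errors separate from specialist errors.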
RAG: retrieve then answer
Embed the user question, look up relevant chunks in a vector store, stuff them into the prompt, generate a grounded answer.
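from agentmatic import OpenAI
from agentmatic.prebuilt import create_rag_agent
from agentmatic.vectorstore import Qdrant

rag = create_rag_agent(
    llm=OpenAI("gpt-4o"),
    vectorstore=Qdrant.from_env(),
    top_k=8,      # number of chunks retrieved per question
    rerank=True,  # re-score retrieved chunks before prompting
    cite=True,    # show sources alongside the answer
)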
Use when. Q&A over a corpus you own. Documentation chatbots. Internal knowledge search. Compliance Q&A.
Skip when. Multi-hop reasoning across documents (vanilla RAG can’t do it). Synthesis tasks (the model can’t reliably synthesize across more than ~10 chunks). Real-time data (the vector store lags behind the source).
Tuning. Top-K (default 8). Rerank — enable it (20–40% precision win). Citations — enable them so the user sees the source. Chunk size and overlap matter; tune to your corpus.
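The knobs above (chunk size, overlap, Top-K, citations) are easier to reason about with a toy pipeline in front of you. The sketch below is not agentmatic's implementation: it uses word counts as a stand-in for a learned embedding, skips reranking entirely, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 8) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    query = embed(question)
    scored = [(i, cosine(query, embed(chunk))) for i, chunk in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, chunks: List[str], hits: List[Tuple[int, float]]) -> str:
    """Stuff the retrieved chunks into the prompt, numbered so they can be cited."""
    context = "\n".join(f"[{i}] {chunks[i]}" for i, _ in hits)
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\nQuestion: {question}"
    )


corpus = [
    "Refunds are issued within five business days.",
    "The API rate limit is 100 requests per minute.",
    "Invoices are emailed on the first of each month.",
]
hits = retrieve("What is the API rate limit?", corpus, top_k=2)
prompt = build_prompt("What is the API rate limit?", corpus, hits)
print(prompt)
```

Here the rate-limit sentence ranks first because it shares the most tokens with the question. With a real embedding model the ranking depends on semantic similarity instead, which is where chunk size and overlap start to matter.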
How they fail
| Pattern | Common failure |
|---|---|
| ReAct | Stuck in a loop calling the same tool with slight variations |
| ReAct | Picks the wrong tool when given 15+ choices |
| Supervisor | Routes to the wrong specialist on ambiguous queries |
| Supervisor | Specialists don’t share context (user mentioned billing then asked a docs question) |
| RAG | Retrieves the wrong chunks (top-K miss) |
| RAG | Generates plausible but unsupported claims when no good chunks found |
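Two of these failures can be caught with cheap guards outside the model. The sketch below shows one possible approach, with hypothetical function names and thresholds that would need tuning: `is_looping` flags a run of calls to the same tool with near-identical inputs, and `should_abstain` flags a retrieval whose best score is too low to ground an answer.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def is_looping(
    calls: List[Tuple[str, str]], window: int = 3, threshold: float = 0.8
) -> bool:
    """True when the last `window` calls hit one tool with near-identical inputs."""
    if len(calls) < window:
        return False
    recent = calls[-window:]
    if len({name for name, _ in recent}) != 1:
        return False  # different tools were used, so this is not the same-tool loop
    first_input = recent[0][1]
    return all(
        SequenceMatcher(None, first_input, tool_input).ratio() >= threshold
        for _, tool_input in recent[1:]
    )


def should_abstain(scores: List[float], min_score: float = 0.3) -> bool:
    """True when no retrieved chunk is similar enough to support an answer."""
    return not scores or max(scores) < min_score


history = [
    ("search", "weather in paris"),
    ("search", "weather in paris?"),
    ("search", "weather in paris ?"),
]
print(is_looping(history))          # True
print(should_abstain([0.05, 0.1]))  # True
```

When `is_looping` fires, the agent can be stopped or told to change strategy. When `should_abstain` fires, answering "I could not find this in the documents" is usually better than generating from weak context.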
How they compose
Supervisor → ReAct → MCP tools. The most common production shape. Supervisor routes; each specialist is a ReAct agent against a domain toolset; tools come from MCP servers.
Supervisor → RAG / ReAct mix. Some specialists are RAG (docs Q&A); some are ReAct (account actions). The supervisor decides which kind handles the turn.
ReAct → RAG-as-tool. The simplest composition: expose your RAG agent as a single tool inside a ReAct loop. The ReAct loop decides when to retrieve vs when to reason / compute.
@tool
def search_docs(query: str) -> str:
    """Retrieve relevant docs for a question."""
    return rag.invoke(query).text  # rag is the RAG agent built in the RAG section

reasoner = create_react_agent(llm=OpenAI(), tools=[search_docs, calculator])
This pattern is robust — the ReAct loop can re-query when retrieval misses, and the user sees a single coherent flow.
A decision tree
- Is your “agent” doing pure Q&A over a corpus? → RAG.
- Multiple distinct domains? → Supervisor over specialists.
- Single domain with a few tools? → ReAct.
- N parallel tasks? → Map (custom StateGraph with fan-out).
- All of the above? → Supervisor routing to RAG / ReAct / Map specialists.
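The same decision tree can be written as a small function. This is one reading of the list, not a definitive rule: it checks for multiple domains first, so that the "all of the above" case collapses into Supervisor. The `Task` fields and the `choose_pattern` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pure_corpus_qa: bool = False  # only answering questions over a corpus you own
    distinct_domains: int = 1     # how many separate domains the agent must cover
    parallel_subtasks: int = 1    # how many independent subtasks could run at once


def choose_pattern(task: Task) -> str:
    if task.distinct_domains > 1:
        # Covers the mixed case: the specialists can be RAG, ReAct, or Map.
        return "Supervisor"
    if task.pure_corpus_qa:
        return "RAG"
    if task.parallel_subtasks > 1:
        return "Map"
    return "ReAct"


print(choose_pattern(Task(pure_corpus_qa=True)))                      # RAG
print(choose_pattern(Task(distinct_domains=3, pure_corpus_qa=True)))  # Supervisor
print(choose_pattern(Task()))                                         # ReAct
```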