Almost every production AI agent is some combination of three patterns: ReAct (think-then-act), Supervisor (route-to-specialist), and RAG (retrieve-then-answer). This is how to choose.

Quick guide: One question + a few tools → ReAct. Multiple distinct domains → Supervisor over specialists. Q&A over your private docs → RAG. The patterns compose: a Supervisor often routes to ReAct or RAG specialists.

ReAct: reason + act

The simplest useful agent loop. The LLM thinks, calls a tool, observes the result, decides to keep going or stop.

from agentmatic.prebuilt import create_react_agent
from agentmatic import OpenAI, tool

@tool
def search(query: str) -> str: ...

@tool
def calculator(expression: str) -> str: ...

agent = create_react_agent(
    llm=OpenAI("gpt-4o"),
    tools=[search, calculator],
    max_iterations=12,
)

Use when. Single-purpose task. Few tools (1–6). Linear flow (“look this up, then compute on it, then format”).

Skip when. Multiple distinct domains (use Supervisor). Pure retrieval (use RAG). Parallel fan-out (use Map).

Tuning. max_iterations prevents runaway loops. System prompt should describe when to stop. Tools should have clear, distinct names.

Supervisor: route to specialists

A supervisor reads the user message, picks a specialist sub-agent, forwards the call. The supervisor doesn’t do the work — it routes.

from agentmatic.prebuilt import create_supervisor, create_react_agent, create_rag_agent

docs = create_rag_agent(llm=OpenAI(), vectorstore=Qdrant.from_env())
billing = create_react_agent(llm=OpenAI(), tools=[get_invoice, refund])
escalation = create_react_agent(llm=OpenAI(), tools=[send_to_human])

supervisor = create_supervisor(
    llm=OpenAI("gpt-4o"),
    agents={"docs": docs, "billing": billing, "escalation": escalation},
    system_prompt="""
      Route to docs for product questions.
      Route to billing for invoice/refund questions.
      Route to escalation for anything off-policy.
    """,
)

Use when. Multiple distinct domains. Each domain has its own tools, prompt, persona. Customer-facing.

Skip when. Single domain (you’re over-engineering). All decisions depend on shared state (sub-agents can’t share state cleanly — that’s a graph problem).

Tuning. Each specialist should be testable in isolation. Routing prompts should be specific (“route to billing for invoice questions” not “route appropriately”). Bound recursion depth.

RAG: retrieve then answer

Embed the user question, look up relevant chunks in a vector store, stuff them into the prompt, generate a grounded answer.

from agentmatic.prebuilt import create_rag_agent
from agentmatic.vectorstore import Qdrant

rag = create_rag_agent(
    llm=OpenAI("gpt-4o"),
    vectorstore=Qdrant.from_env(),
    top_k=8,
    rerank=True,
    cite=True,
)

Use when. Q&A over a corpus you own. Documentation chatbots. Internal knowledge search. Compliance Q&A.

Skip when. Multi-hop reasoning across documents (vanilla RAG can’t). Synthesis tasks (the model can’t synthesize across more than ~10 chunks reliably). Real-time data (your vector store is stale).

Tuning. Top-K (default 8). Rerank — enable it (20–40% precision win). Citations — enable them so the user sees the source. Chunk size and overlap matter; tune to your corpus.

How they fail

PatternCommon failure
ReActStuck in a loop calling the same tool with slight variations
ReActPicks the wrong tool when given 15+ choices
SupervisorRoutes to the wrong specialist on ambiguous queries
SupervisorSpecialists don’t share context (user mentioned billing then asked a docs question)
RAGRetrieves the wrong chunks (top-K miss)
RAGGenerates plausible but unsupported claims when no good chunks found

How they compose

Supervisor → ReAct → MCP tools. The most common production shape. Supervisor routes; each specialist is a ReAct agent against a domain toolset; tools come from MCP servers.

Supervisor → RAG / ReAct mix. Some specialists are RAG (docs Q&A); some are ReAct (account actions). The supervisor decides which kind handles the turn.

ReAct → RAG-as-tool. The simplest composition: expose your RAG agent as a single tool inside a ReAct loop. The ReAct loop decides when to retrieve vs when to reason / compute.

@tool
def search_docs(query: str) -> str:
    """Retrieve relevant docs for a question."""
    return rag.invoke(query).text

reasoner = create_react_agent(llm=OpenAI(), tools=[search_docs, calculator])

This pattern is robust — the ReAct loop can re-query when retrieval misses, and the user sees a single coherent flow.

A decision tree

  1. Is your “agent” doing pure Q&A over a corpus? → RAG.
  2. Multiple distinct domains? → Supervisor over specialists.
  3. Single domain with a few tools? → ReAct.
  4. N parallel tasks? → Map (custom StateGraph with fan-out).
  5. All of the above? → Supervisor routing to RAG / ReAct / Map specialists.