
The RAG pattern: retrieval-augmented generation

A retrieval-augmented generation agent: embed the question, look up relevant chunks, synthesize a grounded answer with citations. The right default for Q&A over your docs.

5 min read · Published May 3, 2026 · Languages: python, typescript, rust

The pattern

Retrieval-augmented generation: instead of asking the LLM what it remembers, show it the relevant docs and ask it to answer from them. RAG is typically the cheapest and most controllable way to ground a model in your private corpus.

RAG in one line: Embed the question, pull the top-K relevant chunks from a vector store, stuff them into the prompt, generate a grounded answer with citations.
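To make the four steps concrete, here is a minimal, library-free sketch of the same flow. It is not how agentmatic implements the pattern: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector store, and the chunk texts and helper names (`top_k`, `build_prompt`) are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[dict], k: int) -> list[dict]:
    """Score every chunk against the question and keep the k best."""
    q = embed(question)
    scored = [{**c, "score": cosine(q, embed(c["text"]))} for c in chunks]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Stuff the retrieved chunks into the prompt as numbered, citable sources."""
    context = "\n".join(
        f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up corpus for illustration only.
CHUNKS = [
    {"source": "tools.md", "text": "Circuit breakers are configured per tool with a failure threshold."},
    {"source": "install.md", "text": "Install the package with pip and set the API key."},
    {"source": "retries.md", "text": "Retries use exponential backoff before the circuit breaker opens."},
]

question = "How do I configure circuit breakers per tool?"
hits = top_k(question, CHUNKS, k=2)
prompt = build_prompt(question, hits)
print(prompt)  # this string is what would be sent to the LLM for the final generation step
```

The generation step is left out on purpose: whichever model receives `prompt` only sees the retrieved chunks, which is what keeps the answer grounded and the citations checkable.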

Prebuilt

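from agentmatic.prebuilt import create_rag_agent
from agentmatic.vectorstore import Qdrant
# OpenAI (the LLM wrapper used below) must also be imported; its module path is not shown in this recipe.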

vectorstore = Qdrant.from_env(collection="my-docs")
rag = create_rag_agent(
    llm=OpenAI("gpt-4o"),
    vectorstore=vectorstore,
    top_k=8,
    rerank=True,
    cite=True,
)

answer = rag.invoke("How do I configure circuit breakers per tool?")
print(answer.text)
print(answer.citations)  # [{ source, score, content_snippet }, ...]

Why our default works

Embeddings: OpenAI text-embedding-3-small by default; swap for any other via the EmbeddingProvider trait.
Rerank: enabled by default, using Cohere Rerank or a local model via Ollama; improves precision@K by 20–40%.
Citations: every chunk passed to the LLM is referenced in the output, so the user sees which docs grounded the answer.
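The rerank and citation steps can be pictured as a second pass over the first-pass hits. The sketch below is only an illustration under stated assumptions: `rerank_score` is a toy term-overlap scorer standing in for a real reranking model, the candidate chunks and their first-pass scores are invented, and the citation fields simply mirror the `source`, `score`, `content_snippet` shape shown in the prebuilt example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank_score(question: str, chunk_text: str) -> float:
    """Toy stand-in for a reranking model: share of question terms found in the chunk."""
    q = tokens(question)
    return len(q & tokens(chunk_text)) / len(q) if q else 0.0


def rerank(question: str, candidates: list[dict], top_k: int) -> list[dict]:
    """Re-score first-pass candidates against the question and keep the best top_k."""
    rescored = [{**c, "score": rerank_score(question, c["text"])} for c in candidates]
    return sorted(rescored, key=lambda c: c["score"], reverse=True)[:top_k]


def to_citations(chunks: list[dict], snippet_chars: int = 60) -> list[dict]:
    """One citation per chunk that is actually passed to the LLM."""
    return [
        {
            "source": c["source"],
            "score": c["score"],
            "content_snippet": c["text"][:snippet_chars],
        }
        for c in chunks
    ]


# First-pass hits as a vector store might return them (made-up texts and scores).
candidates = [
    {"source": "retries.md", "score": 0.81, "text": "Retries use exponential backoff before the circuit opens."},
    {"source": "tools.md", "score": 0.78, "text": "Configure circuit breakers per tool in the tool config."},
    {"source": "faq.md", "score": 0.75, "text": "Reset your API key from the dashboard."},
]

question = "How do I configure circuit breakers per tool?"
kept = rerank(question, candidates, top_k=2)
citations = to_citations(kept)
print([c["source"] for c in citations])
```

In this toy run the first pass ranks `retries.md` highest, while the second pass promotes `tools.md`. Because citations are built from `kept` rather than from the full candidate list, the user only sees sources that actually reached the prompt.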

When to add a graph

The prebuilt RAG agent is a single node. Real pipelines often need more steps around it:

Query rewriting — turn the user’s vague question into a precise vector query.
Multi-query — pull from multiple vector stores or indexes.
Self-RAG — let the model decide whether to retrieve at all.
Reflection — after a draft answer, check it against the sources.

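# StateGraph, END, RagState, and the node functions are assumed to be imported or defined elsewhere.
# The entry point and the plain edges (rewrite -> retrieve -> answer -> reflect) are not shown here.
graph = StateGraph(RagState)
graph.add_node("rewrite", rewrite_query)     # sharpen the user's question
graph.add_node("retrieve", retrieve_chunks)  # fetch the top-K chunks
graph.add_node("answer", generate_answer)    # draft a grounded answer
graph.add_node("reflect", self_check)        # check the draft against the sources
# Route on verify's result: finish on "ok", go back to retrieval on "retry".
graph.add_conditional_edges("reflect", verify, {"ok": END, "retry": "retrieve"})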
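The control flow this graph describes can be sketched in plain Python, without any graph library. Everything below is a hypothetical stand-in: the `RagState` fields, the stub bodies of the node functions, the two-line corpus, and the `max_attempts` cap are assumptions for illustration, not agentmatic's definitions.

```python
from dataclasses import dataclass, field

# Made-up corpus for illustration only.
CORPUS = [
    "Retries use exponential backoff.",
    "Circuit breakers are configured per tool.",
]


@dataclass
class RagState:
    question: str
    query: str = ""
    chunks: list[str] = field(default_factory=list)
    draft: str = ""
    grounded: bool = False
    attempts: int = 0


def rewrite_query(state: RagState) -> RagState:
    # Stub rewrite: normalize whitespace and drop the trailing question mark.
    state.query = " ".join(state.question.split()).rstrip("?")
    return state


def retrieve_chunks(state: RagState) -> RagState:
    # Stub retrieval: ignores the query and widens the window on each retry.
    state.attempts += 1
    state.chunks = CORPUS[: state.attempts]
    return state


def generate_answer(state: RagState) -> RagState:
    # Stub generation: a real node would call the LLM with the chunks as context.
    state.draft = " ".join(state.chunks)
    return state


def self_check(state: RagState) -> RagState:
    # Stub reflection: the draft counts as grounded if it mentions the key term.
    state.grounded = "circuit breakers" in state.draft.lower()
    return state


def verify(state: RagState) -> str:
    # Router: same "ok" / "retry" labels as the conditional edge.
    return "ok" if state.grounded else "retry"


def run(question: str, max_attempts: int = 3) -> RagState:
    state = rewrite_query(RagState(question=question))
    while True:
        state = self_check(generate_answer(retrieve_chunks(state)))
        # Stop on success, or when the retry budget is spent.
        if verify(state) == "ok" or state.attempts >= max_attempts:
            return state


final = run("How do I configure circuit breakers per tool?")
print(final.attempts, final.grounded)
```

The retry cap is the design point worth keeping: a `retry` edge that loops back to retrieval can cycle indefinitely if the check never passes, so some budget on attempts is a sensible safeguard.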

Vector stores

Built-in: in-memory, Qdrant. Via bridge: Pinecone, Chroma, pgvector, Weaviate.
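When a pipeline pulls from several of these stores at once (the multi-query case above), their similarity scores are generally not comparable, so results are often merged by rank instead. One common technique for that is reciprocal rank fusion (RRF), sketched below. This is an illustration, not a description of what agentmatic does internally; the chunk ids and the per-store result lists are invented, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked id lists; each appearance adds 1 / (k + rank) to the id's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up ranked results from three different stores for the same question.
qdrant_hits = ["tools.md#2", "retries.md#1", "faq.md#4"]
pgvector_hits = ["retries.md#1", "tools.md#2", "deploy.md#3"]
memory_hits = ["tools.md#2", "deploy.md#3"]

fused = reciprocal_rank_fusion([qdrant_hits, pgvector_hits, memory_hits])
print(fused)
```

Chunks that rank well in several stores rise to the top, and no store's raw score scale dominates the merge.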