The RAG pattern: retrieval-augmented generation
A retrieval-augmented generation agent: embed the question, look up relevant chunks, synthesize a grounded answer with citations. The right default for Q&A over your docs.
5 min read · Published May 3, 2026 · Languages: python, typescript, rust
The pattern
Retrieval-Augmented Generation: don't ask the LLM what it remembers; show it the docs and ask it to answer from them. RAG is usually the cheapest, most controllable way to ground a model in your private corpus, because it needs no fine-tuning and you decide exactly which text the model sees.
RAG in one line: Embed the question, pull the top-K relevant chunks from a vector store, stuff them into the prompt, generate a grounded answer with citations.
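To make the one-line description concrete, here is a minimal sketch of the same four steps in plain Python. It does not use agentmatic: the word-count `embed` function is a toy stand-in for a real embedding model, the `CHUNKS` list stands in for a vector store, and all function names are illustrative assumptions. The final generation step is left out; the sketch stops at the prompt that would be sent to the LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Score every chunk against the question and keep the top_k best matches."""
    q = embed(question)
    scored = [{**c, "score": cosine(q, embed(c["text"]))} for c in chunks]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:top_k]

def build_prompt(question: str, hits: list[dict]) -> str:
    """Stuff the retrieved chunks into the prompt, numbered so the answer can cite them."""
    context = "\n".join(f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(hits, 1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical corpus; in practice these chunks live in a vector store.
CHUNKS = [
    {"source": "tools.md", "text": "Circuit breakers are configured per tool in the tool settings."},
    {"source": "billing.md", "text": "Invoices are issued on the first day of each month."},
    {"source": "retries.md", "text": "Retries back off exponentially before the circuit breaker opens."},
]

question = "How do I configure circuit breakers per tool?"
hits = retrieve(question, CHUNKS, top_k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

The unrelated billing chunk never reaches the prompt, which is the point of the retrieval step: the model only sees text that scored well against the question.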
Prebuilt
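from agentmatic.prebuilt import create_rag_agent
from agentmatic.vectorstore import Qdrant
# OpenAI (the LLM wrapper used below) must also be imported; its import is not shown here.

# Connect to an existing collection using connection settings from the environment
vectorstore = Qdrant.from_env(collection="my-docs")

rag = create_rag_agent(
    llm=OpenAI("gpt-4o"),
    vectorstore=vectorstore,
    top_k=8,        # number of chunks to retrieve per question
    rerank=True,    # re-order retrieved chunks before prompting
    cite=True,      # attach citations to the answer
)

answer = rag.invoke("How do I configure circuit breakers per tool?")
print(answer.text)
print(answer.citations)  # [{ source, score, content_snippet }, ...]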
Why our default works
- Embeddings: OpenAI text-embedding-3-small by default; swap in any other model via the EmbeddingProvider trait.
- Rerank: enabled by default with Cohere rerank (or a local model via Ollama); improves precision@K by 20–40%.
- Citations: every chunk passed to the LLM is referenced in the output, so the user sees which docs grounded the answer.
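As a rough illustration of what the rerank step buys, the sketch below computes precision@K before and after a second-stage re-ordering. It is plain Python, not the agentmatic or Cohere API; the function names are illustrative, and the candidate scores are hand-picked toy values chosen to show the effect, not benchmark results.

```python
def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the first k results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k

def rerank(candidates: list[dict], score_fn, k: int) -> list[dict]:
    """Re-order first-stage candidates by a second score and keep the top k."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]

# Hypothetical first-stage output, already ordered by vector similarity.
# "rerank_score" stands in for what a reranking model would return.
candidates = [
    {"id": "a", "vector_score": 0.91, "rerank_score": 0.20},
    {"id": "b", "vector_score": 0.88, "rerank_score": 0.95},
    {"id": "c", "vector_score": 0.80, "rerank_score": 0.10},
    {"id": "d", "vector_score": 0.79, "rerank_score": 0.90},
]
relevant = {"b", "d"}  # assumed ground-truth labels for this toy query

before = precision_at_k([c["id"] for c in candidates], relevant, k=2)
reranked = rerank(candidates, lambda c: c["rerank_score"], k=2)
after = precision_at_k([c["id"] for c in reranked], relevant, k=2)
print(before, after)
```

The usual arrangement is to retrieve more candidates than you need with the cheap vector search, then let the slower, more precise reranker choose which ones reach the prompt.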
When to add a graph
The prebuilt RAG agent is one node. For real pipelines you want:
- Query rewriting — turn the user’s vague question into a precise vector query.
- Multi-query — pull from multiple vector stores or indexes.
- Self-RAG — let the model decide whether to retrieve at all.
- Reflection — after a draft answer, check it against the sources.
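# StateGraph, END, RagState and the node functions are assumed to be imported or defined elsewhere.
graph = StateGraph(RagState)

# One node per pipeline stage
graph.add_node("rewrite", rewrite_query)
graph.add_node("retrieve", retrieve_chunks)
graph.add_node("answer", generate_answer)
graph.add_node("reflect", self_check)

# After reflection, verify returns "ok" to finish or "retry" to retrieve again.
# The linear edges (rewrite -> retrieve -> answer -> reflect) are omitted here.
graph.add_conditional_edges("reflect", verify, {"ok": END, "retry": "retrieve"})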
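The control flow that graph describes can be sketched without any framework. The version below is a plain-Python stand-in, not how agentmatic executes a graph: the state is a plain dict, the node bodies are toy placeholders, and the `max_retries` cap is an added assumption so that a failing check cannot loop forever.

```python
from typing import Callable

State = dict

def run_pipeline(state: State, nodes: dict[str, Callable[[State], State]],
                 verify: Callable[[State], str], max_retries: int = 2) -> State:
    """rewrite once, then loop retrieve -> answer -> reflect until verify says "ok"."""
    state = nodes["rewrite"](state)
    for _ in range(max_retries + 1):
        state = nodes["retrieve"](state)
        state = nodes["answer"](state)
        state = nodes["reflect"](state)
        if verify(state) == "ok":
            break
    return state

# Toy corpus and node bodies (all hypothetical).
CORPUS = ["Invoices are issued monthly.", "Circuit breakers are set per tool."]

def rewrite_query(s: State) -> State:
    return {**s, "query": s["question"].lower().rstrip("?")}

def retrieve_chunks(s: State) -> State:
    # Widen the search by one chunk on every pass.
    attempts = s.get("attempts", 0) + 1
    return {**s, "attempts": attempts, "chunks": CORPUS[:attempts]}

def generate_answer(s: State) -> State:
    return {**s, "draft": " ".join(s["chunks"])}

def self_check(s: State) -> State:
    # Toy grounding check: does the draft mention the topic at all?
    return {**s, "grounded": "circuit" in s["draft"].lower()}

def verify(s: State) -> str:
    return "ok" if s["grounded"] else "retry"

nodes = {"rewrite": rewrite_query, "retrieve": retrieve_chunks,
         "answer": generate_answer, "reflect": self_check}
result = run_pipeline({"question": "How do I configure circuit breakers?"}, nodes, verify)
print(result["attempts"], result["draft"])
```

Here the first pass retrieves only the billing chunk, the check fails, and the second pass widens retrieval enough to ground the draft. A real reflection node would compare the draft against the retrieved sources rather than look for a keyword.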
Vector stores
Built-in: in-memory, Qdrant. Via bridge: Pinecone, Chroma, pgvector, Weaviate.
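When a pipeline pulls from more than one of these stores, the result lists have to be merged, and raw similarity scores from different backends are generally not comparable. One common option is reciprocal rank fusion (RRF), which uses only each document's rank in each list. The sketch below is plain Python and makes no claim about what agentmatic does internally; the function name, the document IDs, and the two hit lists are illustrative, and `k=60` is the conventional default for RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists: each appearance adds 1 / (k + rank) to the document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top-3 results from two different stores for the same query.
qdrant_hits = ["doc-3", "doc-1", "doc-7"]
pgvector_hits = ["doc-1", "doc-9", "doc-3"]

fused = reciprocal_rank_fusion([qdrant_hits, pgvector_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while a document seen by only one store still survives in the merged ranking.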