Concrete tutorial. By the end you will have a working AI code reviewer that runs against any PR, in around 200 lines of code.
Pieces
- Filesystem MCP server — reads files from the checkout.
- Git MCP server — reads diffs, commit history, blame.
- ReAct agent — orchestrates the review.
- Custom tool: post_review_comment — sends structured comments back to GitHub.
Step 1 — Install
pip install agentmatic httpx pydantic fastapi
npm install -g @modelcontextprotocol/server-filesystem
uvx mcp-server-git --help  # run once to warm the cache
Step 2 — Wire up the MCP clients
from agentmatic.tools.mcp import MCPClient

async def get_repo_tools(repo_path: str):
    # Each server runs as a subprocess and speaks MCP over stdio.
    fs = await MCPClient.stdio([
        "npx", "-y", "@modelcontextprotocol/server-filesystem", repo_path
    ])
    git = await MCPClient.stdio([
        "uvx", "mcp-server-git", "--repository", repo_path
    ])
    # Wrap the tools each server advertises as agentmatic tools.
    fs_tools = await fs.list_tools_as_agentmatic()
    git_tools = await git.list_tools_as_agentmatic()
    return fs_tools + git_tools
This gives you filesystem and git tools — read_file, list_directory, git_diff, git_log, git_blame, etc. — without writing a single line of file or git access code.
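The same servers also advertise tools that modify the repository (the reference filesystem and git servers include write and commit operations), and a reviewer only needs read access. A minimal sketch of an allowlist filter follows. It assumes each wrapped tool object exposes a `name` attribute, which is an assumption about agentmatic's tool objects, and `keep_read_only` and `FakeTool` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Names taken from the tool list mentioned above; adjust them to whatever
# your servers actually report.
READ_ONLY_TOOLS = frozenset({
    "read_file", "list_directory", "git_diff", "git_log", "git_blame",
})


def keep_read_only(tools: Iterable, allowed: frozenset = READ_ONLY_TOOLS) -> list:
    """Return only the tools whose name is on the allowlist.

    Assumes each tool has a `name` attribute (hypothetical; check the
    objects returned by list_tools_as_agentmatic).
    """
    return [t for t in tools if getattr(t, "name", None) in allowed]


@dataclass
class FakeTool:
    """Stand-in for a wrapped MCP tool, used only for this demo."""
    name: str


if __name__ == "__main__":
    demo = [FakeTool("read_file"), FakeTool("write_file"),
            FakeTool("git_diff"), FakeTool("git_commit")]
    print([t.name for t in keep_read_only(demo)])  # ['read_file', 'git_diff']
```

In `get_repo_tools`, this would wrap the return value, for example `return keep_read_only(fs_tools + git_tools)`.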
Step 3 — Build the comment tool
import os

import httpx
from agentmatic import tool
from pydantic import BaseModel, Field
from typing import Literal

# Token allowed to write pull request comments, read from the environment.
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GITHUB_HEADERS = {
    "Authorization": f"Bearer {GITHUB_TOKEN}",
    "Accept": "application/vnd.github+json",
}

class ReviewComment(BaseModel):
    path: str = Field(description="File path in the repo.")
    line: int = Field(description="Line number (in the new file).")
    severity: Literal["nit", "minor", "major", "blocker"] = Field(description="Severity.")
    body: str = Field(description="Markdown comment body.")

@tool
async def post_review_comment(comment: ReviewComment, pull_number: int, owner: str, repo: str) -> str:
    """Post a single review comment to a GitHub PR. Use this to flag issues you find."""
    pr_url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull_number}"
    async with httpx.AsyncClient(headers=GITHUB_HEADERS) as client:
        # GitHub requires the commit SHA the comment applies to; use the PR head.
        pr = await client.get(pr_url)
        pr.raise_for_status()
        r = await client.post(
            f"{pr_url}/comments",
            json={
                "commit_id": pr.json()["head"]["sha"],
                "path": comment.path,
                "line": comment.line,
                "side": "RIGHT",  # the new-file side of the diff
                "body": f"**[{comment.severity}]** {comment.body}",
            },
        )
        r.raise_for_status()
    return f"Posted comment on {comment.path}:{comment.line}"
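GitHub rejects a review comment whose line is not part of the PR diff, so it can help to check a line before calling post_review_comment. The sketch below is one way to do that, assuming you have the unified diff text for a single file (for example, from git_diff). `commentable_lines` is a hypothetical helper, not part of agentmatic or the MCP servers.

```python
import re

# Matches hunk headers such as "@@ -10,2 +11,2 @@ def foo():" and captures
# the starting line number on the new-file side.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(patch: str) -> set[int]:
    """Return new-file line numbers that appear in the hunks of one file's diff."""
    lines: set[int] = set()
    new_lineno = None  # stays None until the first hunk header is seen
    for raw in patch.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            new_lineno = int(header.group(1))
            continue
        if new_lineno is None:
            continue  # file headers ("---", "+++") before the first hunk
        if raw.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        if raw.startswith("-"):
            continue  # removed line: exists only in the old file
        # Added ("+") or context (" ") line: present in the new file.
        lines.add(new_lineno)
        new_lineno += 1
    return lines


PATCH = "\n".join([
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -1,3 +1,4 @@",
    " import os",
    "-x = 1",
    "+x = 2",
    "+y = 3",
    " print(x)",
    "@@ -10,2 +11,2 @@",
    "-old",
    "+new",
    " ctx",
])

if __name__ == "__main__":
    print(sorted(commentable_lines(PATCH)))  # [1, 2, 3, 4, 11, 12]
```

The helper handles one file's patch at a time; a multi-file diff would need to be split on its `diff --git` lines first.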
Step 4 — Assemble the agent
from agentmatic import Anthropic
from agentmatic.prebuilt import create_react_agent

async def make_reviewer(repo_path: str, owner: str, repo: str, pr_number: int):
    repo_tools = await get_repo_tools(repo_path)
    # Bind PR context into the comment tool so the LLM doesn't have to repeat it.
    bound_comment = post_review_comment.bind(owner=owner, repo=repo, pull_number=pr_number)
    reviewer = create_react_agent(
        llm=Anthropic("claude-3-5-sonnet-20241022"),
        tools=repo_tools + [bound_comment],
        max_iterations=24,
        system_prompt="""
You are a code reviewer. Your job:
1. Read the diff against the PR's base branch with git_diff.
2. For each non-trivial change, decide if it has a bug, style issue, or correctness risk.
3. Post review comments using post_review_comment. Use the right severity:
   - blocker: bug that will break prod
   - major: likely incorrect behavior
   - minor: style / readability
   - nit: pedantic
4. Stop after posting all relevant comments.
Be specific. Quote code. Suggest fixes when you can.
""",
    )
    return reviewer
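Binding the PR context is partial application: owner, repo, and pull_number are fixed once, and the model supplies only the comment. The sketch below illustrates the idea with `functools.partial` on a plain async function. It assumes agentmatic's `.bind` behaves similarly, which is not confirmed here, and `fake_post` is a stand-in that formats a string instead of calling GitHub.

```python
import asyncio
from functools import partial


async def fake_post(comment: dict, pull_number: int, owner: str, repo: str) -> str:
    """Stand-in for post_review_comment; it only formats a string."""
    return f"{owner}/{repo}#{pull_number} {comment['path']}:{comment['line']}"


# Fix the PR context once; callers now pass only the comment.
bound = partial(fake_post, owner="octo", repo="demo", pull_number=7)

if __name__ == "__main__":
    print(asyncio.run(bound({"path": "app.py", "line": 3})))  # octo/demo#7 app.py:3
```

Keeping these arguments out of the model's hands also means a hallucinated repo name or PR number cannot send a comment to the wrong place.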
Step 5 — The webhook handler
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def github_webhook(req: Request):
    payload = await req.json()
    # Ignore pings and other events; only review newly opened PRs.
    if "pull_request" not in payload or payload.get("action") != "opened":
        return {"ok": True}
    pr = payload["pull_request"]
    # clone_or_pull is a helper you provide: it checks out the given commit
    # locally and returns the path to the working tree.
    repo_path = await clone_or_pull(pr["base"]["repo"]["clone_url"], pr["head"]["sha"])
    reviewer = await make_reviewer(
        repo_path=repo_path,
        owner=pr["base"]["repo"]["owner"]["login"],
        repo=pr["base"]["repo"]["name"],
        pr_number=pr["number"],
    )
    result = await reviewer.ainvoke(
        f"Review PR #{pr['number']}: {pr['title']}. Diff is against {pr['base']['ref']}."
    )
    return {"ok": True, "comments_posted": result.tool_calls_count}
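The handler above accepts any POST to /webhook. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header when a webhook secret is configured. The following sketch shows one way to check it; `verify_signature` is a hypothetical helper name, and how you store the secret is up to you.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters matched.
    return hmac.compare_digest(expected, signature_header.encode())
```

In the handler, this would mean reading `body = await req.body()` and `req.headers.get("X-Hub-Signature-256")` and returning a 401 before parsing the JSON when the check fails. The signature must be computed over the raw bytes, not over re-serialized JSON.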
Step 6 — Production hardening
# Assumes Agent is exported from the top-level package alongside Anthropic.
from agentmatic import Agent, Anthropic
from agentmatic.resilience import RetryPolicy
from agentmatic.checkpoint import PostgresSaver

# repo_tools and bound_comment are built as in make_reviewer (Step 4).
reviewer = (
    Agent.builder("reviewer")
    .llm(Anthropic("claude-3-5-sonnet-20241022"))
    .tools(repo_tools + [bound_comment])
    .checkpoint(PostgresSaver.from_env())
    .circuit_breaker("anthropic", failure_threshold=5, cooldown_seconds=30)
    .circuit_breaker("github", failure_threshold=3, cooldown_seconds=10)
    .retry_policy(RetryPolicy.exponential(max_attempts=3, jitter=True))
    .build()
)
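Now the reviewer survives Anthropic incidents, GitHub rate limits, and worker crashes: the retry policy and circuit breakers absorb upstream failures, and the Postgres checkpointer persists progress across restarts. Failed reviews land in the dead-letter queue (DLQ) for later replay.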
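For readers unfamiliar with the pattern, here is a minimal sketch of what a circuit breaker with a failure threshold and a cooldown does. It is an illustration of the general technique, not agentmatic's implementation; the class and method names are hypothetical, and the parameter names simply mirror the builder arguments above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after the cooldown."""

    def __init__(self, failure_threshold: int, cooldown_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a call may proceed right now."""
        if self._opened_at is None:
            return True
        # Half-open: let a trial call through once the cooldown has elapsed.
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        """A call succeeded: reset the failure count and close the circuit."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """A call failed: open (or re-open) the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller checks `allow()` before each upstream request and reports the outcome with `record_success()` or `record_failure()`. While the circuit is open, requests fail fast instead of piling onto a service that is already struggling.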
Real numbers from a small team
A small dev-tools team built almost exactly this — see /customers/devtools-mcp-code-review. 14 days from kickoff to production. 78% of comments accepted by human reviewers. ~$0.04 average per review.
Why MCP saved them weeks
They didn’t write filesystem or git tools. The MCP servers from @modelcontextprotocol are already battle-tested by Claude Desktop and Cursor. Day-one functionality came from existing servers. They focused on comment quality (the only thing that mattered for shipping).
What’s next
- Recipe: ReAct pattern
- Recipe: MCP tool wrapping
- /examples/code-reviewer — the source on GitHub.