This is a hands-on tutorial. By the end you will have a working AI code reviewer that can run against any PR, in about 200 lines of code.

Pieces

  • Filesystem MCP server — reads files from the checkout.
  • Git MCP server — reads diffs, commit history, blame.
  • ReAct agent — orchestrates the review.
  • Custom tool: post_review_comment — sends structured comments back to GitHub.

Step 1 — Install

pip install agentmatic httpx
npm install -g @modelcontextprotocol/server-filesystem
uvx mcp-server-git --help  # downloads the package once, which warms the cache

Step 2 — Wire up the MCP clients

from agentmatic.tools.mcp import MCPClient

async def get_repo_tools(repo_path: str):
    fs = await MCPClient.stdio([
        "npx", "-y", "@modelcontextprotocol/server-filesystem", repo_path
    ])
    git = await MCPClient.stdio([
        "uvx", "mcp-server-git", "--repository", repo_path
    ])
    fs_tools = await fs.list_tools_as_agentmatic()
    git_tools = await git.list_tools_as_agentmatic()
    return fs_tools + git_tools

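This gives you filesystem and git tools (read_file, list_directory, git_diff, git_log, git_show, and others) without writing a single line of file or git access code. The exact tool names depend on the server versions you install, so inspect the returned list once before relying on a name.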
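One caveat is worth a sketch. Besides the read tools, the reference servers also expose tools that modify the checkout (write_file on the filesystem server and git_commit on the git server, among others), and a reviewer has no need for them. A minimal allowlist filter might look like the following. It assumes each tool object exposes a name attribute; the Tool dataclass is a hypothetical stand-in for whatever list_tools_as_agentmatic returns, and the allowlist is a starting point to adjust against the names your servers actually report.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a tool object; only the .name attribute is assumed.
@dataclass(frozen=True)
class Tool:
    name: str


# Read-only tools the reviewer needs. Adjust to what your servers report.
READ_ONLY_TOOLS = frozenset({
    "read_file", "list_directory",       # filesystem server
    "git_diff", "git_log", "git_show",   # git server
})


def read_only(tools):
    """Keep only allowlisted tools, preserving their original order."""
    return [t for t in tools if t.name in READ_ONLY_TOOLS]


discovered = [Tool("read_file"), Tool("write_file"), Tool("git_diff"), Tool("git_commit")]
print([t.name for t in read_only(discovered)])  # ['read_file', 'git_diff']
```

In get_repo_tools, the same filter would wrap the final return value, so the model never sees a tool that can write to the repository.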

Step 3 — Build the comment tool

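import os

import httpx
from agentmatic import tool
from pydantic import BaseModel, Field
from typing import Literal

# Read the token from the environment rather than hard-coding it.
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]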

class ReviewComment(BaseModel):
    path: str = Field(description="File path in the repo.")
    line: int = Field(description="Line number (in the new file).")
    severity: Literal["nit", "minor", "major", "blocker"] = Field(description="Severity.")
    body: str = Field(description="Markdown comment body.")

@tool
async def post_review_comment(comment: ReviewComment, pull_number: int, owner: str, repo: str) -> str:
    """Post a single review comment to a GitHub PR. Use this to flag issues you find."""
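    base = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pull_number}"
    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    }
    async with httpx.AsyncClient(headers=headers) as client:
        # GitHub rejects review comments that lack a commit_id, so look up the PR head first.
        pr = await client.get(base)
        pr.raise_for_status()
        r = await client.post(
            f"{base}/comments",
            json={
                "commit_id": pr.json()["head"]["sha"],
                "path": comment.path,
                "line": comment.line,
                "side": "RIGHT",  # comment on the new version of the file
                "body": f"**[{comment.severity}]** {comment.body}",
            },
        )
        r.raise_for_status()
        return f"Posted comment on {comment.path}:{comment.line}"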
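GitHub only accepts a review comment whose line is part of the PR diff, so a model that picks a line outside every hunk gets a 422 response back. One way to catch this before calling the API is to compute the commentable lines from the diff itself. The helper below is a sketch, not part of agentmatic or the GitHub API: commentable_lines is a hypothetical name, and it assumes a unified diff for a single file.

```python
import re

# Matches hunk headers such as "@@ -10,3 +10,4 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(file_diff: str) -> set[int]:
    """Return the new-file line numbers that appear in one file's unified diff."""
    lines: set[int] = set()
    new_line = None  # stays None until the first hunk header is seen
    for raw in file_diff.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # file header lines ("diff --git", "---", "+++")
        if raw.startswith("\\") or raw.startswith("-"):
            continue  # "\ No newline at end of file" marker or a removed line
        lines.add(new_line)  # added or context line: it exists in the new file
        new_line += 1
    return lines


sample = "\n".join([
    "diff --git a/app.py b/app.py",
    "--- a/app.py",
    "+++ b/app.py",
    "@@ -10,3 +10,4 @@ def f():",
    "     a = 1",
    "-    b = 2",
    "+    b = 3",
    "+    c = 4",
    "     return a",
])
print(sorted(commentable_lines(sample)))  # [10, 11, 12, 13]
```

Checking comment.line against this set inside post_review_comment lets the tool return a readable error to the model, which can then pick a valid line, instead of surfacing a raw HTTP failure.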

Step 4 — Assemble the agent

from agentmatic import Anthropic
from agentmatic.prebuilt import create_react_agent

async def make_reviewer(repo_path: str, owner: str, repo: str, pr_number: int):
    repo_tools = await get_repo_tools(repo_path)

    # Bind PR context into the comment tool so the LLM doesn't have to repeat it.
    bound_comment = post_review_comment.bind(owner=owner, repo=repo, pull_number=pr_number)

    reviewer = create_react_agent(
        llm=Anthropic("claude-3-5-sonnet-20241022"),
        tools=repo_tools + [bound_comment],
        max_iterations=24,
        system_prompt="""
You are a code reviewer. Your job:
1. Read the diff against the base branch named in the request with git_diff.
2. For each non-trivial change, decide if it has a bug, style issue, or correctness risk.
3. Post review comments using post_review_comment. Use the right severity:
   - blocker: bug that will break prod
   - major: likely incorrect behavior
   - minor: style / readability
   - nit: pedantic
4. Stop after posting all relevant comments.
Be specific. Quote code. Suggest fixes when you can.
        """,
    )
    return reviewer
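To make max_iterations concrete, here is a minimal sketch of the loop a ReAct-style agent runs: ask the model for the next action, execute the chosen tool, feed the observation back, and repeat until the model stops or the budget runs out. This is not agentmatic's implementation; react_loop, Step, and scripted_model are hypothetical names, and a scripted function stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    tool: Optional[str] = None   # tool to call next, or None when the model is done
    args: dict = field(default_factory=dict)
    answer: str = ""             # final answer, used only when tool is None


def react_loop(decide: Callable[[list], Step], tools: dict, task: str, max_iterations: int):
    """Alternate between model decisions and tool calls until done or out of budget."""
    history = [("task", task)]
    for _ in range(max_iterations):
        step = decide(history)                       # reason: pick the next action
        if step.tool is None:
            return step.answer, history
        observation = tools[step.tool](**step.args)  # act: run the chosen tool
        history.append((step.tool, observation))     # feed the observation back in
    return "stopped: iteration budget exhausted", history


# A scripted stand-in for the LLM: read the diff, post one comment, then stop.
def scripted_model(history: list) -> Step:
    if len(history) == 1:
        return Step(tool="git_diff")
    if len(history) == 2:
        return Step(tool="post_review_comment", args={"body": "Possible off-by-one."})
    return Step(answer="Posted 1 comment.")


tools = {
    "git_diff": lambda: "+ for i in range(len(xs) + 1):",
    "post_review_comment": lambda body: f"posted: {body}",
}

answer, history = react_loop(scripted_model, tools, "Review PR", max_iterations=24)
print(answer)  # Posted 1 comment.
```

The budget matters for cost control: a large PR can tempt the model into reading file after file, and the cap bounds how many tool calls one review can make.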

Step 5 — The webhook handler

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def github_webhook(req: Request):
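    payload = await req.json()
    # Events without these keys (such as GitHub's initial "ping") are acknowledged and ignored.
    if payload.get("action") != "opened" or "pull_request" not in payload:
        return {"ok": True}

    pr = payload["pull_request"]
    # clone_or_pull is a helper you provide: it checks out the given commit and returns the local path.
    repo_path = await clone_or_pull(pr["base"]["repo"]["clone_url"], pr["head"]["sha"])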

    reviewer = await make_reviewer(
        repo_path=repo_path,
        owner=pr["base"]["repo"]["owner"]["login"],
        repo=pr["base"]["repo"]["name"],
        pr_number=pr["number"],
    )

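    # GitHub expects a webhook response within 10 seconds; in production, hand the
    # review to a background job and return immediately instead of awaiting it here.
    result = await reviewer.ainvoke(
        f"Review PR #{pr['number']}: {pr['title']}. Diff is against {pr['base']['ref']}."
    )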
    return {"ok": True, "comments_posted": result.tool_calls_count}
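The handler calls clone_or_pull without defining it. One possible shape for that helper is sketched below, under several assumptions: git is on the PATH, the repository is public or credentials are already configured, the host allows fetching a commit by SHA, and checkouts live under a hypothetical WORKDIR. The command list is built by a separate pure function (checkout_plan, also a hypothetical name) so it can be inspected without touching the network.

```python
import asyncio
import hashlib
from pathlib import Path

WORKDIR = Path("/tmp/reviewer-checkouts")  # hypothetical location for checkouts


def checkout_plan(clone_url: str, sha: str, dest: Path, exists: bool) -> list[list[str]]:
    """Return the git commands needed to put commit `sha` into `dest`."""
    plan = []
    if not exists:
        plan.append(["git", "clone", "--no-checkout", clone_url, str(dest)])
    # Fetch the exact commit, then check it out with a detached HEAD.
    plan.append(["git", "-C", str(dest), "fetch", "origin", sha])
    plan.append(["git", "-C", str(dest), "checkout", "--detach", sha])
    return plan


async def clone_or_pull(clone_url: str, sha: str) -> str:
    # One directory per repository, keyed by a hash of its clone URL.
    dest = WORKDIR / hashlib.sha256(clone_url.encode()).hexdigest()[:16]
    WORKDIR.mkdir(parents=True, exist_ok=True)
    for cmd in checkout_plan(clone_url, sha, dest, exists=(dest / ".git").exists()):
        proc = await asyncio.create_subprocess_exec(*cmd)
        if await proc.wait() != 0:
            raise RuntimeError(f"command failed: {' '.join(cmd)}")
    return str(dest)


for cmd in checkout_plan("https://example.com/r.git", "abc123", Path("/tmp/x"), exists=False):
    print(" ".join(cmd))
```

Reusing one directory per repository keeps later reviews fast, but two reviews of the same repository would then share a working tree. A per-PR directory avoids that at the cost of more disk.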

Step 6 — Production hardening

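from agentmatic import Agent, Anthropic  # assumes Agent is exported alongside Anthropic
from agentmatic.resilience import RetryPolicy
from agentmatic.checkpoint import PostgresSaver

# Replaces the create_react_agent(...) call inside make_reviewer, where
# repo_tools and bound_comment are already in scope.
reviewer = (Agent.builder("reviewer")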
    .llm(Anthropic("claude-3-5-sonnet-20241022"))
    .tools(repo_tools + [bound_comment])
    .checkpoint(PostgresSaver.from_env())
    .circuit_breaker("anthropic", failure_threshold=5, cooldown_seconds=30)
    .circuit_breaker("github", failure_threshold=3, cooldown_seconds=10)
    .retry_policy(RetryPolicy.exponential(max_attempts=3, jitter=True))
    .build())

With this in place the reviewer rides out Anthropic incidents, GitHub rate limits, and worker crashes. Reviews that still fail after the retries are exhausted land in the dead-letter queue (DLQ) for later replay.
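For readers who have not met the pattern, the sketch below shows what a circuit breaker with a failure_threshold and cooldown_seconds does in general. It is a simplified illustration, not agentmatic's implementation: the CircuitBreaker class and its method names are hypothetical, and a fake clock is injected so the behavior can be shown without waiting.

```python
import time


class CircuitBreaker:
    """Open after `failure_threshold` consecutive failures; probe again after `cooldown_seconds`."""

    def __init__(self, failure_threshold: int, cooldown_seconds: float, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a call may go through right now."""
        if self._opened_at is None:
            return True
        # Once the cooldown has passed, let calls through again to probe the service.
        return self._clock() - self._opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed probe


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_threshold=3, cooldown_seconds=10, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure()
print(breaker.allow())  # False: three failures opened the circuit
now[0] = 10.0
print(breaker.allow())  # True: the cooldown has elapsed
```

While the circuit is open, calls fail fast instead of piling up against a service that is already struggling, which is why the GitHub breaker above can use a lower threshold and a shorter cooldown than the Anthropic one.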

Real numbers from a small team

A small dev-tools team built almost exactly this (see /customers/devtools-mcp-code-review). It took 14 days from kickoff to production, human reviewers accepted 78% of the comments, and the average cost was about $0.04 per review.

Why MCP saved them weeks

They didn’t write filesystem or git tools. The reference MCP servers are already battle-tested by Claude Desktop and Cursor, so day-one functionality came from existing servers and the team could spend its time on comment quality, the only thing that mattered for shipping.

What’s next