The Journey to Find the Perfect AI Code Review Solution

The Client's Challenge: Making AI Understand 2,000 Lines of Coding Rules

Picture this: A client approaches you with nearly 2,000 lines of meticulously crafted coding standards—seven files containing years of accumulated best practices, style guidelines, and team wisdom.

Their request? Simple yet daunting:

"We need an AI to review our Pull Requests using these exact rules."

The mission broke down into three clear objectives:

Test PR-Agent: Evaluate if this popular open-source tool could handle our extensive rulebook
Optimize or Pivot: Either improve what exists or find something better
Deploy with Confidence: Integrate a reliable AI code review solution into the existing workflow

Sounds straightforward, right? Well, buckle up—this journey had more twists than I expected.

What is PR-Agent? The Promising Starting Point

PR-Agent is an open-source AI assistant developed by CodiumAI (now Qodo), designed to automate and enhance the Pull Request review process. Think of it as your AI teammate who:

Automatically analyzes code changes in PRs
Generates detailed PR descriptions
Suggests improvements for code quality
Enforces programming standards and best practices

The goal? Save development teams time, improve code quality, and accelerate the software development cycle.

For teams drowning in code reviews, PR-Agent seemed like the perfect life raft. But would it survive our 2,000-line stress test?

The First Experiment: When Numbers Don't Lie

To give PR-Agent a fair shot, I needed data, not gut feelings. I created two distinct test scenarios:

Test Scenario 1: The Complex PR

30 deliberate errors spread across multiple files—a real-world nightmare scenario.

[Figure: PR-Agent's performance across different configurations]

Test Scenario 2: The Simple PR

6 carefully placed errors—a typical everyday code review.

For each scenario, I ran three configurations:

No Custom Rules: Baseline PR-Agent with default settings
Full Rulebook: All 2,000 lines of the client's original rules
Optimized Rules: A streamlined, bullet-point version I created to be more "AI-digestible"

I was hopeful. I'd carefully formatted everything to make it easy for the AI to consume. The results, however, were... sobering.

When PR-Agent Got Confused: The Results

Complex PR (30 Errors) - The Reality Check

With Original Rules:

Found: 6 out of 30 errors (mostly basic issues)
Worse: Completely forgot bilingual requirements, responded only in English

With Optimized Rules:

Found: 10 out of 30 errors
Plus: Maintained bilingual capability (English-Japanese)

Without Rules:

Found: 5 errors
Plus: Bilingual support intact

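The insight? Our massive rulebook wasn't helping. With all 2,000 lines loaded, PR-Agent found just one more error than with no rules at all (6 vs. 5) and lost its bilingual output. The extra text was creating noise and confusing the AI.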

Simple PR (6 Errors) - Mixed Results

With Original Rules:

Found: 5 out of 6 errors
BUT: No fix suggestions provided
Inconsistent: Bilingual capability worked intermittently

With Optimized Rules (The Winner):

Found: 6 out of 6 errors ✓
Provided: 6 out of 6 fix suggestions ✓
Maintained: Perfect bilingual operation ✓

The Hard Truth About PR-Agent

PR-Agent isn't a bad tool—it's just not designed to enforce complex rulesets.

Think of it this way: PR-Agent is like a helpful colleague who offers suggestions based on brief guidelines, not a strict reviewer enforcing a 2,000-line legal code.

With our extensive rulebook, PR-Agent was simply overwhelmed.

The Breakthrough: A Radical New Approach

Hitting a wall can be frustrating, but it's also when the best ideas emerge. I asked myself:

"If we can't teach an existing tool, why not build our own custom solution?"

That's when I proposed something completely different to the client: combining GitHub Copilot in the IDE with a locally running GitHub MCP Server.

Why This Architecture Made Sense

✅ Powerful Integration
The Model Context Protocol (MCP) server becomes a direct bridge between Copilot and GitHub. Copilot doesn't just read diffs: through the tools the server exposes, it can pull PR metadata, read files across the repository, and post review comments.

✅ Natural Workflow
The AI operates like a real human reviewer: reads changes, analyzes files individually, and leaves detailed comments on specific code lines.

The Trade-off? We'd lose the convenience of CI/CD automation. This would be a manual tool, not an automatic machine. But the client understood: sometimes precision matters more than automation.

They gave me the green light to build a prototype.

First Attempt: When GitHub Copilot Was a Rough Diamond

Having GitHub Copilot felt like holding a raw gemstone. I was so confident in its potential that I forgot the old saying: "Unpolished gems don't shine."

My initial approach was very "human": I built one long, detailed prompt explaining every step and expected Copilot to figure everything out on its own.

I failed... again.

The results were worse than PR-Agent's best run (10 of 30 with the optimized rules):

With 30 errors: Copilot found only 7-8

Why Did It Fail?

"Lazy" AI
It only read the first few lines of the rules files, then skipped the rest.

"Forgetful" AI
The massive context caused it to frequently "lose track" mid-analysis, jumping straight to conclusions.

"Overly Creative" AI
Each run used different tools and approaches, making results unpredictable.

The "Aha!" Moment: Treating AI Agents Like Functions

The critical lesson: You can't treat a powerful AI Agent like a simple chatbot.

Handing it a lengthy "essay" of instructions and hoping it follows along is a recipe for disappointment.

Then came my breakthrough. I realized that beneath its natural language interface, Copilot has specific "functions" (tools) I could call with precise parameters, just like programming!

Some key tools I discovered:

read_file({ filePath, startLineNumber, endLineNumber }) - Read specific portions of a file
file_search({ query, maxResults }) - Find files by glob pattern
grep_search({ query, includePattern, isRegexp, maxResults }) - Search text in files
semantic_search({ query }) - Intelligent semantic search in workspace

Thanks to these tools, I didn't need to ask AI to "read files" vaguely. Instead, I could command directly: "Use the read_file tool with this specific path."
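To make the idea of "calling tools with precise parameters" concrete, here is a minimal Python sketch that builds such prompt fragments programmatically. The `ToolCall` class, the `build_step` helper, and the `.rules/naming.md` path are all hypothetical and exist only for illustration. The rendered syntax simply imitates the call style used in the prompts in this article; it is not an official Copilot API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One explicit tool invocation to embed in a prompt (illustrative type)."""
    tool: str
    args: dict = field(default_factory=dict)

    def render(self) -> str:
        # Format arguments as "key: value" pairs, quoting string values.
        parts = []
        for key, value in self.args.items():
            shown = f'"{value}"' if isinstance(value, str) else str(value)
            parts.append(f"{key}: {shown}")
        return f"{self.tool}({', '.join(parts)})"


def build_step(title: str, call: ToolCall) -> str:
    """Turn a step title and a tool call into an unambiguous prompt fragment."""
    return f"# STEP: {title}\n# TOOL: {call.tool}\n{call.render()}"


step = build_step(
    "Read the naming rules",
    ToolCall("read_file", {
        "filePath": ".rules/naming.md",  # hypothetical path
        "startLineNumber": 1,
        "endLineNumber": 9999,
    }),
)
print(step)
```

Generating the fragments this way keeps tool names and parameter names consistent across a long prompt, which is exactly where hand-written instructions tend to drift.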

Pro Tip: You can ask GitHub Copilot or other AI coding agents to list their available tools with this simple command:
"List the current tools you can use (function name, arguments, purpose, usage)"

The Winning Strategy: "Divide and Conquer" with Two Specialized Agents

Discovering Copilot's tool capabilities completely changed the game. My mistake wasn't the tool—it was how I was using it.

My initial prompt in review_pr.prompt.md failed because it was too "human"—trying to explain everything in natural language and expecting AI to infer the rest.

The Failed Approach: Too Vague

Using the GitHub MCP server, perform a comprehensive review of pull request #[PR_NUMBER] 
in repository [OWNER/REPO] following this DYNAMIC RULES-FIRST process:

STEP 1: Get PR Information
STEP 2: DYNAMICALLY LOAD TEAM RULES (PRIMARY PRIORITY)  
STEP 3: APPLY LOADED RULES TO CODE CHANGES
STEP 4: APPLY STANDARD CODING PRACTICES (SECONDARY)
STEP 5: Categorize Issues by Team Standards
...

This approach was too ambiguous and ineffective.

I needed a new strategy: "Divide and Conquer." I split the workflow into two specialized Agents, each with a dedicated prompt that behaves like a clearly defined function:

The Analyst (Analyst Agent)
The Reporter (Reporter Agent)

Agent #1: The Analyst - Pure Analysis Machine

The analyst.prompt.md was designed to be a pure analysis expert. It doesn't need to communicate with humans. Its job is to systematically execute a sequence of tool calls to:

"Extract" the 2,000 lines of rules
Analyze code changes
Generate a structured list of violations (JSON format)

It doesn't "read" and "understand" like humans—it executes.
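The exact JSON schema the Analyst emits is not shown here, so the following is only a sketch of what a structured violation list could look like. Every field name (`rule_title`, `rule_file`, `path`, `line`, and so on) and every sample value is an assumption made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Violation:
    # All field names are illustrative; the real schema is not shown in the article.
    rule_title: str
    rule_file: str
    path: str
    line: int
    violating_code: str
    suggested_fix: str


def to_report_payload(violations: list[Violation]) -> str:
    """Serialize violations as JSON, sorted by file path and line number."""
    ordered = sorted(violations, key=lambda v: (v.path, v.line))
    return json.dumps([asdict(v) for v in ordered], ensure_ascii=False, indent=2)


found = [
    Violation("No magic numbers", "style.md", "src/billing.py", 42,
              "fee = total * 0.07", "fee = total * TAX_RATE"),
    Violation("Snake case names", "naming.md", "src/api.py", 10,
              "def GetUser():", "def get_user():"),
]
payload = to_report_payload(found)
print(payload)
```

A fixed, machine-readable shape like this is what lets a second agent consume the analysis without re-reading the rules or the diff.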

Example from the Analyst Prompt:

#### 1.1 AUTO-DETECT RULES STRUCTURE
"""
# TOOL: file_search
# Discover rules directory (project-agnostic)
file_search(query: "**/*rules*/**")
→ Set RULES_DIRECTORY = first_found_directory
"""

#### 1.2 COMPREHENSIVE RULES FILE DISCOVERY  
"""
# TOOL: file_search
# Find all rules files with various formats
file_search(query: "{RULES_DIRECTORY}/*.md")
→ Set RULES_FILES = all_discovered_files
"""

"""
# TOOL: read_file
# Read each rules file completely
FOR each file in RULES_FILES:
  read_file(filePath: file, startLineNumber: 1, endLineNumber: 9999)
"""

This prompt commands the AI precisely: what to do, which tool to use, with which parameters.
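For readers who want to check what the discovery steps above should return before handing them to an agent, the same logic can be mirrored locally. This is a rough equivalent using Python's `pathlib`, not what Copilot runs internally; the `discover_rules` function and the `docs/team_rules` layout are hypothetical.

```python
import tempfile
from pathlib import Path


def discover_rules(root: Path) -> dict[str, str]:
    """Return {relative path: full text} for each Markdown file in the first rules directory."""
    # Find the first directory whose name contains "rules".
    candidates = sorted(p for p in root.rglob("*rules*") if p.is_dir())
    if not candidates:
        return {}
    rules_dir = candidates[0]
    # Collect every Markdown file directly inside it.
    rules_files = sorted(rules_dir.glob("*.md"))
    # Read each file completely, never just the first few lines.
    return {
        f.relative_to(root).as_posix(): f.read_text(encoding="utf-8")
        for f in rules_files
    }


# Demo on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    rules_dir = root / "docs" / "team_rules"
    rules_dir.mkdir(parents=True)
    (rules_dir / "naming.md").write_text("# Naming\n", encoding="utf-8")
    (rules_dir / "style.md").write_text("# Style\n", encoding="utf-8")
    (rules_dir / "notes.txt").write_text("ignored\n", encoding="utf-8")
    found = discover_rules(root)

print(sorted(found))
```

Comparing this output with the files the agent reports having read is a quick way to catch the "lazy" behavior described earlier.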

Agent #2: The Reporter - Professional Presentation

After the Analyst completes its work and returns a structured violation list, the second agent—report.prompt.md—takes over.

Its mission is simple: Take structured data and present it professionally.

It uses a strict template to create bilingual (English-Japanese) comments and post them precisely on the offending code lines in the Pull Request.

Example Template from the Reporter Prompt:

🚨 CODING RULE VIOLATION: [RULE_TITLE] - Rule violation / ルール違反

**Rule Source / ルールソース:** .rules/[FILENAME].md

**Rule Description / ルール説明:**  
EN: [ENGLISH_RULE_EXPLANATION]  
JP: [JAPANESE_RULE_TEXT_FROM_FILE]

**Violating Code / 違反コード:**
```[language]
[actual code block that violates the rule]
```

**Code-Compliant Fix / ルール準拠修正:**
```[language]
[suggested fix for the specific code block based on rule requirements]
```

This template ensures every comment is consistent, clear, and contains all necessary information.
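The template above is filled in by the agent itself, but the same substitution can be sketched in Python to show how mechanical the Reporter's job is. The `render_comment` function, the record keys, and the sample rule text are assumptions for illustration only.

```python
FENCE = "`" * 3  # built at runtime so no literal fence appears in this listing

REQUIRED = ("rule_title", "rule_file", "rule_en", "rule_jp", "language", "code", "fix")


def render_comment(v: dict) -> str:
    """Fill the Reporter template from one violation record (hypothetical keys)."""
    missing = [key for key in REQUIRED if key not in v]
    if missing:
        raise ValueError(f"violation is missing fields: {missing}")
    lines = [
        f"🚨 CODING RULE VIOLATION: {v['rule_title']} - Rule violation / ルール違反",
        "",
        f"**Rule Source / ルールソース:** .rules/{v['rule_file']}.md",
        "",
        "**Rule Description / ルール説明:**  ",
        f"EN: {v['rule_en']}  ",
        f"JP: {v['rule_jp']}",
        "",
        "**Violating Code / 違反コード:**",
        f"{FENCE}{v['language']}",
        v["code"],
        FENCE,
        "",
        "**Code-Compliant Fix / ルール準拠修正:**",
        f"{FENCE}{v['language']}",
        v["fix"],
        FENCE,
    ]
    return "\n".join(lines)


example = {
    "rule_title": "No magic numbers",
    "rule_file": "style",
    "rule_en": "Replace unexplained numeric literals with named constants.",
    "rule_jp": "マジックナンバーを使用しないこと",
    "language": "python",
    "code": "fee = total * 0.07",
    "fix": "fee = total * TAX_RATE",
}
comment = render_comment(example)
print(comment)
```

Rejecting records with missing fields up front means an incomplete analysis fails loudly instead of producing a half-empty review comment.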

[Figure: The two-agent system (Analyst + Reporter) workflow]

The Beautiful Results

With the Simple PR (6 Errors):

Detected: 6/6 errors ✓
Referenced: Specific rule violations ✓
Commented: Directly on offending lines ✓
Provided: Detailed fix suggestions ✓

With the Complex PR (30 Errors):

Detected: 22/30 errors ✓

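A big jump over every previous attempt: more than double PR-Agent's best result (10/30) and roughly three times the single-prompt Copilot run (7-8/30).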

I know 22/30 could still be improved with more optimization time, but this proved a crucial point: Structured, "programming-style" AI prompting delivers superior results.
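To put the Complex PR numbers side by side, this short script computes the detection rate for each configuration reported in this article. The counts come straight from the results above. The single-prompt Copilot run was reported as 7-8, so the upper bound is used here as a simplifying assumption.

```python
TOTAL_ERRORS = 30

# Errors found on the Complex PR, as reported in this article.
found_by_config = {
    "PR-Agent, no custom rules": 5,
    "PR-Agent, original rules": 6,
    "PR-Agent, optimized rules": 10,
    "Copilot, single long prompt": 8,  # reported as 7-8; upper bound assumed
    "Copilot, Analyst + Reporter": 22,
}

# Detection rate per configuration, rounded to a whole percent.
rates = {
    name: f"{found / TOTAL_ERRORS:.0%}"
    for name, found in found_by_config.items()
}

for name, found in found_by_config.items():
    print(f"{name:<30} {found:>2}/{TOTAL_ERRORS}  {rates[name]}")
```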

[Figure: Improvement progression from PR-Agent to Two-Agent Copilot]

The Pragmatic Ending: Best of Both Worlds

After reviewing everything, the client made what I consider a very wise and practical decision:

Continue Deploying PR-Agent:
Keep it integrated in CI/CD for basic, automated reviews.

Add the GitHub Copilot + GitHub MCP Solution:
Use this as a specialized tool for complex PRs requiring precision and strict rule compliance.

This hybrid approach maximizes both automation convenience and review accuracy when it matters most.

[Figure: The final hybrid approach (PR-Agent + Copilot/MCP)]

The Biggest Lesson: From "Commanding" to "Programming" AI

This journey taught me an invaluable lesson. The era of AI isn't just about writing natural language commands.

We're not just hoping for "magic" anymore—we're actively designing and controlling the process to achieve desired outcomes.

This isn't a step backward; it's a different approach to AI interaction. To truly harness the power of AI agents, we developers must learn to "program" them.

We need to shift our mindset from instructing an assistant to architecting an intelligent machine.

This Means:

Break Down Problems
Instead of one big request, divide into small, clear tasks.

Specify Tools
Command the AI precisely which tool to use for each task.

Orchestrate Workflows
Build a structured workflow where the output of one step becomes the input of the next.
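The orchestration idea, where each step's output feeds the next, can be sketched as an ordinary function pipeline. The three stub steps below stand in for agent calls; their names, logic, and the sample PR record are purely illustrative and not taken from the actual prompts.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def run_pipeline(steps: list[Step], initial: Any) -> Any:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, initial)


# Stub steps standing in for agent calls (illustrative logic only).
def load_rules(pr: dict) -> dict:
    return {**pr, "rules": ["no-magic-numbers"]}


def analyze(context: dict) -> list[dict]:
    return [{"rule": rule, "line": 1} for rule in context["rules"]]


def report(violations: list[dict]) -> list[str]:
    return [f"line {v['line']}: {v['rule']}" for v in violations]


comments = run_pipeline([load_rules, analyze, report], {"number": 1})
print(comments)
```

Because each step has one input and one output, any stage can be tested or replaced on its own, which is the same property the Analyst and Reporter split relies on.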

Key Takeaways for Building Better AI Agents

[Figure: Visual summary of the 5 main lessons learned]

If you're working with AI agents for code review or any agentic AI development, here are the golden rules I learned:

1. Don't Overload Context

Massive rule files create noise. Simplify and structure information.

2. Use Tool-Based Commands

Direct tool invocations are more reliable than vague natural language requests.

3. Divide Complex Tasks

Split workflows into specialized agents with clear responsibilities.

4. Iterate and Refine

AI prompts aren't "set it and forget it"—they require continuous refinement based on real results.

5. Combine Automation with Precision

Sometimes the best solution is hybrid: automated tools for routine work, specialized AI for complex cases.

The Future of AI Code Review

As AI-powered development tools continue to evolve, understanding how to properly architect AI workflows becomes increasingly critical.

GitHub Copilot's code review capabilities are expanding rapidly, with new features being added regularly. According to GitHub's official documentation, Copilot code review now includes:

Premium request units for deeper analysis
Custom instructions for team-specific standards
Multi-language support across dozens of programming languages
Integration with IDE and web-based workflows

Meanwhile, the Model Context Protocol (MCP) is revolutionizing how AI agents interact with external systems. As Anthropic explains, MCP provides a universal, open standard that:

Eliminates fragmented integrations
Enables context-aware AI applications
Supports both local and remote server deployments
Works across multiple AI platforms

Final Thoughts

Building effective AI code review systems isn't about finding the "perfect tool"—it's about understanding how to architect AI workflows that match your specific needs.

Sometimes that means using popular off-the-shelf solutions like PR-Agent. Other times it means building custom, specialized agents with tools like GitHub Copilot and MCP.

The key is understanding when to automate, when to specialize, and how to make AI truly work for you rather than against you.

What's your experience with AI code review? Have you tried implementing custom agents, or are you sticking with established tools? I'd love to hear about your journey in the comments below.
