Why Your AI Agent Gets Dumber with Large Specs (And How to Fix It)

Marvin Zhang · LeanSpec Author · 8 min read

Your spec fits in the context window. So why does your AI agent make mistakes, ignore instructions, and produce worse code?

You paste a detailed 2,000-line architecture document into Cursor. The context window can handle it—200K tokens, plenty of room. But something's off. The AI suggests an approach you explicitly ruled out on page 3. It asks questions you already answered. The code it generates contradicts the design decisions you documented.

The problem isn't context size. It's context quality.

The Real Problem: Performance Degradation

Modern AI models have massive context windows—Claude has 200K tokens, GPT has 128K, and newer models are pushing toward 1M+. But here's what the marketing doesn't tell you: AI performance degrades significantly as context grows, even when you're nowhere near the limit.

The research is clear:

Databricks found that long-context performance degrades significantly well before models hit their theoretical limits. Smaller models degrade even earlier.

The Berkeley Function-Calling Leaderboard shows that ALL models perform worse when given more tools or options to choose from. More context = more confusion = lower accuracy.

Multi-turn research (arXiv:2505.06120) shows significant performance drops when models must carry information across multiple conversation turns or handle increased task complexity.

Why This Happens

It comes down to fundamental constraints:

  1. Attention dilution - Transformer self-attention scales quadratically with sequence length (O(N²)). More tokens = harder to focus on what matters.

  2. Context rot - With large context, models start ignoring their training and just repeat patterns from the context history. They become less intelligent, not more.

  3. Option overload - Too many choices (tools, patterns, approaches) leads to wrong selections. This isn't unique to AI—it's a cognitive constraint.

  4. Token economics - Every extra token costs money and time. A 2,000-line spec costs roughly 6x as much to process as a 300-line spec (see the back-of-the-envelope math after this list).
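
To put numbers on attention dilution and token economics, here's the back-of-the-envelope math, sketched in TypeScript (the ~10 tokens-per-line ratio is an assumption, and real attention costs depend on the implementation):

// Back-of-the-envelope only: assumes ~10 tokens per line of spec text.
const tokensPerLine = 10;
const largeSpec = 2_000 * tokensPerLine; // ~20,000 tokens
const leanSpec = 300 * tokensPerLine;    // ~3,000 tokens

// Token cost scales roughly linearly with context size.
console.log((largeSpec / leanSpec).toFixed(1)); // "6.7" - ~6-7x the tokens

// Self-attention work grows roughly with the square of sequence length,
// so the compute gap is much larger than the token gap.
console.log(((largeSpec / leanSpec) ** 2).toFixed(0)); // "44" - ~44x the attention work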

What This Means For You

When you're using AI coding assistants:

  • Cursor, Copilot, and Claude start making basic mistakes they wouldn't make with a smaller context
  • Code generation becomes less accurate and more likely to contradict your requirements
  • Responses slow down as the model processes more irrelevant information
  • Costs scale up linearly with context size
  • You spend more time fixing AI mistakes than you save from AI assistance

The irony: You write detailed specs to help the AI, but the detail makes the AI worse.

The Solution: Context Engineering

Context engineering is the practice of managing AI working memory to maximize effectiveness. It's not about squeezing into context limits—it's about maintaining AI performance at any scale.

Here are four strategies that actually work, backed by research and real-world usage:

1. Partitioning - Split and Load Selectively

What it is: Break content into focused chunks, load only what's needed for the current task.

Example:

# Instead of one 1,200-line spec:
specs/dashboard/README.md (200 lines - overview)
specs/dashboard/DESIGN.md (350 lines - architecture)
specs/dashboard/IMPLEMENTATION.md (150 lines - plan)
specs/dashboard/TESTING.md (180 lines - tests)

# AI loads only what it needs
# Working on architecture? Read DESIGN.md only
# Writing tests? Read TESTING.md only

The benefit: AI processes 200-350 lines instead of 1,200. Faster, more focused, fewer mistakes.
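
In code, the loading rule is only a few lines. A minimal sketch (illustrative glue, not LeanSpec's actual API; the file names follow the layout above):

import { readFile } from 'node:fs/promises';

// Hypothetical task-to-partition map matching the layout above.
const partitions = {
  architecture: 'specs/dashboard/DESIGN.md',
  implementation: 'specs/dashboard/IMPLEMENTATION.md',
  testing: 'specs/dashboard/TESTING.md',
} as const;

// Build context from the overview plus only the relevant partition,
// instead of concatenating the whole 1,200-line spec.
async function contextFor(task: keyof typeof partitions): Promise<string> {
  const overview = await readFile('specs/dashboard/README.md', 'utf8');
  const detail = await readFile(partitions[task], 'utf8');
  return `${overview}\n\n${detail}`;
}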

2. Compaction - Remove Redundancy

What it is: Eliminate duplicate or inferable content.

Before:

## Authentication
The authentication system uses JWT tokens. JWT tokens are
industry-standard and provide stateless authentication. The
benefit of JWT tokens is that they don't require server-side
session storage...

## Implementation
We'll implement JWT authentication. JWT was chosen because...
[repeats same rationale]

After:

## Authentication
Uses JWT tokens (stateless, no session storage).

## Implementation
[links to Authentication section for rationale]

The benefit: Higher signal-to-noise ratio. AI focuses on unique information, not repetition.
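
You don't need anything fancy to find candidates for compaction; even a duplicate-sentence scan catches the repeated-rationale pattern above. A rough sketch (a heuristic of my own, not a LeanSpec feature):

// Cheap redundancy check: flag sentences that appear more than once.
function findRepeatedSentences(spec: string): string[] {
  const counts = new Map<string, number>();
  for (const raw of spec.split(/(?<=[.!?])\s+/)) {
    const sentence = raw.trim().toLowerCase();
    if (sentence.length < 20) continue; // ignore headings and fragments
    counts.set(sentence, (counts.get(sentence) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .map(([sentence]) => sentence);
}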

3. Compression - Summarize What's Done

What it is: Condense completed work while preserving essential decisions.

Before:

## Phase 1: Infrastructure Setup
Set up project structure:
- Create src/ directory
- Create tests/ directory
- Configure TypeScript with tsconfig.json
- Set up ESLint with .eslintrc
[50 lines of detailed steps...]

After (once completed):

## ✅ Phase 1: Infrastructure (Completed 2025-10-15)
Project structure established with TypeScript, testing, and CI.
See commit abc123 for details.

The benefit: Keep project history without bloat. AI knows what happened without drowning in details.
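
If you compress phases often, the transformation is mechanical enough to script: keep the heading, the date, a one-sentence summary, and a pointer to the history. A sketch (the names are illustrative, and the summary still comes from a human):

// Collapse a finished phase's step-by-step body into a compact record.
interface CompletedPhase {
  title: string;       // "Phase 1: Infrastructure"
  completedOn: string; // "2025-10-15"
  summary: string;     // one human-written sentence
  commit: string;      // where the full history lives
}

function compressPhase(phase: CompletedPhase): string {
  return [
    `## ✅ ${phase.title} (Completed ${phase.completedOn})`,
    phase.summary,
    `See commit ${phase.commit} for details.`,
  ].join('\n');
}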

4. Isolation - Separate Unrelated Concerns

What it is: Move independent features into separate specs with clear relationships.

Before: One 1,200-line spec covering dashboard UI, metrics API, health scoring algorithm, and chart library evaluation.

After: Four focused specs, each under 400 lines:

  • dashboard-ui - User interface and interactions
  • metrics-api - Data endpoint design
  • health-scoring - Algorithm details
  • chart-evaluation - Library comparison (can be archived after decision)

The benefit: Independent evolution. When the algorithm changes, the UI spec stays untouched.

The Key Insight

Keep context dense (high signal), not just small.

It's not about arbitrary line limits. It's about removing anything that doesn't directly inform the current decision. Every word that doesn't help the AI make better choices is making it worse.

Real Results from Dogfooding

We built LeanSpec using LeanSpec itself—the ultimate test of whether this methodology actually works.

The velocity: 10 days from zero to production

  • Full-featured CLI with 15+ commands
  • MCP server for Cursor, GitHub Copilot integration
  • Documentation site with comprehensive guides
  • 60+ specs written and implemented with AI agents

Then we violated our own principles: some specs grew to 1,166 lines, and we hit the exact problems we set out to solve:

  • AI agents started corrupting specs during edits
  • Code generation became less reliable
  • Responses slowed down noticeably
  • We spent more time fixing mistakes

We applied context engineering: Split large specs, removed redundancy, compressed historical sections.

  • Largest spec went from 1,166 lines to 378 (the largest remaining partition after splitting)
  • AI agents work reliably again
  • Faster iterations, accurate output
  • Can confidently say: "We practice what we preach"

Concrete Benefits You'll See

When you apply context engineering to your specs:

  • Fewer AI mistakes - Focused context produces accurate, consistent output
  • Faster iterations - Less processing time per AI request
  • Lower costs - Fewer tokens = cheaper API calls (roughly 6x fewer tokens going from 2,000 lines to 300)
  • Better understanding - AI actually follows your requirements instead of hallucinating
  • Maintainable by humans - Specs you can read in 5-10 minutes stay in sync with code

Works With Your Tools

This isn't about a specific AI tool—it's about how all transformer-based models handle context:

  • Cursor - Reads markdown specs for context
  • GitHub Copilot - Uses workspace files for suggestions
  • Claude - Via MCP server integration
  • Aider - Processes project documentation
  • Windsurf - Analyzes codebase context

Any AI coding assistant benefits from well-engineered context.

Getting Started

LeanSpec gives you both the methodology and the tooling to apply context engineering to your specs.

The Methodology

Five principles guide decision-making:

  1. Context Economy - Fit in working memory (human + AI)
  2. Signal-to-Noise - Every word informs decisions
  3. Progressive Disclosure - Add structure when needed
  4. Intent Over Implementation - Capture why, not just how
  5. Bridge the Gap - Both human and AI understand

These aren't arbitrary rules—they're derived from real constraints (transformer attention, cognitive limits, token costs).

The Tooling

CLI commands help you detect and fix context issues:

# Install
npm install -g lean-spec

# Initialize in your project
cd your-project
lean-spec init

# Detect issues
lean-spec validate # Check for problems
lean-spec complexity <spec> # Analyze size/structure

# Fix issues
lean-spec split <spec> # Guided splitting workflow

# Track progress
lean-spec board # Kanban view of all specs

Start Simple, Grow as Needed

Solo developer? Just use status and created fields. Keep specs focused.

Small team? Add tags and priority. Use the CLI for visibility.

Enterprise? Add custom fields (epic, sprint, assignee). Integrate with your workflow.

The structure adapts to your needs—you never add complexity "just in case."
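
One way to picture that progression, sketched as TypeScript types (a hypothetical shape for the fields named above; the status and priority values are assumptions, not LeanSpec's actual schema):

// Solo developer: the minimum viable frontmatter.
interface SoloSpec {
  status: 'planned' | 'in-progress' | 'complete';
  created: string; // ISO date
}

// Small team: add visibility fields.
interface TeamSpec extends SoloSpec {
  tags?: string[];
  priority?: 'low' | 'medium' | 'high';
}

// Enterprise: custom fields layered on, only when needed.
interface EnterpriseSpec extends TeamSpec {
  epic?: string;
  sprint?: string;
  assignee?: string;
}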

Try It Today

npm install -g lean-spec
cd your-project
lean-spec init
lean-spec create user-authentication

Your AI coding assistant will thank you.

The Bottom Line

Your AI tools are only as good as the context you give them.

A 2,000-line spec that fits in the context window will still produce worse results than a 300-line spec with the same essential information. It's not about limits—it's about performance.

Context engineering isn't optimization. It's fundamental to making AI-assisted development work reliably.

LeanSpec is a context engineering methodology for human-AI collaboration on software specs. It gives you:

  • Principles derived from real constraints
  • Patterns that scale from solo to enterprise
  • Tools that detect and prevent context problems
  • Proof from building the tool with the methodology

The choice: Keep writing large specs and fighting with unreliable AI output, or engineer your context for the tools you actually use.

