AGENTS.md Files Reduce Coding Agent Performance, arXiv Study Reveals
SWE-bench experiments show that repository context files like AGENTS.md decrease task success rates while increasing inference costs by more than 20%, challenging practices widely recommended by agent developers.
A groundbreaking arXiv paper published on February 12, 2026, titled “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” reveals that AGENTS.md and similar repository context files—widely recommended by coding agent developers—may actually harm agent performance.
Research Background
Many coding agent developers (Claude Code, Cursor, GitHub Copilot, etc.) strongly encourage placing AGENTS.md files in repositories. These files are intended to provide context information to help agents understand repositories and execute appropriate coding tasks.
However, no rigorous investigation has previously examined whether this practice is actually effective for real-world tasks.
Key Findings
Researchers evaluated AGENTS.md effectiveness across multiple coding agents and LLMs in two complementary settings:
- SWE-bench Tasks: Established tasks from popular repositories with LLM-generated context files following agent-developer recommendations
- Real Repositories: A novel collection of issues from repositories containing developer-committed context files
1. Reduced Task Success Rates
Context files like AGENTS.md tend to reduce task success rates compared to providing no repository context at all. This finding directly contradicts the conventional assumption that context files are beneficial.
2. Significant Inference Cost Increase
Using context files increases inference costs by over 20%, driven by increased token usage.
3. Behavioral Analysis
The presence of context files changes agent behavior:
- LLM-generated context files: Encourage more thorough testing and file traversal
- Developer-provided context files: Similarly promote broader exploration
- Instruction compliance: Agents strongly tend to respect context file instructions
4. Root Cause Identification
The research team concluded:
Unnecessary requirements from context files make tasks harder.
Practical Recommendations
Based on the findings, the paper offers clear guidance:
Human-written context files should describe only minimal requirements.
Specifically:
- Avoid:
  - Overly detailed guidelines
  - Comprehensive instructions attempting to cover all possible scenarios
  - General development principles not directly related to the task
- Include:
  - Minimal information directly necessary for the task
  - Repository-specific critical constraints
  - Important context that agents cannot discover independently
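To make the guidance concrete, here is a hypothetical minimal AGENTS.md in the spirit of the paper's recommendation. The repository details (commands, paths) are invented for illustration; the point is that every line states a constraint the agent could not easily discover on its own, with no general coding advice.

```markdown
# AGENTS.md

## Build and test (repository-specific)
- Run tests with `tox -e py311`; plain `pytest` skips the C extension build step.
- `make lint` must pass before committing; CI rejects unformatted code.

## Critical constraints
- Never edit files under `src/generated/` — they are regenerated by `make codegen`.
- Changes to the public API in `src/pkg/api.py` require a changelog entry.
```

Everything else (style conventions, exhaustive workflow rules, general principles) is exactly what the study suggests omitting, since agents tend to comply with such instructions even when they make the task at hand harder.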
Industry Impact
These findings have significant implications for development teams rapidly adopting coding agents.
Impact on Agent Developers
Teams behind Claude Code, Cursor, GitHub Copilot, and similar tools may need to revise their AGENTS.md recommendations. The conventional wisdom that “more detail is better” should be reconsidered.
Impact on Developers
Developers who have already implemented AGENTS.md should consider:
- Reviewing existing AGENTS.md files and removing unnecessary requirements
- Focusing only on truly essential minimal information
- Measuring performance metrics before and after changes
Cost Reduction Potential
The potential to reduce inference costs by over 20% represents significant economic impact for organizations using agents at scale. With API pricing rising, this optimization cannot be ignored.
Research Limitations and Future Directions
The paper identifies several areas for future research:
- Validation on larger-scale repositories
- Effect measurement across different task types
- Determining optimal context file structure
- Comparing optimal context provision methods per agent
Conclusion
Well-intentioned AGENTS.md files may actually reduce agent performance and increase costs. This research demonstrates the importance of empirically validating practices considered “best practices.”
Coding agent users can potentially achieve both performance improvements and cost reductions by revisiting their AGENTS.md files in light of these findings.
Paper Link: arXiv:2602.11988
Authors: Thibaud Gloaguen et al.
Publication Date: February 12, 2026
Evaluation Environment: SWE-bench, multiple coding agents and LLMs