Claude Opus 4.6 Release: Industry-Leading Coding Agent Capabilities
Anthropic releases Claude Opus 4.6, which tops Terminal-Bench 2.0 and adds a 1M-token context window, Agent Teams, Context Compaction, and enhanced safety measures.
Anthropic released Claude Opus 4.6 on February 5, delivering industry-leading coding-agent performance and introducing a range of developer-focused features.
Top Score on Terminal-Bench 2.0
Claude Opus 4.6 achieved the highest score on Terminal-Bench 2.0, an agentic coding evaluation benchmark. This benchmark measures an AI model’s ability to autonomously complete tasks within codebases, representing a critical indicator of real-world development utility.
The model also excels across other major benchmarks:
- Humanity’s Last Exam: Leads all frontier models on this complex multidisciplinary reasoning test
- GDPval-AA: Outperforms OpenAI GPT-5.2 by approximately 144 Elo points on economically valuable knowledge work tasks (finance, legal, etc.)
- BrowseComp: Best-in-class performance on locating hard-to-find information online
- SWE-bench Verified: Achieved 81.42% across 25 trials with a modified prompt
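An Elo gap can be translated into an expected head-to-head win rate. Assuming GDPval-AA uses the standard Elo scale (an assumption; the benchmark's exact scaling is not stated here), a 144-point lead implies roughly a 70% expected win rate:

```python
# Expected score (win probability) for a player rated `diff` points
# above its opponent, under the standard Elo formula.
def elo_expected_score(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(elo_expected_score(144), 3))  # → 0.696
```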
1M Token Context Window
Claude Opus 4.6 is the first Opus-class model to offer a 1 million token context window in beta. This enables the model to maintain more information while working with large codebases or conducting extended conversations.
On the 8-needle 1M variant of MRCR v2—a needle-in-a-haystack benchmark testing retrieval of information “hidden” in vast amounts of text—Opus 4.6 scored 76%, while Sonnet 4.5 scored only 18.5%. This represents a qualitative shift in addressing “context rot,” the performance degradation that typically occurs as conversations exceed certain token counts.
New Developer Features
The Claude Developer Platform introduces several new capabilities:
Adaptive Thinking
Previously, developers faced a binary choice: extended thinking on or off. Adaptive Thinking lets Claude decide for itself when extended thinking is helpful, and developers can tune the effort across four levels (low, medium, high, max).
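In request terms, this might look like the sketch below. The shape of the `thinking` block and its `effort` field are assumptions inferred from the four levels described above, not a documented schema:

```python
# Hypothetical request payload for Adaptive Thinking. The "thinking"
# block shape and the "effort" field are assumptions, not documented API.
def make_request(effort: str) -> dict:
    levels = ("low", "medium", "high", "max")
    if effort not in levels:
        raise ValueError(f"effort must be one of {levels}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "thinking": {"type": "adaptive", "effort": effort},
        "messages": [{"role": "user", "content": "Refactor this module."}],
    }

print(make_request("high")["thinking"])
```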
Context Compaction (Beta)
To address context window limits during long-running conversations and agentic tasks, Context Compaction automatically summarizes older context and replaces it with the summary, letting Claude run longer tasks without hitting the window limit.
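Conceptually, compaction replaces the oldest turns with a summary once the history grows past a budget. The real mechanism runs server-side and its parameters are not public; this toy sketch only illustrates the idea, with a placeholder string standing in for the model-generated summary:

```python
# Toy illustration of context compaction: when the message history
# exceeds `budget` (counted in characters here for simplicity; the
# real system counts tokens), replace the oldest messages with a
# single summary entry and keep the most recent turns verbatim.
def compact(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    if sum(len(m) for m in messages) <= budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in for a model-generated summary of the older turns.
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent

history = [f"message {i}: " + "x" * 50 for i in range(10)]
print(compact(history, budget=200))
```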
128k Output Tokens
Support for outputs up to 128,000 tokens allows Claude to complete larger-output tasks without breaking them into multiple requests.
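Requesting a large single-shot output is then a matter of raising the output cap. The `max_tokens` field below follows the existing Messages API convention; whether any other request settings change at this scale is not stated:

```python
# Request payload asking for up to 128k output tokens in one call,
# instead of splitting a large generation across multiple requests.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 128_000,  # new ceiling per the release notes
    "messages": [{"role": "user", "content": "Generate the full report."}],
}
print(request["max_tokens"])
```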
US-only Inference
For workloads requiring execution within the United States, US-only Inference is available at 1.1× token pricing.
Agent Teams in Claude Code
Claude Code introduces Agent Teams as a research preview. Developers can now spin up multiple agents that work in parallel and coordinate autonomously—ideal for tasks that split into independent, read-heavy work like codebase reviews.
Users can take over any subagent directly using Shift+Up/Down or tmux.
Enhanced Office Tool Integration
Claude in Excel now handles longer-running and harder tasks with improved performance: it plans before acting, ingests unstructured data and infers the right structure without guidance, and applies multi-step changes in a single pass.
Claude in PowerPoint launches in research preview. It reads layouts, fonts, and slide masters to stay on brand, and can build from templates or generate full decks from descriptions. It is available on the Max, Team, and Enterprise plans.
Continued Safety Focus
Intelligence gains do not come at the cost of safety. On automated behavioral audits, Opus 4.6 showed low rates of misaligned behaviors such as deception, sycophancy, encouragement of user delusions, and cooperation with misuse. Overall alignment matches Opus 4.5, previously Anthropic’s most-aligned frontier model.
Opus 4.6 also shows the lowest rate of over-refusals—failing to answer benign queries—of any recent Claude model.
Anthropic conducted its most comprehensive safety evaluation suite for any model to date, including new evaluations for user wellbeing, more complex tests of refusals for potentially dangerous requests, and updated evaluations of surreptitious harmful actions. Given the model's stronger cybersecurity capabilities, Anthropic also developed six new cybersecurity probes to detect different forms of potential misuse.
Pricing
Claude Opus 4.6 pricing remains $5/$25 per million input/output tokens. Premium pricing of $10/$37.50 per million input/output tokens applies to prompts exceeding 200k tokens; this long-context tier is available only on the Claude Developer Platform.
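The tiers above can be expressed as a simple cost function. This sketch assumes the premium rates apply to the entire request once the prompt crosses the 200k-token threshold (the exact boundary semantics are an assumption), and folds in the 1.1× US-only Inference multiplier mentioned earlier:

```python
# Cost estimator for Claude Opus 4.6 (USD per million tokens):
# $5/$25 input/output normally, $10/$37.50 when the prompt exceeds
# 200k tokens. Assumption: premium rates apply to the whole request
# once the threshold is crossed. US-only Inference applies a 1.1x
# multiplier per the release notes.
def estimate_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    if input_tokens > 200_000:
        in_rate, out_rate = 10.0, 37.50
    else:
        in_rate, out_rate = 5.0, 25.0
    cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return cost * (1.1 if us_only else 1.0)

print(estimate_cost(100_000, 10_000))  # → 0.75 (standard tier)
print(estimate_cost(300_000, 10_000))  # → 3.375 (long-context tier)
```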
The model is available today on claude.ai, the Claude API, and major cloud platforms. Developers can use claude-opus-4-6 via the API.
For detailed evaluation results and safety assessments, refer to the official System Card.