Anthropic shipped Claude Opus 4.7 on April 16. We had it in our workflow the same week and have been running it across planning sessions, code review, and agentic tasks since. What follows is what we actually found, separate from the benchmark marketing.

What Is Actually Better

The headline numbers are real. Opus 4.7 scores 64.3% on SWE-bench Pro, a standardised benchmark of software engineering tasks in which the model has to fix real bugs in real open-source repositories. On Anthropic's internal 93-task coding suite, it improves on Opus 4.6 by 13%. That is a meaningful gap, not a rounding difference.

Beyond those numbers, two changes matter more in practice than any benchmark.

Self-verification. Opus 4.7 checks its own work before returning an answer. In agentic workflows where one model output feeds the next step, this reduces error propagation: the model catches problems early instead of passing them forward, so the whole chain is more reliable. We saw the difference in automated content workflows, where 4.6 occasionally produced outputs that needed manual correction.

Computer use. If you are using Claude to control a browser or desktop environment, 4.7 handles complex pages more reliably than 4.6. We use computer use for automated data extraction across client reporting, and the success rate on pages with dynamic content improved noticeably after switching.

Scaled tool use also improved, which matters if you are building anything that chains multiple API calls or tool interactions in sequence.

Where We Use It Now

For complex, multi-step tasks in Claude Code, Opus 4.7 is the right call. That covers architecture planning, where the model has to hold a lot of context and make good judgment calls; code review across multiple files that touch different modules; and automated workflows, where self-verification reduces manual checking downstream.

We build and integrate custom AI tools for businesses across different workflows, and model selection per task is something we think about constantly. The rule we have settled on: Opus 4.7 for structured reasoning under defined conditions, Opus 4.6 for open-ended planning that needs judgment calls without a clear success condition.

Where 4.6 Still Wins

This is the part most launch coverage skipped. In Co-work sessions on claude.ai for open-ended planning, Opus 4.6 still performs better for us. We ran a direct comparison: we planned a presentation with 4.7, then switched to 4.6 with the same information. The difference was significant enough that we wrote about it separately. The short version: 4.7's optimisation for structured tasks does not translate cleanly into freeform planning conversations where the model needs to make directional decisions on its own.

Read the full breakdown: Opus 4.7 on Co-Work Was a Train Smash. 4.6 Fixed It Immediately.

For lighter workloads such as writing, research, and general client-facing chatbots, Sonnet 4.6 handles most of the work at better cost efficiency. Opus 4.7 is not the default. It is the model you reach for when the task demands it.
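The per-task rule from the last two sections can be written down as a small routing function. This is a minimal sketch, assuming a task can be described by three yes/no flags; the `Task` fields and the `pick_model` function are hypothetical names of our own, and the returned strings are human-readable model labels, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; these fields are our own labels, not an API."""
    heavy: bool                    # complex, multi-step work: coding, code review, agentic chains
    clear_success_condition: bool  # can the output be checked against a defined target?
    open_ended_planning: bool      # freeform planning where the model must pick a direction itself


def pick_model(task: Task) -> str:
    """Return a model label following our rule of thumb (labels, not API model IDs)."""
    if task.open_ended_planning and not task.clear_success_condition:
        # Open-ended planning that needs judgment calls: 4.6 worked better for us.
        return "Opus 4.6"
    if task.heavy:
        # Structured reasoning under defined conditions.
        return "Opus 4.7"
    # Writing, research, general chatbots: the lighter model is usually enough.
    return "Sonnet 4.6"


# Example: a multi-file code review with a clear definition of "done".
review = Task(heavy=True, clear_success_condition=True, open_ended_planning=False)
print(pick_model(review))  # prints Opus 4.7
```

The planning check comes first on purpose: in our experience the open-ended nature of a task matters more than how heavy it is, so a demanding planning session still goes to 4.6.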

Pricing and Availability

Same as Opus 4.6: $5 per million input tokens, $25 per million output tokens. No price increase alongside the performance gains. The model is live across all Claude.ai plans, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. If you are already on 4.6 through any of those platforms, you can switch now with no waitlist and no migration work.
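Because output tokens cost five times as much as input tokens, long answers push the bill up faster than long prompts. Below is a minimal sketch of the per-request arithmetic, assuming the list prices above and ignoring any discounts a platform may offer; `request_cost` is a hypothetical helper, and the token counts in the example are illustrative, not measurements from our workflows.

```python
# Per-million-token list prices for Opus 4.7 (unchanged from 4.6), in US dollars.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request from its token counts."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Illustrative: a code-review call that sends 200,000 tokens of context
# and receives 20,000 tokens back.
print(f"${request_cost(200_000, 20_000):.2f}")  # prints $1.50
```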

The Bottom Line

If you are doing serious coding, building automations, or running agentic tasks where reliability matters, the upgrade is worth making. The SWE-bench improvement is not marketing. The self-verification is genuinely useful in chained workflows.

If you are using Claude for general business tasks, Sonnet 4.6 is still the more cost-efficient option for most of that work. Opus is the heavy tool. Use it for heavy tasks.

If you are unsure what model setup fits your use case, we are happy to talk through it.