
Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7
Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here's what matters.
Anthropic's top-tier Claude model. Successive Opus releases (4.5, 4.6, 4.7) trade places with GPT and Gemini on coding and reasoning benchmarks.

Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here's what matters.

On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.

A ChinaTalk investigation reveals how 'transfer stations' resell Anthropic API access using stolen credentials, model substitution, and prompt harvesting.

The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.

Anthropic doubled Claude Code's 5-hour limits, killed peak-hours throttling, and raised Opus API tiers. The capacity comes from xAI's Colossus 1, via a SpaceX deal.

GitHub's new model multiplier table for Copilot Pro and Pro+ annual plans lands June 1. Opus 4.6 goes 3 to 27. Sonnet 4.6 goes 1 to 9.

Boris Cherny on Claude Code, Applied AI on prompting, Erik Schluntz on vibe coding in prod. Three Code with Claude tapes hit YouTube ahead of the 2026 conference.

On June 1 every Copilot plan switches to GitHub AI Credits priced per token. Code completions stay free. Fallback models and credit rollover do not.

Arcee released Trinity-Large-Thinking on April 1: a 399B-param sparse MoE with 13B active, Apache 2.0 weights, $0.88 per million output tokens, and PinchBench just behind Opus 4.6.

OpenAI says SWE-bench Verified is saturated and contaminated, and 60% of remaining problems are unsolvable. Here's what comes next, and why every coding leaderboard is suspect.

DeepSeek shipped V4-Pro and V4-Flash under MIT on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash is $0.14 in / $0.28 out.

Anthropic's April 23 postmortem names three bugs that degraded Claude Code between March 4 and April 20. Usage limits are being reset for every subscriber.

OpenAI released GPT-5.5 (codename Spud) on April 23. The API runs at $5/$30 per million tokens, double GPT-5.4, with Pro at $30/$180.

GitHub froze Copilot Pro/Pro+/Student signups on April 20 and moved Claude Opus 4.7 behind the $39 Pro+ tier. Agent workflows broke the old math.

Axios reports the NSA is using Anthropic's unreleased Mythos model even though the Defense Department has blacklisted Anthropic. One government, two positions.

Anthropic launched Claude Design on April 17, a prompt-to-prototype tool that exports to Canva, not Figma. Figma's stock closed down 7% on the same day.

Alibaba's Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.

Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.