devtake.dev

Claude Opus

Anthropic's top-tier Claude model. Successive Opus releases (4.5 through 4.8) trade places with OpenAI's GPT and Google's Gemini models on coding and reasoning benchmarks.

18 articles. First covered Apr 17, 2026; latest May 29, 2026.
Anthropic's announcement artwork for Claude Opus 4.8, a soft gradient panel with the Claude wordmark.
AI·

Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7

Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro and lets its own code flaws slip through 4x less often than Opus 4.7. Claude Code also gains parallel subagents. Here's what matters.

DeepSeek social card with the company's wordmark on a navy background
AI·

DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.

On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.

Illustration accompanying ChinaTalk's investigation into grey-market Claude API proxy networks
AI·

Chinese proxy networks sell Claude API access at 90% off. They harvest every prompt that passes through.

A ChinaTalk investigation reveals how 'transfer stations' resell Anthropic API access using stolen credentials, model substitution, and prompt harvesting.

The DELEGATE-52 project repository on GitHub, showing Microsoft's benchmark for testing LLM document editing fidelity
AI·

Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.

The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.

Cartoon Claude Code terminal flexing two muscular arms against a terracotta background
AI·

Anthropic doubled Claude Code's limits by renting 220,000 GPUs from xAI

Anthropic doubled Claude Code's 5-hour limits, killed peak-hours throttling, and raised Opus API tiers. The capacity comes from xAI's Colossus 1, via a SpaceX deal.

Stylized GitHub Copilot mascot melting into glowing puddles in front of a wall of flames — a visual metaphor for the steep multiplier hike on annual plans.
AI·

GitHub Copilot's Claude Opus multiplier jumps to 27x on June 1. Monthly plans dodge the hike.

GitHub's new model multiplier table for Copilot Pro and Pro+ annual plans lands June 1. Opus 4.6 goes from 3x to 27x; Sonnet 4.6 goes from 1x to 9x.

Title card for Boris Cherny's 'Mastering Claude Code in 30 Minutes' Anthropic workshop talk.
AI·

Anthropic just dropped its Claude Code workshop tapes. The playbook is better than the marketing.

Boris Cherny on Claude Code, Anthropic's Applied AI team on prompting, and Erik Schluntz on vibe coding in production. Three Code with Claude tapes hit YouTube ahead of the 2026 conference.

GitHub Octocat mark on a dark gradient, the cover graphic on the GitHub Blog post announcing the Copilot billing change.
AI·

GitHub Copilot kills premium requests on June 1. Token billing arrives, fallback models do not.

On June 1 every Copilot plan switches to GitHub AI Credits priced per token. Code completions stay free. Fallback models and credit rollover do not.

Arcee AI Trinity branding from the Trinity-Large-Thinking blog post.
Open Source·

Arcee's Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus

Arcee released Trinity-Large-Thinking on April 1: a 399B-parameter sparse MoE with 13B active parameters, Apache 2.0 weights, $0.88 per million output tokens, and a PinchBench score just behind Opus 4.6.

AI·

OpenAI just retired SWE-bench Verified. The headline coding benchmark of 2025 is officially saturated.

OpenAI says SWE-bench Verified is saturated and contaminated, and 60% of remaining problems are unsolvable. Here's what comes next, and why every coding leaderboard is suspect.

DeepSeek social card from the V4 API documentation release post.
AI·

DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and a SWE-bench Verified score within 0.2 points of Opus 4.6

DeepSeek shipped V4-Pro and V4-Flash under the MIT license on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash costs $0.14 input / $0.28 output per million tokens.

Anthropic Engineering postmortem cover image.
AI·

Anthropic admits three Claude Code bugs quietly tanked quality for six weeks

Anthropic's April 23 postmortem names three bugs that degraded Claude Code between March 4 and April 20. Usage limits are being reset for every subscriber.

OpenAI's GPT-5.5 model launch with ChatGPT and Codex interfaces
AI·

OpenAI shipped GPT-5.5 seven weeks after 5.4. API tokens now cost twice as much.

OpenAI released GPT-5.5 (codename Spud) on April 23. The API runs at $5 input / $30 output per million tokens, double GPT-5.4's rates, with Pro at $30 / $180.

GitHub Copilot announcement cover graphic
AI·

GitHub Copilot paused new signups and kicked Opus out of Pro. Here's what actually changed.

GitHub froze Copilot Pro/Pro+/Student signups on April 20 and moved Claude Opus 4.7 behind the $39 Pro+ tier. Agent workflows broke the old math.

Illustration for Anthropic's Project Glasswing, a cybersecurity program powered by Claude Mythos Preview
AI·

NSA is running Anthropic's Mythos. The Pentagon says Anthropic is a supply-chain risk.

Axios reports the NSA is using Anthropic's unreleased Mythos model even though the Defense Department has blacklisted Anthropic. One government, two positions.

Anthropic's Claude Design announcement illustration, a quill on a cactus-green background
AI·

Anthropic shipped Claude Design. Figma stock dropped 7% the same day.

Anthropic launched Claude Design on April 17, a prompt-to-prototype tool that exports to Canva, not Figma. Figma's stock closed down 7% on the same day.

Header card from Simon Willison's 'Qwen3.6 beats Opus' post comparing pelican SVGs
AI·

Qwen 3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison's laptop

Alibaba's Qwen 3.6-35B-A3B is a 35B-parameter mixture-of-experts with only 3B parameters active. It's Apache 2.0, runs on consumer GPUs, and is already winning real tasks.

Claude Opus 4.7 launch artwork from the Anthropic news post
AI·

Claude Opus 4.7 is here, and the long-context benchmarks got worse

Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.