devtake.dev

devtake.dev — #llm
Articles tagged llm on devtake.dev.
https://devtake.dev
en-us

Gemini Intelligence turns Android 17 into an agent that drives your apps
https://devtake.dev/article/google-gemini-intelligence-android-17/
Google's Android Show pitched Gemini Intelligence and AppFunctions, an MCP-style way for the assistant to call inside your apps. Here's how it works and what to watch.
Sat, 13 Jun 2026 14:00:00 GMT
Tags: android, gemini, android, ai-agents, llm, google, agents
Author: naomi-park

Running a coding agent fully on Apple Silicon, no cloud, is now an off-the-shelf stack
https://devtake.dev/article/local-coding-agents-mac/
A popular Hacker News how-to walked through a fully local coding agent on Apple Silicon. Here's the realistic 2026 stack: runner, model, and harness.
Sat, 13 Jun 2026 12:30:00 GMT
Tags: ai, ai, llm, local-inference, ai-agents, agentic-coding, open-weights, mac, moe
Author: dieter-morelli

Claude Fable 5 is Anthropic's first public Mythos-class model. It tops SWE-Bench Pro at 80.3%.
https://devtake.dev/article/claude-fable-5-launch/
Claude Fable 5 hits 80.3% on SWE-Bench Pro and ships on Bedrock and Copilot at $10/$50 per million tokens, free on paid plans only through June 22.
Tue, 09 Jun 2026 18:55:00 GMT
Tags: ai, ai-models, anthropic, claude, claude-mythos, benchmarks, llm, agentic-coding
Author: dieter-morelli

OpenAI added a Lockdown Mode to ChatGPT to blunt prompt-injection attacks
https://devtake.dev/article/openai-lockdown-mode-prompt-injection/
OpenAI shipped Lockdown Mode in ChatGPT to cut off the data-exfiltration step of prompt-injection attacks.
Here's what it actually restricts and who should turn it on.
Mon, 08 Jun 2026 10:15:00 GMT
Tags: ai, openai, ai-security, prompt-injection, llm, ai-agents
Author: dieter-morelli

OpenAI is putting Codex in every ChatGPT app, with six business plugins for non-coders
https://devtake.dev/article/openai-codex-chatgpt-everywhere/
On June 2 OpenAI said Codex is coming to the ChatGPT app everywhere within weeks, and shipped six role-specific plugins for sales, analytics, design, and finance teams.
Wed, 03 Jun 2026 12:30:00 GMT
Tags: ai, openai, codex, chatgpt, ai-agents, agentic-coding, ai-assistant, automation, llm
Author: dieter-morelli

Stanford tested AI against law professors. The pros picked the AI 75% of the time.
https://devtake.dev/article/stanford-ai-beats-law-professors/
A blinded Stanford Law study had 16 professors grade AI tutoring answers against their own. Here's what the 75% win rate actually measures, and what it doesn't.
Wed, 03 Jun 2026 11:15:00 GMT
Tags: ai, ai, llm, benchmarks, legal-ai, ai-models, gemini, rag, ai-eval
Author: dieter-morelli

Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7
https://devtake.dev/article/claude-opus-4-8-launch/
Anthropic's Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here's what matters.
Fri, 29 May 2026 07:20:00 GMT
Tags: ai, ai-models, anthropic, claude, llm, benchmarks, agentic-coding, claude-code, opus-4-7
Author: dieter-morelli

SQLite won't accept AI-written code, but QEMU just opened the door to it
https://devtake.dev/article/sqlite-refuses-agentic-code-qemu-opens-door/
Two of the most cautious C projects split on AI contributions in the same week.
The real fight is over copyright provenance and who cleans up the slop.
Fri, 29 May 2026 05:35:00 GMT
Tags: open-source, open-source, sqlite, qemu, ai-coding, agentic-coding, maintainers, licensing, llm
Author: soren-vanek

Hacker News is obsessed with durable Postgres workflows and a game about clicking yes
https://devtake.dev/article/dev-tools-trending-digest-may-2026/
Six dev-tooling and AI posts that climbed Hacker News in late May 2026: durable execution on plain Postgres, LLM code smells, a permission-fatigue game, Rust 1.96, and more.
Fri, 29 May 2026 05:05:00 GMT
Tags: ai, dev-tools, hackernews, llm, rust, postgres, ai-agents, open-source
Author: dieter-morelli

DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.
https://devtake.dev/article/deepseek-v4-pro-price-cut-permanent/
On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.
Sun, 24 May 2026 10:30:00 GMT
Tags: ai, deepseek, ai-models, llm, anthropic, openai, gemini, ai-chips, china
Author: dieter-morelli

Andrej Karpathy joined Anthropic. The OpenAI founding member's job: use Claude to train Claude.
https://devtake.dev/article/karpathy-joins-anthropic-pretraining/
Karpathy started this week at Anthropic on Nick Joseph's pre-training team. His mandate is using Claude to accelerate Claude's own training.
Thu, 21 May 2026 12:00:00 GMT
Tags: ai, anthropic, openai, claude, andrej-karpathy, ai-models, llm, pre-training, ai-talent
Author: dieter-morelli

A crafted Ollama model file leaks the whole server's memory. 300,000 instances are exposed.
https://devtake.dev/article/ollama-bleeding-llama-cve-2026-7482/
Cyera disclosed CVE-2026-7482 on May 1, a CVSS 9.1 unauthenticated heap read in Ollama.
Three API calls dump prompts, env vars, and API keys from any open instance.
Mon, 11 May 2026 10:00:00 GMT
Tags: security, security, ollama, llm, cve-2026-7482, local-inference, memory, cyera, ai-security
Author: luca-reinhardt

Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.
https://devtake.dev/article/llms-corrupt-documents-delegation-errors/
The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.
Sun, 10 May 2026 09:00:00 GMT
Tags: ai, llm, ai-models, benchmarks, microsoft, delegation, vibe-coding
Author: dieter-morelli

Timothy Gowers gave GPT 5.5 an open math problem. It returned a novel proof in 17 minutes.
https://devtake.dev/article/fields-medal-gowers-gpt-open-problems/
The 1998 Fields Medal winner reports GPT 5.5 Pro produced a novel proof for an unsolved math problem in 17 minutes, and says the era of owning theorems is ending.
Sat, 09 May 2026 07:30:00 GMT
Tags: ai, openai, llm, ai-models, benchmarks
Author: dieter-morelli

Microsoft and OpenAI just rewrote their deal. Exclusivity is dead, and so is the AGI clause.
https://devtake.dev/article/microsoft-openai-deal-revenue-share-end/
Microsoft loses exclusive rights to OpenAI's models. The revenue share now caps at 2030 and stops depending on AGI.
Here's what actually changed and who it benefits.
Mon, 27 Apr 2026 19:00:00 GMT
Tags: ai, openai, microsoft, ai-models, azure, llm, ai-infrastructure, anthropic, gpt-5-5
Author: dieter-morelli

Arcee's Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus
https://devtake.dev/article/arcee-trinity-large-thinking-reasoning/
Arcee released Trinity-Large-Thinking on April 1: a 399B-param sparse MoE with 13B active, Apache 2.0 weights, $0.88 per million output tokens, and PinchBench just behind Opus 4.6.
Mon, 27 Apr 2026 13:00:00 GMT
Tags: open-source, arcee, trinity, llm, ai-models, open-weights, moe, reasoning, apache-2-0
Author: soren-vanek

A malicious GGUF file owns your SGLang server: CVE-2026-5760 is an unpatched 9.8
https://devtake.dev/article/sglang-cve-2026-5760-gguf-rce/
SGLang's reranker renders chat templates without a sandbox. Load a hostile GGUF, hit /v1/rerank, and the attacker has Python on your inference box. No patch yet.
Mon, 27 Apr 2026 11:30:00 GMT
Tags: security, sglang, cve-2026-5760, supply-chain, ai-security, llm, rce, jinja2, gguf
Author: luca-reinhardt

OpenAI just retired SWE-bench Verified. The headline coding benchmark of 2025 is officially saturated.
https://devtake.dev/article/openai-retires-swe-bench-verified/
OpenAI says SWE-bench Verified is saturated and contaminated, and 60% of remaining problems are unsolvable. Here's what comes next, and why every coding leaderboard is suspect.
Mon, 27 Apr 2026 10:00:00 GMT
Tags: ai, openai, swe-bench, benchmarks, ai-models, llm, ai-coding, evaluations, claude-opus
Author: dieter-morelli

OpenAI's Privacy Filter is a 1.5B PII redactor that ships under Apache 2.0. Here's what it actually does.
https://devtake.dev/article/openai-privacy-filter/
OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. F1 of 96%. Runs in a browser.
Here's the catch.
Sun, 26 Apr 2026 13:00:00 GMT
Tags: ai, openai, privacy, pii, open-weights, ai-models, llm, hugging-face, data-privacy
Author: dieter-morelli

An AI agent built a working RISC-V CPU from a 219-word prompt in 12 hours. Here's what it actually did.
https://devtake.dev/article/ai-agent-risc-v-cpu-design/
Verkor's Design Conductor agent went from a 219-word spec to a tape-out-ready RISC-V core called VerCore in 12 hours. The catch: it's still a Celeron.
Sat, 25 Apr 2026 13:00:00 GMT
Tags: ai, ai-agents, automation, risc-v, chip-design, llm, hardware, eda, semiconductor
Author: dieter-morelli

Malicious npm and PyPI packages turn dev servers into Chinese LLM proxies
https://devtake.dev/article/gpt-proxy-npm-supply-chain/
Aikido found a stage-2 Go binary inside two health-check-themed packages that runs an OpenAI-compatible router routing Claude, GPT, and Gemini traffic through Chinese aggregators.
Sat, 25 Apr 2026 07:30:00 GMT
Tags: security, supply-chain, npm, pypi, ai-security, malware, llm, china, credential-theft
Author: luca-reinhardt

DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and SWE-bench within 0.2 of Opus 4.6
https://devtake.dev/article/deepseek-v4-release/
DeepSeek shipped V4-Pro and V4-Flash under MIT on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash is $0.14 in / $0.28 out.
Fri, 24 Apr 2026 21:30:00 GMT
Tags: ai, deepseek, deepseek-v4, llm, ai-models, open-weights, moe, benchmarks, open-source
Author: dieter-morelli

OpenAI shipped GPT-5.5 seven weeks after 5.4. API tokens now cost twice as much.
https://devtake.dev/article/openai-gpt-5-5-launch/
OpenAI released GPT-5.5 (codename Spud) on April 23.
The API runs at $5/$30 per million tokens, double GPT-5.4, with Pro at $30/$180.
Thu, 23 Apr 2026 18:30:00 GMT
Tags: ai, openai, gpt-5-5, chatgpt, codex, ai-models, api-pricing, llm, agentic-ai
Author: dieter-morelli

Cloudflare open-sourced a lossless LLM compressor that shaves 22% off model weights
https://devtake.dev/article/cloudflare-unweight-lossless-llm-compression/
Unweight is Cloudflare Research's new BF16 weight compressor. 22% smaller bundles, 13% smaller inference footprint, 30-40% throughput overhead, BSD license.
Sun, 19 Apr 2026 12:00:00 GMT
Tags: open-source, cloudflare, unweight, llm, compression, bf16, huffman, h100, open-source
Author: soren-vanek

Qwen 3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison's laptop
https://devtake.dev/article/qwen-3-6-35b-a3b-beats-opus-on-laptop/
Alibaba's Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.
Fri, 17 Apr 2026 10:00:00 GMT
Tags: ai, qwen, alibaba, open-source, moe, llm, local-inference, open-weights
Author: dieter-morelli

Claude Opus 4.7 is here, and the long-context benchmarks got worse
https://devtake.dev/article/anthropic-claude-opus-4-7-launch/
Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.
Fri, 17 Apr 2026 09:30:00 GMT
Tags: ai, claude, anthropic, opus-4-7, llm, benchmarks, mythos, ai-models
Author: dieter-morelli