<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>devtake.dev — #llm</title><description>Articles tagged llm on devtake.dev.</description><link>https://devtake.dev/</link><language>en-us</language>
<item><title>Gemini Intelligence turns Android 17 into an agent that drives your apps</title><link>https://devtake.dev/article/google-gemini-intelligence-android-17/</link><guid isPermaLink="true">https://devtake.dev/article/google-gemini-intelligence-android-17/</guid><description>Google&apos;s Android Show pitched Gemini Intelligence and AppFunctions, an MCP-style way for the assistant to call into your apps. Here&apos;s how it works and what to watch.</description><pubDate>Sat, 13 Jun 2026 14:00:00 GMT</pubDate><category>android</category><category>gemini</category><category>ai-agents</category><category>llm</category><category>google</category><category>agents</category><author>naomi-park</author></item>
<item><title>Running a coding agent fully on Apple Silicon, no cloud, is now an off-the-shelf stack</title><link>https://devtake.dev/article/local-coding-agents-mac/</link><guid isPermaLink="true">https://devtake.dev/article/local-coding-agents-mac/</guid><description>A popular Hacker News how-to walked through a fully local coding agent on Apple Silicon. Here&apos;s the realistic 2026 stack: runner, model, and harness.</description><pubDate>Sat, 13 Jun 2026 12:30:00 GMT</pubDate><category>ai</category><category>llm</category><category>local-inference</category><category>ai-agents</category><category>agentic-coding</category><category>open-weights</category><category>mac</category><category>moe</category><author>dieter-morelli</author></item>
<item><title>Claude Fable 5 is Anthropic&apos;s first public Mythos-class model. It tops SWE-Bench Pro at 80.3%.</title><link>https://devtake.dev/article/claude-fable-5-launch/</link><guid isPermaLink="true">https://devtake.dev/article/claude-fable-5-launch/</guid><description>Claude Fable 5 hits 80.3% on SWE-Bench Pro and ships on Bedrock and Copilot at $10/$50 per million input/output tokens, and is free on paid plans only through June 22.</description><pubDate>Tue, 09 Jun 2026 18:55:00 GMT</pubDate><category>ai</category><category>ai-models</category><category>anthropic</category><category>claude</category><category>claude-mythos</category><category>benchmarks</category><category>llm</category><category>agentic-coding</category><author>dieter-morelli</author></item>
<item><title>OpenAI added a Lockdown Mode to ChatGPT to blunt prompt-injection attacks</title><link>https://devtake.dev/article/openai-lockdown-mode-prompt-injection/</link><guid isPermaLink="true">https://devtake.dev/article/openai-lockdown-mode-prompt-injection/</guid><description>OpenAI shipped Lockdown Mode in ChatGPT to cut off the data-exfiltration step of prompt-injection attacks. Here&apos;s what it actually restricts and who should turn it on.</description><pubDate>Mon, 08 Jun 2026 10:15:00 GMT</pubDate><category>ai</category><category>openai</category><category>ai-security</category><category>prompt-injection</category><category>llm</category><category>ai-agents</category><author>dieter-morelli</author></item>
<item><title>OpenAI is putting Codex in every ChatGPT app, with six business plugins for non-coders</title><link>https://devtake.dev/article/openai-codex-chatgpt-everywhere/</link><guid isPermaLink="true">https://devtake.dev/article/openai-codex-chatgpt-everywhere/</guid><description>On June 2 OpenAI said Codex is coming to every ChatGPT app within weeks, and shipped six role-specific plugins for sales, analytics, design, and finance teams.</description><pubDate>Wed, 03 Jun 2026 12:30:00 GMT</pubDate><category>ai</category><category>openai</category><category>codex</category><category>chatgpt</category><category>ai-agents</category><category>agentic-coding</category><category>ai-assistant</category><category>automation</category><category>llm</category><author>dieter-morelli</author></item>
<item><title>Stanford tested AI against law professors. The pros picked the AI 75% of the time.</title><link>https://devtake.dev/article/stanford-ai-beats-law-professors/</link><guid isPermaLink="true">https://devtake.dev/article/stanford-ai-beats-law-professors/</guid><description>A blinded Stanford Law study had 16 professors grade AI tutoring answers against their own. Here&apos;s what the 75% win rate actually measures, and what it doesn&apos;t.</description><pubDate>Wed, 03 Jun 2026 11:15:00 GMT</pubDate><category>ai</category><category>llm</category><category>benchmarks</category><category>legal-ai</category><category>ai-models</category><category>gemini</category><category>rag</category><category>ai-eval</category><author>dieter-morelli</author></item>
<item><title>Claude Opus 4.8 flags the bugs it writes four times more often than Opus 4.7</title><link>https://devtake.dev/article/claude-opus-4-8-launch/</link><guid isPermaLink="true">https://devtake.dev/article/claude-opus-4-8-launch/</guid><description>Anthropic&apos;s Opus 4.8 posts 69.2% on SWE-Bench Pro, lets code flaws slip 4x less often, and ships parallel subagents in Claude Code. Here&apos;s what matters.</description><pubDate>Fri, 29 May 2026 07:20:00 GMT</pubDate><category>ai</category><category>ai-models</category><category>anthropic</category><category>claude</category><category>llm</category><category>benchmarks</category><category>agentic-coding</category><category>claude-code</category><category>opus-4-7</category><author>dieter-morelli</author></item>
<item><title>SQLite won&apos;t accept AI-written code, but QEMU just opened the door to it</title><link>https://devtake.dev/article/sqlite-refuses-agentic-code-qemu-opens-door/</link><guid isPermaLink="true">https://devtake.dev/article/sqlite-refuses-agentic-code-qemu-opens-door/</guid><description>Two of the most cautious C projects split on AI contributions in the same week. The real fight is over copyright provenance and who cleans up the slop.</description><pubDate>Fri, 29 May 2026 05:35:00 GMT</pubDate><category>open-source</category><category>sqlite</category><category>qemu</category><category>ai-coding</category><category>agentic-coding</category><category>maintainers</category><category>licensing</category><category>llm</category><author>soren-vanek</author></item>
<item><title>Hacker News is obsessed with durable Postgres workflows and a game about clicking yes</title><link>https://devtake.dev/article/dev-tools-trending-digest-may-2026/</link><guid isPermaLink="true">https://devtake.dev/article/dev-tools-trending-digest-may-2026/</guid><description>Six dev-tooling and AI posts that climbed Hacker News in late May 2026: durable execution on plain Postgres, LLM code smells, a permission-fatigue game, Rust 1.96, and more.</description><pubDate>Fri, 29 May 2026 05:05:00 GMT</pubDate><category>ai</category><category>dev-tools</category><category>hackernews</category><category>llm</category><category>rust</category><category>postgres</category><category>ai-agents</category><category>open-source</category><author>dieter-morelli</author></item>
<item><title>DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.</title><link>https://devtake.dev/article/deepseek-v4-pro-price-cut-permanent/</link><guid isPermaLink="true">https://devtake.dev/article/deepseek-v4-pro-price-cut-permanent/</guid><description>On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.</description><pubDate>Sun, 24 May 2026 10:30:00 GMT</pubDate><category>ai</category><category>deepseek</category><category>ai-models</category><category>llm</category><category>anthropic</category><category>openai</category><category>gemini</category><category>ai-chips</category><category>china</category><author>dieter-morelli</author></item>
<item><title>Andrej Karpathy joined Anthropic. The OpenAI founding member&apos;s job: use Claude to train Claude.</title><link>https://devtake.dev/article/karpathy-joins-anthropic-pretraining/</link><guid isPermaLink="true">https://devtake.dev/article/karpathy-joins-anthropic-pretraining/</guid><description>Karpathy started this week at Anthropic on Nick Joseph&apos;s pre-training team. His mandate is using Claude to accelerate Claude&apos;s own training.</description><pubDate>Thu, 21 May 2026 12:00:00 GMT</pubDate><category>ai</category><category>anthropic</category><category>openai</category><category>claude</category><category>andrej-karpathy</category><category>ai-models</category><category>llm</category><category>pre-training</category><category>ai-talent</category><author>dieter-morelli</author></item>
<item><title>A crafted Ollama model file leaks the whole server&apos;s memory. 300,000 instances are exposed.</title><link>https://devtake.dev/article/ollama-bleeding-llama-cve-2026-7482/</link><guid isPermaLink="true">https://devtake.dev/article/ollama-bleeding-llama-cve-2026-7482/</guid><description>On May 1 Cyera disclosed CVE-2026-7482, a CVSS 9.1 unauthenticated heap read in Ollama. Three API calls dump prompts, env vars, and API keys from any exposed instance.</description><pubDate>Mon, 11 May 2026 10:00:00 GMT</pubDate><category>security</category><category>ollama</category><category>llm</category><category>cve-2026-7482</category><category>local-inference</category><category>memory</category><category>cyera</category><category>ai-security</category><author>luca-reinhardt</author></item>
<item><title>Microsoft tested 19 LLMs as document editors. Even the best ones corrupted 25% of the content.</title><link>https://devtake.dev/article/llms-corrupt-documents-delegation-errors/</link><guid isPermaLink="true">https://devtake.dev/article/llms-corrupt-documents-delegation-errors/</guid><description>The DELEGATE-52 benchmark tests AI editing across 52 professional domains. Frontier models corrupt a quarter of document content over long workflows.</description><pubDate>Sun, 10 May 2026 09:00:00 GMT</pubDate><category>ai</category><category>llm</category><category>ai-models</category><category>benchmarks</category><category>microsoft</category><category>delegation</category><category>vibe-coding</category><author>dieter-morelli</author></item>
<item><title>Timothy Gowers gave GPT-5.5 an open math problem. It returned a novel proof in 17 minutes.</title><link>https://devtake.dev/article/fields-medal-gowers-gpt-open-problems/</link><guid isPermaLink="true">https://devtake.dev/article/fields-medal-gowers-gpt-open-problems/</guid><description>The 1998 Fields Medal winner reports that GPT-5.5 Pro produced a novel proof for an unsolved math problem in 17 minutes, and says the era of owning theorems is ending.</description><pubDate>Sat, 09 May 2026 07:30:00 GMT</pubDate><category>ai</category><category>openai</category><category>llm</category><category>ai-models</category><category>benchmarks</category><author>dieter-morelli</author></item>
<item><title>Microsoft and OpenAI just rewrote their deal. Exclusivity is dead, and so is the AGI clause.</title><link>https://devtake.dev/article/microsoft-openai-deal-revenue-share-end/</link><guid isPermaLink="true">https://devtake.dev/article/microsoft-openai-deal-revenue-share-end/</guid><description>Microsoft loses exclusive rights to OpenAI&apos;s models. The revenue share is now capped at 2030 and no longer depends on AGI. Here&apos;s what actually changed and who it benefits.</description><pubDate>Mon, 27 Apr 2026 19:00:00 GMT</pubDate><category>ai</category><category>openai</category><category>microsoft</category><category>ai-models</category><category>azure</category><category>llm</category><category>ai-infrastructure</category><category>anthropic</category><category>gpt-5-5</category><author>dieter-morelli</author></item>
<item><title>Arcee&apos;s Trinity-Large-Thinking is a 399B open MoE that costs 96% less than Opus</title><link>https://devtake.dev/article/arcee-trinity-large-thinking-reasoning/</link><guid isPermaLink="true">https://devtake.dev/article/arcee-trinity-large-thinking-reasoning/</guid><description>Arcee released Trinity-Large-Thinking on April 1: a 399B-param sparse MoE with 13B active parameters, Apache 2.0 weights, $0.88 per million output tokens, and a PinchBench score just behind Opus 4.6.</description><pubDate>Mon, 27 Apr 2026 13:00:00 GMT</pubDate><category>open-source</category><category>arcee</category><category>trinity</category><category>llm</category><category>ai-models</category><category>open-weights</category><category>moe</category><category>reasoning</category><category>apache-2-0</category><author>soren-vanek</author></item>
<item><title>A malicious GGUF file owns your SGLang server: CVE-2026-5760 is an unpatched 9.8</title><link>https://devtake.dev/article/sglang-cve-2026-5760-gguf-rce/</link><guid isPermaLink="true">https://devtake.dev/article/sglang-cve-2026-5760-gguf-rce/</guid><description>SGLang&apos;s reranker renders chat templates without a sandbox. Load a hostile GGUF, hit /v1/rerank, and the attacker has Python on your inference box. No patch yet.</description><pubDate>Mon, 27 Apr 2026 11:30:00 GMT</pubDate><category>security</category><category>sglang</category><category>cve-2026-5760</category><category>supply-chain</category><category>ai-security</category><category>llm</category><category>rce</category><category>jinja2</category><category>gguf</category><author>luca-reinhardt</author></item>
<item><title>OpenAI just retired SWE-bench Verified. The headline coding benchmark of 2025 is officially saturated.</title><link>https://devtake.dev/article/openai-retires-swe-bench-verified/</link><guid isPermaLink="true">https://devtake.dev/article/openai-retires-swe-bench-verified/</guid><description>OpenAI says SWE-bench Verified is saturated and contaminated, and that 60% of the remaining problems are unsolvable. Here&apos;s what comes next, and why every coding leaderboard is suspect.</description><pubDate>Mon, 27 Apr 2026 10:00:00 GMT</pubDate><category>ai</category><category>openai</category><category>swe-bench</category><category>benchmarks</category><category>ai-models</category><category>llm</category><category>ai-coding</category><category>evaluations</category><category>claude-opus</category><author>dieter-morelli</author></item>
<item><title>OpenAI&apos;s Privacy Filter is a 1.5B PII redactor that ships under Apache 2.0. Here&apos;s what it actually does.</title><link>https://devtake.dev/article/openai-privacy-filter/</link><guid isPermaLink="true">https://devtake.dev/article/openai-privacy-filter/</guid><description>OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. It scores an F1 of 96% and runs in a browser. Here&apos;s the catch.</description><pubDate>Sun, 26 Apr 2026 13:00:00 GMT</pubDate><category>ai</category><category>openai</category><category>privacy</category><category>pii</category><category>open-weights</category><category>ai-models</category><category>llm</category><category>hugging-face</category><category>data-privacy</category><author>dieter-morelli</author></item>
<item><title>An AI agent built a working RISC-V CPU from a 219-word prompt in 12 hours. Here&apos;s what it actually did.</title><link>https://devtake.dev/article/ai-agent-risc-v-cpu-design/</link><guid isPermaLink="true">https://devtake.dev/article/ai-agent-risc-v-cpu-design/</guid><description>Verkor&apos;s Design Conductor agent went from a 219-word spec to a tape-out-ready RISC-V core called VerCore in 12 hours. The catch: it&apos;s still a Celeron.</description><pubDate>Sat, 25 Apr 2026 13:00:00 GMT</pubDate><category>ai</category><category>ai-agents</category><category>automation</category><category>risc-v</category><category>chip-design</category><category>llm</category><category>hardware</category><category>eda</category><category>semiconductor</category><author>dieter-morelli</author></item>
<item><title>Malicious npm and PyPI packages turn dev servers into Chinese LLM proxies</title><link>https://devtake.dev/article/gpt-proxy-npm-supply-chain/</link><guid isPermaLink="true">https://devtake.dev/article/gpt-proxy-npm-supply-chain/</guid><description>Aikido found a stage-2 Go binary inside two health-check-themed packages that runs an OpenAI-compatible router, sending Claude, GPT, and Gemini traffic through Chinese aggregators.</description><pubDate>Sat, 25 Apr 2026 07:30:00 GMT</pubDate><category>security</category><category>supply-chain</category><category>npm</category><category>pypi</category><category>ai-security</category><category>malware</category><category>llm</category><category>china</category><category>credential-theft</category><author>luca-reinhardt</author></item>
<item><title>DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and SWE-bench within 0.2 points of Opus 4.6</title><link>https://devtake.dev/article/deepseek-v4-release/</link><guid isPermaLink="true">https://devtake.dev/article/deepseek-v4-release/</guid><description>DeepSeek shipped V4-Pro and V4-Flash under the MIT license on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash costs $0.14 in / $0.28 out per million tokens.</description><pubDate>Fri, 24 Apr 2026 21:30:00 GMT</pubDate><category>ai</category><category>deepseek</category><category>deepseek-v4</category><category>llm</category><category>ai-models</category><category>open-weights</category><category>moe</category><category>benchmarks</category><category>open-source</category><author>dieter-morelli</author></item>
<item><title>OpenAI shipped GPT-5.5 seven weeks after 5.4. API tokens now cost twice as much.</title><link>https://devtake.dev/article/openai-gpt-5-5-launch/</link><guid isPermaLink="true">https://devtake.dev/article/openai-gpt-5-5-launch/</guid><description>OpenAI released GPT-5.5 (codename Spud) on April 23. The API runs at $5/$30 per million input/output tokens, double GPT-5.4, with Pro at $30/$180.</description><pubDate>Thu, 23 Apr 2026 18:30:00 GMT</pubDate><category>ai</category><category>openai</category><category>gpt-5-5</category><category>chatgpt</category><category>codex</category><category>ai-models</category><category>api-pricing</category><category>llm</category><category>agentic-ai</category><author>dieter-morelli</author></item>
<item><title>Cloudflare open-sourced a lossless LLM compressor that shaves 22% off model weights</title><link>https://devtake.dev/article/cloudflare-unweight-lossless-llm-compression/</link><guid isPermaLink="true">https://devtake.dev/article/cloudflare-unweight-lossless-llm-compression/</guid><description>Unweight is Cloudflare Research&apos;s new BF16 weight compressor. It delivers 22% smaller bundles and a 13% smaller inference footprint at a 30-40% throughput overhead, under a BSD license.</description><pubDate>Sun, 19 Apr 2026 12:00:00 GMT</pubDate><category>open-source</category><category>cloudflare</category><category>unweight</category><category>llm</category><category>compression</category><category>bf16</category><category>huffman</category><category>h100</category><author>soren-vanek</author></item>
<item><title>Qwen 3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison&apos;s laptop</title><link>https://devtake.dev/article/qwen-3-6-35b-a3b-beats-opus-on-laptop/</link><guid isPermaLink="true">https://devtake.dev/article/qwen-3-6-35b-a3b-beats-opus-on-laptop/</guid><description>Alibaba&apos;s Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active parameters. It is Apache 2.0, runs on consumer GPUs, and is already winning real tasks.</description><pubDate>Fri, 17 Apr 2026 10:00:00 GMT</pubDate><category>ai</category><category>qwen</category><category>alibaba</category><category>open-source</category><category>moe</category><category>llm</category><category>local-inference</category><category>open-weights</category><author>dieter-morelli</author></item>
<item><title>Claude Opus 4.7 is here, and the long-context benchmarks got worse</title><link>https://devtake.dev/article/anthropic-claude-opus-4-7-launch/</link><guid isPermaLink="true">https://devtake.dev/article/anthropic-claude-opus-4-7-launch/</guid><description>Anthropic&apos;s Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.</description><pubDate>Fri, 17 Apr 2026 09:30:00 GMT</pubDate><category>ai</category><category>claude</category><category>anthropic</category><category>opus-4-7</category><category>llm</category><category>benchmarks</category><category>mythos</category><category>ai-models</category><author>dieter-morelli</author></item></channel></rss>