#llm

Large language model releases, benchmarks, capability jumps, and the infrastructure that runs them.

A padlock chained to a smartphone displaying a lock icon, illustrating data privacy.

OpenAI's Privacy Filter is a 1.5B-parameter PII redactor that ships under Apache 2.0. Here's what it actually does.

OpenAI released Privacy Filter on April 22 as an open-weight on-device model for masking eight types of PII. F1 of 96%. Runs in a browser. Here's the catch.

Illustration of an AI-driven chip design process from IEEE Spectrum's coverage.

AI·2 months ago

An AI agent built a working RISC-V CPU from a 219-word prompt in 12 hours. Here's what it actually did.

Verkor's Design Conductor agent went from a 219-word spec to a tape-out-ready RISC-V core called VerCore in 12 hours. The catch: it's still a Celeron.

Aikido Security illustration of the GPT-Proxy backdoor.

Security·2 months ago

Malicious npm and PyPI packages turn dev servers into Chinese LLM proxies

Aikido found a stage-2 Go binary inside two health-check-themed packages. It runs an OpenAI-compatible router that sends Claude, GPT, and Gemini traffic through Chinese aggregators.

DeepSeek social card from the V4 API documentation release post.

AI·2 months ago

DeepSeek V4 lands: 1.6T-param open MoE, 1M-token context, and SWE-bench within 0.2 points of Opus 4.6

DeepSeek shipped V4-Pro and V4-Flash under MIT on April 24. V4-Pro hits 80.6% on SWE-bench Verified. V4-Flash is $0.14 in / $0.28 out.

OpenAI's GPT-5.5 model launch with ChatGPT and Codex interfaces

AI·2 months ago

OpenAI shipped GPT-5.5 seven weeks after 5.4. API tokens now cost twice as much.

OpenAI released GPT-5.5 (codename Spud) on April 23. The API runs at $5/$30 per million tokens, double GPT-5.4, with Pro at $30/$180.

Cloudflare Unweight tensor compression announcement social graphic

Open Source·2 months ago

Cloudflare open-sourced a lossless LLM compressor that shaves 22% off model weights

Unweight is Cloudflare Research's new BF16 weight compressor. 22% smaller bundles, 13% smaller inference footprint, 30-40% throughput overhead, BSD license.

Header card from Simon Willison's 'Qwen3.6 beats Opus' post comparing pelican SVGs

AI·2 months ago

Qwen 3.6-35B-A3B: the open MoE beating Opus 4.7 on Simon Willison's laptop

Alibaba's Qwen 3.6-35B-A3B is a 35B-param mixture-of-experts with only 3B active. Apache 2.0, runs on consumer GPUs, and it's already winning real tasks.

Claude Opus 4.7 launch artwork from the Anthropic news post

AI·2 months ago

Claude Opus 4.7 is here, and the long-context benchmarks got worse

Anthropic's Opus 4.7 is state-of-the-art on SWE-bench and CursorBench, but independent tests show regressions on long-context retrieval and thematic reasoning.