DeepSeek locked in the 75% V4-Pro cut. The API now undercuts every Western frontier model.

On May 23 DeepSeek told customers the V4-Pro discount becomes its standard price after May 31. Output drops from $3.48 to $0.87 per million tokens.

DeepSeek told customers on May 23 that the 75% discount it had been running on V4-Pro is becoming the permanent API price. The new list price takes effect when the promotion ends on May 31, and it puts the Chinese lab’s flagship model below every frontier API in the West on per-token cost.

This is the second pricing change DeepSeek has made in 30 days, after April’s cache-hit cut, and the first time since the original V4 launch in April that the company has touched V4-Pro’s list prices for cache-miss input and output. The shift lands in the middle of the rate-limit anger that has built up around Anthropic and OpenAI through the spring, and it gives any team currently paying frontier prices a number to bring to a renegotiation meeting. Compliance teams will still flag the PRC hosting, but Western inference providers serve open V4 weights and are likely to reprice against the new card within weeks.

What the new pricing actually says

The official pricing page carries the announcement in one sentence. “The deepseek-v4-pro model API pricing will be officially adjusted to 1/4 of the original price after the 75% discount promotion ends on 2026/05/31 15:59 UTC,” the doc reads. Translation: the promo number is now the list number.

That brings cache-miss input from $1.74 down to $0.435 per million tokens, and output from $3.48 down to $0.87. Cache hits drop alongside, to $0.003625 per million. DeepSeek pushed a separate change last month that cut cache-hit pricing across the whole API to a tenth of launch, effective April 26. The two changes stack.
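
As a rough sketch of how the new card applies to a single request, the snippet below prices one call from its token counts. The three rates are the ones quoted above. The `Rates` structure, the `request_cost` name, and the assumption that a bill is a simple per-token sum of cache-hit input, cache-miss input, and output are illustrative, not taken from DeepSeek’s docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD per million tokens."""
    cache_hit_input: float
    cache_miss_input: float
    output: float


# New V4-Pro list prices as quoted in this article.
V4_PRO_NEW = Rates(cache_hit_input=0.003625, cache_miss_input=0.435, output=0.87)


def request_cost(rates: Rates, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request.

    Assumes (hypothetically) that billing is a plain sum of the three
    token classes with no minimums, rounding, or surcharges.
    """
    total = (
        hit_tokens * rates.cache_hit_input
        + miss_tokens * rates.cache_miss_input
        + output_tokens * rates.output
    )
    return total / 1_000_000


# Example: 8,000 cached prompt tokens, 2,000 uncached, 1,000 generated.
example_cost = request_cost(V4_PRO_NEW, hit_tokens=8_000, miss_tokens=2_000, output_tokens=1_000)
print(f"${example_cost:.6f}")
```

In this example the 8,000 cached tokens add almost nothing to the total; the uncached input and the output each contribute the same $0.00087, which is why cache-hit rate matters more to the bill than the headline input price.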

Reuters reported that the per-million range, in renminbi, drops from a 0.1 to 24 yuan band to 0.025 to 6 yuan. DeepSeek didn’t issue a press release. The change shows up in the docs and in customer notifications, nowhere else.

What it does to the frontier-model price chart

V4-Pro now sits at $0.435 input and $0.87 output. Claude Opus 4.7 is $5 input, $25 output. GPT-5.4 is roughly $2.50 input and $10 output. Gemini is $1.25 input and $10 output for the long-context tier. On output alone, V4-Pro costs roughly one-eleventh to one-twenty-ninth of those numbers.
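
Dividing the output prices quoted above gives the multiples directly. This is a minimal sketch: the dictionary keys and the `output_multiples` helper are made up for illustration, and the GPT-5.4 figure is the approximate one the article cites.

```python
# Output prices in USD per million tokens, as quoted in this article.
OUTPUT_PRICE = {
    "DeepSeek V4-Pro": 0.87,
    "Claude Opus 4.7": 25.00,
    "GPT-5.4": 10.00,  # the article gives this as approximate
    "Gemini (long-context tier)": 10.00,
}


def output_multiples(prices: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's output price as a multiple of the baseline's."""
    base = prices[baseline]
    return {name: price / base for name, price in prices.items() if name != baseline}


multiples = output_multiples(OUTPUT_PRICE, "DeepSeek V4-Pro")
for name, multiple in sorted(multiples.items(), key=lambda kv: kv[1]):
    print(f"{name}: {multiple:.1f}x the V4-Pro output price")
```

The $10 models come out near 11.5x and Opus 4.7 near 28.7x, so the spread on output runs from about a twelfth to about a twenty-ninth.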

That doesn’t mean V4-Pro is equivalent on quality. The April launch piece on DeepSeek V4 covered the SWE-bench results, where V4-Pro sits within 0.2 points of Opus 4.6 on the Verified split. V4-Pro is competitive on coding and long-context, weaker on multimodal. Where it lines up with Western frontier models, the price gap is the story.

When DeepSeek launched V4 last month, Reuters noted the company blamed “constraints in high-end compute capacity” for the original Pro markup, and signaled prices would fall once Huawei Ascend 950 supernodes ship in volume in the second half of the year. The May 23 cut arrived before that timeline. DeepSeek won’t say whether Ascend supply broke through early or whether this is a margin trade against developer share.

What we don’t know

Whether the new pricing comes with rate-limit changes. Customers on the existing free-tier path have hit caps the past month; the Android Headlines analysis on May 23 reads the cut as a play against Anthropic’s tightened limits, but DeepSeek hasn’t published a new tokens-per-minute table.
Whether V4-Flash, the cheaper sibling, gets a parallel cut. Last month’s launch put Pro at up to 12x Flash; the new gap is closer to 3x. Flash users get less out of this announcement than the headline suggests.
Whether the SEC-watching question is real. Three publicly traded US labs price against DeepSeek’s published rates in their own deck math. A permanent quarter-price tier puts pressure on the per-token margin every one of them quotes investors.

What this means for you

If you’re shipping a product that bills per token, the cost math just shifted. At the list prices above, a workload that ran $4,000 a month on Opus 4.7 lands at roughly $140 to $350 on V4-Pro for the same token volume, depending on the input/output mix. That’s enough to justify a side-by-side evaluation on your actual prompts, not the leaderboards. Pull two days of production traffic, replay it through both APIs, and score on your own rubric.
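
The replay-and-score step can be sketched as a small harness. Everything here is hypothetical scaffolding: `Completion`, `Price`, `replay`, and the placeholder `echo_model` and `nonempty_rubric` are stand-ins, and no real API client is called. Swap in your own client wrappers and rubric; only the per-million prices come from the figures quoted in this article.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass(frozen=True)
class Completion:
    text: str
    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class Price:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float


Model = Callable[[str], Completion]   # prompt -> completion
Rubric = Callable[[str, str], float]  # (prompt, response) -> score in [0, 1]


def replay(prompts: Iterable[str], model: Model, price: Price, rubric: Rubric) -> dict:
    """Run every prompt through one model, then total the score and cost."""
    scores: list[float] = []
    cost = 0.0
    for prompt in prompts:
        result = model(prompt)
        scores.append(rubric(prompt, result.text))
        cost += (
            result.input_tokens * price.input_per_m
            + result.output_tokens * price.output_per_m
        ) / 1_000_000
    if not scores:
        raise ValueError("no prompts to replay")
    return {"n": len(scores), "mean_score": mean(scores), "cost_usd": cost}


def echo_model(prompt: str) -> Completion:
    """Placeholder model: echoes the prompt and counts words as tokens."""
    n = len(prompt.split())
    return Completion(text=prompt, input_tokens=n, output_tokens=n)


def nonempty_rubric(prompt: str, response: str) -> float:
    """Placeholder rubric: full marks for any non-empty response."""
    return 1.0 if response.strip() else 0.0


V4_PRO = Price(input_per_m=0.435, output_per_m=0.87)
OPUS_4_7 = Price(input_per_m=5.00, output_per_m=25.00)

sample = ["summarize this ticket", "write a SQL query for monthly revenue"]
for name, price in [("V4-Pro", V4_PRO), ("Opus 4.7", OPUS_4_7)]:
    print(name, replay(sample, echo_model, price, nonempty_rubric))
```

Passing the model and rubric in as plain callables keeps the harness independent of any vendor SDK, so the same logged prompts can be scored against both endpoints with only the wrapper function changed.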

The harder question is data residency. DeepSeek’s public V4-Pro endpoint is hosted in the PRC. Anything you send to it is subject to Chinese data law, and a chunk of the enterprise market won’t ship a single token over that wire. The open weights help: DeepSeek V4 is downloadable and runnable on Together AI, Fireworks, and a half-dozen other Western-hosted inference providers, which are likely to reprice against the new card in the coming weeks. Watch those listings, not the public API, if compliance is the blocker.