Uber blew its entire 2026 AI coding budget in four months. Its COO can't prove it paid off.

Uber exhausted its full-year Claude Code budget by April. Adoption hit 84%, heavy users burn up to $2,000 a month, and COO Andrew Macdonald can't connect the spend to shipped features.

Uber spent its entire 2026 AI coding budget by April, eight months early. The company’s own COO now says he can’t draw a straight line from that spend to anything riders or drivers actually got, which is an awkward thing to admit after the money is gone.

That admission came from Andrew Macdonald, Uber’s president and chief operating officer, on the Rapid Response podcast. Uber had encouraged its engineers to adopt AI coding tools aggressively. Claude Code adoption jumped from about a third to 84% in a single month, then reached 95% monthly use by the first quarter. The tools worked. Engineers loved them. And the bill arrived faster than anyone modeled, eating a full year of allocation in four months. Now the question Macdonald is asking out loud is the one every CFO is about to ask: what did we get for it?

The math that broke the budget

AI coding agents bill by the token, and tokens add up in ways a per-seat software license never did. A normal SaaS tool costs the same whether an engineer opens it once a week or lives in it. An agent like Claude Code costs more every time it reads a file, retries a step, or chews through a large codebase. Usage and cost move together with no ceiling.
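The difference between the two billing models can be sketched in a few lines. This is a minimal illustration, not anyone’s real pricing: the seat fee, the tokens per session, and the single blended token rate are all hypothetical placeholders, and real billing typically prices input and output tokens separately.

```python
def seat_cost(monthly_fee_usd: float, sessions: int) -> float:
    """Flat per-seat license: the bill ignores how often the tool is used."""
    return monthly_fee_usd


def token_cost(sessions: int, tokens_per_session: int,
               usd_per_million_tokens: float) -> float:
    """Metered billing: the bill grows linearly with usage, with no ceiling."""
    return sessions * tokens_per_session / 1_000_000 * usd_per_million_tokens


# Hypothetical placeholder values, chosen only to show the shape of the curve.
SEAT_FEE_USD = 30.0
TOKENS_PER_SESSION = 2_000_000
USD_PER_MILLION_TOKENS = 5.0

for sessions in (4, 40, 200):
    flat = seat_cost(SEAT_FEE_USD, sessions)
    metered = token_cost(sessions, TOKENS_PER_SESSION, USD_PER_MILLION_TOKENS)
    print(f"{sessions:>3} sessions/month: seat ${flat:,.0f}, tokens ${metered:,.0f}")
```

Under these assumed numbers the seat price never moves, while the metered bill climbs from $40 to $2,000 as monthly sessions go from 4 to 200.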

The per-engineer spread Uber saw makes the problem concrete. Typical users ran $150 to $250 a month. Heavy users hit $500 to $2,000. And in the single most quotable data point of the whole story, CTO Praveen Neppalli Naga reportedly burned through $1,200 in token costs during a two-hour demo. That’s one executive, one afternoon, the price of a high-end laptop.

Multiply the heavy-user tier across a large engineering org and the four-month burn stops looking like a mistake and starts looking like arithmetic. Uber’s R&D spending was already climbing: $951 million in Q1 2026 alone, up 17% year over year. The AI line is a growing slice of that, and unlike servers or salaries, it scales with enthusiasm.
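To see how that arithmetic can play out, here is a rough sketch. Only the per-engineer cost ranges come from the figures above, and the code uses their midpoints. The headcount, the typical/heavy split, and the planned budget of $100 per engineer per month are hypothetical, since Uber has not published those numbers.

```python
def monthly_burn(typical_users: int, typical_cost_usd: float,
                 heavy_users: int, heavy_cost_usd: float) -> float:
    """Total monthly token spend across both usage tiers."""
    return typical_users * typical_cost_usd + heavy_users * heavy_cost_usd


def months_until_exhausted(annual_budget_usd: float, burn_usd: float) -> float:
    """How many months a fixed annual budget lasts at a constant monthly burn."""
    return annual_budget_usd / burn_usd


# Hypothetical org: 1,000 engineers, budgeted at $100 per engineer per month.
annual_budget = 1_000 * 100.0 * 12           # $1.2M planned for the year

# Midpoints of the reported ranges: $150-$250 typical, $500-$2,000 heavy.
burn = monthly_burn(typical_users=900, typical_cost_usd=200.0,
                    heavy_users=100, heavy_cost_usd=1_250.0)

months = months_until_exhausted(annual_budget, burn)
print(f"Monthly burn: ${burn:,.0f}")          # Monthly burn: $305,000
print(f"Budget lasts: {months:.1f} months")   # Budget lasts: 3.9 months
```

In this made-up org, 10% of engineers in the heavy tier account for about 40% of the spend, and a budget sized for light use is gone in under four months.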

Why adoption exploded so fast

Uber didn’t stumble into 84% adoption. It engineered it, partly through internal incentives that ranked teams by how much they used the tools. Make usage a leaderboard and usage goes up. That’s not a surprise; it’s the whole point of a leaderboard.

The catch is that a usage leaderboard measures the wrong thing. It rewards token consumption, not shipped value. An engineer who runs the agent ten times to explore an idea scores higher than one who writes the fix by hand in five minutes. Disney ran into the same incentive trap when one employee called Claude 460,000 times in nine days chasing a dashboard ranking. When you gamify input, you get input. You don’t necessarily get output.

So Uber now has the adoption number every vendor wants in a case study, and a COO who can’t tell you whether the work got better. Those two facts sitting next to each other are the actual story.

The ROI question nobody can answer yet

Here’s Macdonald in his own words: “That link is not there yet, right? I think maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25 percent more useful consumer features.’”

Read that twice, because it’s unusually honest for a sitting executive. He isn’t saying AI coding doesn’t work. He’s saying he can’t measure that it does, and without measurement the spend “becomes harder to justify.” Engineers report feeling faster. Pull requests may be flowing. But “feels faster” and “shipped 25% more features riders care about” are different claims, and only one of them shows up in a board deck.

This is the gap the entire industry is papering over. Vendor benchmarks measure tasks completed in a sandbox. Developer surveys measure sentiment. Neither measures the thing a business pays for, which is more value reaching customers per dollar. Until someone connects token spend to product outcomes, every AI coding budget is an act of faith dressed up as a metric.

There’s a subtler trap underneath. Even if the agents do make engineers faster, faster at writing code isn’t the same as faster at shipping product. Code is rarely the bottleneck at a company Uber’s size. Review queues, on-call load, compliance gates, and coordination across teams are. Pour more generated code into a pipeline that’s already congested downstream and you get more code waiting in line, not more features out the door. The token meter runs the whole time. That’s how you can have engineers who genuinely feel more productive, a budget that’s genuinely empty, and a product velocity chart that’s genuinely flat, all at once. None of those three facts contradicts the others, and that’s exactly why the ROI question is so hard to close.
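The congestion argument can be made concrete with a toy model in which a delivery pipeline ships only as fast as its slowest stage. The stage names and weekly capacities below are invented for illustration and do not describe Uber’s actual process.

```python
from typing import Dict


def shipped_per_week(capacity: Dict[str, int]) -> int:
    """Steady-state throughput is capped by the slowest stage."""
    return min(capacity.values())


def bottleneck(capacity: Dict[str, int]) -> str:
    """Name of the stage with the lowest weekly capacity."""
    return min(capacity, key=capacity.get)


def weekly_queue_growth(capacity: Dict[str, int], upstream: str, stage: str) -> int:
    """Changes piling up each week in front of `stage`, fed by `upstream`."""
    return max(capacity[upstream] - capacity[stage], 0)


# Hypothetical capacities, in changes per week.
before = {"coding": 50, "review": 40, "compliance": 45, "release": 60}
after = dict(before, coding=100)  # agents double coding output, nothing else changes

for label, pipeline in (("before", before), ("after", after)):
    print(label,
          "| shipped/week:", shipped_per_week(pipeline),
          "| bottleneck:", bottleneck(pipeline),
          "| review queue growth/week:",
          weekly_queue_growth(pipeline, "coding", "review"))
```

In this sketch, doubling coding capacity leaves weekly shipments at 40 while the review queue grows six times faster, which is the flat velocity chart and the running token meter in miniature.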

Why you’re hearing about this now

The timing tracks a broader correction. A week ago, Microsoft told its own engineers to drop Claude Code and move to Copilot CLI, a cost-and-control decision as much as a product one. Anthropic, for its part, has been scrambling on the supply side, doubling Claude Code’s limits by renting enormous GPU capacity. The demand is real and the unit economics are ugly on both ends: expensive for Anthropic to serve, expensive for enterprises to consume.

Uber going public with a blown budget is the first big customer saying the quiet part out loud. It won’t be the last. The 2025 story was “adopt AI coding or fall behind.” The 2026 story is turning into “adopt it, then figure out if you can afford it.”

What makes Uber’s case land is that it isn’t a skeptic talking. This is a company that pushed adoption hard, hit the numbers vendors brag about, and still ended up unsure. If the believers can’t show the return, the holdouts have their excuse, and every procurement team renewing a contract this summer just got a new question to ask.

What this means for you

If you run an engineering org, kill the usage leaderboard before your next budget cycle. It manufactures the exact spend Uber is now regretting and rewards the wrong behavior. Replace it with a cap plus an outcome metric: give teams a token budget they can see in real time, and measure them on features shipped or incidents reduced, not tokens consumed. Make the cost visible to the person spending it and the $1,200 demo doesn’t happen twice.
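One way to picture a cap plus an outcome metric is a small per-team ledger. This is a sketch under stated assumptions, not a real tool: the `TeamBudget` class, its field names, and the example team and figures are all hypothetical, and “features shipped” stands in for whatever outcome measure an org actually trusts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamBudget:
    """Hypothetical ledger pairing a visible token cap with an outcome count."""
    team: str
    monthly_cap_usd: float
    spent_usd: float = 0.0
    features_shipped: int = 0

    def record_spend(self, usd: float) -> None:
        """Add token spend to the running total for the month."""
        if usd < 0:
            raise ValueError("spend must be non-negative")
        self.spent_usd += usd

    @property
    def remaining_usd(self) -> float:
        """Budget left this month, floored at zero."""
        return max(self.monthly_cap_usd - self.spent_usd, 0.0)

    @property
    def over_cap(self) -> bool:
        return self.spent_usd > self.monthly_cap_usd

    def cost_per_feature(self) -> Optional[float]:
        """Outcome metric: dollars of tokens per shipped feature, if any shipped."""
        if self.features_shipped == 0:
            return None
        return self.spent_usd / self.features_shipped


# Hypothetical team and numbers.
budget = TeamBudget(team="payments", monthly_cap_usd=10_000.0)
budget.record_spend(1_200.0)   # a single expensive demo
print(budget.remaining_usd, budget.over_cap, budget.cost_per_feature())

budget.features_shipped = 4
print(budget.cost_per_feature())
```

The design choice that matters is that spend and outcome live in the same record, so a team sees what a session cost against the cap and what it bought, rather than where it ranks on raw usage.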

If you’re an engineer, the practical takeaway is to treat the agent like a contractor you’re paying by the hour, because you effectively are. Reach for it on the gnarly refactor, the unfamiliar codebase, the test scaffolding it does genuinely well. Don’t burn tokens regenerating a function you could type in thirty seconds. The skill that’s about to matter isn’t prompting. It’s knowing when not to.