6 دقيقة قراءة

Anthropic's New Billing Split Reveals What AI Agents Actually Cost

For the past year, a strange thing has been happening to enterprise AI budgets. Companies are spending dramatically more on AI, even as the per-unit cost of AI keeps dropping. Average enterprise AI spending jumped from $1.2 million in 2024 to $7 million in 2026, a 483% increase, while per-token API prices fell roughly 280x over the same period. Those two trends should cancel each other out. They do not, and the reason has everything to do with how AI agents consume compute compared to the chatbots that came before them.

Anthropic just made this math visible in a way that is hard to ignore. On May 13, the company announced it would split its subscription billing into two pools starting June 15. One pool covers interactive chat, the kind of back-and-forth conversation most people associate with Claude. The other, a new and separate credit allowance, covers Agent SDK usage, including automated workflows, third-party coding tools, and anything that runs Claude without a human typing prompts in real time.

The Max 20x plan, priced at $200 per month, now comes with $200 in Agent SDK credits. That sounds reasonable until you look at what those same users were consuming before the split. A community analysis of the subsidy math found that the heaviest Sonnet users were extracting up to $35,000 per month in API-equivalent value from a $200 subscription. The ratio between what they paid and what they used was 175 to 1.

Boris Cherny, Anthropic's Head of Claude Code, was candid about it: "Our subscriptions weren't built for the usage patterns of these third-party tools." Which is a polite way of saying the flat-rate model was never meant to subsidize production-scale agent workloads, and it could not continue doing so.

How Anthropic got here

The timeline tells the story of a company realizing in stages just how unsustainable the economics had become. In April, Anthropic briefly removed Claude Code from the $20 Pro plan entirely. The developer backlash was swift and loud. Theo Browne, creator of T3 Code, calculated that his community's effective cost jumped 25x overnight and called the move an attack on the open-source tooling ecosystem. Anthropic reversed the decision within 24 hours.

The credit-pool split announced in May was the more measured correction. Rather than cutting off agent access, Anthropic capped it at a dollar amount that reflects actual API pricing. It is a reasonable compromise, but it also surfaces a truth the industry has been dancing around: flat-rate access to agent-grade AI was a temporary market condition, not a sustainable pricing model.

One documented user consumed 10 billion tokens over eight months on a $100 per month plan. At API rates, that would have cost roughly $15,000. They paid $800. That kind of gap does not survive contact with a profit-and-loss statement for very long.

Why agents eat tokens the way they do

The pricing disconnect is not really about Anthropic's business model. It is about a fundamental difference in how agents use compute compared to chatbots, and most organizations have not internalized that difference yet.

A typical chatbot conversation involves a prompt, a response, and maybe a couple of follow-ups. A few thousand tokens, in and out. An agent working through a multi-step task operates differently. It calls a tool. It reads the result. It decides what to do next. It calls another tool. Each of those steps requires the model to re-process the entire conversation history before generating its next action. According to research from the Stanford Digital Economy Lab, re-sent context accounts for 62% of total agent inference bills. Most of what you are paying for is the model re-reading what it already knows.

Gartner's 2026 analysis puts the multiplier at 5 to 30x: agentic workloads consume that much more compute than standard chatbot interactions for equivalent business outcomes. In practice, a chatbot handling a thousand customer queries might use X tokens. An agent autonomously resolving those same thousand cases, with tool calls, retrieval steps, and multi-turn reasoning chains, might consume 15X to 30X.

This is why budgets are exploding even as per-token costs fall. The unit economics got cheaper. The units per task got dramatically more expensive. And most teams did not see it coming because they were still thinking in chatbot-era cost models.

What the invoices actually look like

A LeanOps audit of 30 engineering teams running coding agents paints a concrete picture. The median developer's agent bill was $480 per month. The 90th percentile hit $1,650. One developer managed to burn through $4,200 in a single weekend while running an autonomous refactoring session that looped longer than anyone expected.

The same audit looked at a growth-stage SaaS company with 35 engineers. Their combined agent inference bill was $87,000 per month. After auditing their token usage patterns and implementing smarter model routing, routing simpler subtasks to cheaper models instead of sending everything through the most expensive one, they brought it down to $24,000 per month. A 72% reduction, with no loss in agent capability.

Most of the waste fell into three categories that will look familiar to anyone who has audited a cloud infrastructure bill:

The cost model just flipped

There is a bigger structural shift happening underneath the Anthropic story, and it affects every organization running agents, not just Claude users. In 2023, about 40% of enterprise AI budgets went to inference. Today that number is 85%. Training costs, the line item that used to dominate the conversation, have been eclipsed by the ongoing cost of actually running models in production.

This inverts the traditional software cost model. Software was historically expensive to build and cheap to run. AI agents are increasingly cheap to build and expensive to run. A developer can prototype an agent in an afternoon. Running that agent at enterprise scale for a year costs more than building it ever did.

Anthropic's correction is the most visible signal of this shift so far, but it will not be the last. If the company most associated with developer-friendly AI access cannot sustain flat-rate agent pricing, the assumption that agent infrastructure comes cheap needs revisiting across the board.

Three things to do before June 15 (and after)

Whether you use Claude or not, the economics Anthropic just exposed apply to every model provider. The subsidy was Anthropic-specific. The underlying cost structure is universal.

First, measure what you are actually spending per task. Most teams have no visibility into this. The LeanOps data showed a 3.4x cost spread between median and 90th-percentile developers on the same team doing similar work. You cannot optimize what you have not measured.

Second, route tasks to the right model. Not every agent action needs frontier-grade reasoning. Classification, extraction, formatting, and templating can run on smaller, cheaper models. Reserve the expensive models for planning, judgment calls, and complex multi-step reasoning. This alone typically cuts inference costs 40 to 60%.

Third, set token budgets on autonomous sessions. The $4,200 weekend happened because nobody set a ceiling. A budget of 500K tokens per autonomous run, with a human checkpoint required to extend it, catches runaway loops before they become runaway invoices.

The era of subsidized agent inference is winding down. Anthropic said it directly, and the math supports them. Enterprise AI budgets will keep growing, but the gap between organizations that treat agent costs as an engineering discipline and those that treat them as a subscription line item will widen fast. The first group will spend 60 to 70% less for equivalent output. The second will keep getting surprised by invoices until the next pricing correction forces the conversation.

The question is not whether AI agents are worth the investment. It is whether your infrastructure runs them at a cost that lets the business case actually hold.

ابدأ اليوم

ابدأ في بناء وكلاء الذكاء الاصطناعي لأتمتة العمليات

انضم إلى منصتنا وابدأ في بناء وكلاء الذكاء الاصطناعي لمختلف أنواع الأتمتة.

ابدأ اليوم

ابدأ في بناء وكلاء الذكاء الاصطناعي لأتمتة العمليات

انضم إلى منصتنا وابدأ في بناء وكلاء الذكاء الاصطناعي لمختلف أنواع الأتمتة.