Thursday, July 2, 2026

The End of Tokenmaxxing – O’Reilly

The practice of tokenmaxxing appears to be dying out, even before I had a chance to write about it. Good riddance. Burning tokens to create the appearance of productivity was fated to last only until the accountants found out about it, and the strictest of all accountants is one’s personal checkbook. What got many developers thinking about the cost of AI was the change in GitHub Copilot’s usage charges. The cost of Copilot went from a monthly fee with unlimited use to a monthly fee that buys a limited number of credits, which are used to pay the AI provider of your choice. One credit is equivalent to US$0.01; once you’ve used up your credits, you can upgrade your account or pay for more credits as you go.

The question isn’t why this didn’t happen earlier; it’s why this happened now. Tokenmaxxing is both the creation and victim of two large-scale trends in AI. First, starting with OpenAI, the largest AI providers were all playing a blitzscaling game that prioritized user growth over profitability. Giving AI services away for free got you more users, and in the long run, scalers would figure out how to make money from end-user fees, selling user data, or advertising. This process inevitably results in enshittification, and is still very much the road we’re on.

Second, token usage exploded late in 2025. The appearance of “reasoning models,” which use tokens to maintain an internal conversation in the course of solving a problem, increased the number of tokens used to respond to each prompt. Reasoning tokens are a model’s conversation with itself about possible responses to the prompt, and are often more numerous than the prompt and response themselves. Whether or not users see the reasoning process (often they don’t), reasoning tokens add to the bill. They’re frequently counted as “output tokens” because they’re generated by the model, and output tokens are more expensive than input tokens.
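To make the billing effect concrete, here is a minimal sketch of how reasoning tokens change the cost of a single request. The per-million-token prices and the token counts are invented for illustration and are not any provider's real rates; the only figure taken from above is the credit value of US$0.01. The names `Pricing`, `request_cost`, and `usd_to_credits` are hypothetical.

```python
from dataclasses import dataclass

USD_PER_CREDIT = 0.01  # one credit is equivalent to US$0.01


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in US dollars (not real rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing, prompt_tokens, response_tokens, reasoning_tokens=0):
    """Estimate the cost of one request in US dollars.

    Reasoning tokens are billed at the output rate, on the assumption
    that the provider counts them as output tokens.
    """
    billed_output = response_tokens + reasoning_tokens
    total = (prompt_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


def usd_to_credits(usd):
    """Convert a dollar amount into credits."""
    return usd / USD_PER_CREDIT


# Illustrative numbers only: output priced at five times input.
pricing = Pricing(input_per_million=2.0, output_per_million=10.0)
without_reasoning = request_cost(pricing, prompt_tokens=1_000, response_tokens=500)
with_reasoning = request_cost(pricing, prompt_tokens=1_000, response_tokens=500,
                              reasoning_tokens=4_000)

print(f"without reasoning: ${without_reasoning:.4f}")
print(f"with reasoning:    ${with_reasoning:.4f} "
      f"({usd_to_credits(with_reasoning):.1f} credits)")
```

Under these assumed numbers the prompt and the visible response are identical in both cases; the only difference is the hidden reasoning, which accounts for most of the bill.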

The appearance of agents also multiplied the rate at which users consumed tokens. In May 2025, Simon Willison quoted Anthropic’s Hannah Moran’s definition of an agent: “Agents are models using tools in a loop.” The Tredence blog writes: “The agent loop is a repeating cycle in which the AI reads the current data, thinks through what it means, chooses an action, carries it out, checks what happens and starts over.” If you’ve ever watched Claude Code, OpenClaw, or any other agent work, you’ve seen a single request turn into many calls to a model, each one using hundreds of tokens, if not thousands. In addition to the current request, one agent-generated invocation can contain the task’s entire accumulated context and relevant documents. Between reasoning tokens and agents, token usage goes up by a factor of hundreds.
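The arithmetic behind that multiplication is easy to sketch. The function below assumes, as a simplification, that every call in the loop resends the whole accumulated context and that nothing is cached or compacted; real agents may do better. The function name and the token counts are hypothetical.

```python
def agent_input_tokens(base_context, tokens_added_per_step, steps):
    """Total input tokens billed over one agent run.

    Simplifying assumption: each model call resends the entire accumulated
    context, and each step appends a fixed number of tokens (model output
    plus tool results) to that context.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                   # the whole context goes in as input
        context += tokens_added_per_step   # the step's results are appended
    return total


# Illustrative numbers: a 2,000-token starting context that grows by
# 1,500 tokens on every pass through the loop.
single_call = agent_input_tokens(2_000, 1_500, steps=1)
twenty_step_run = agent_input_tokens(2_000, 1_500, steps=20)

print(single_call, twenty_step_run, twenty_step_run / single_call)
```

Because the context grows on every pass, total input grows roughly with the square of the number of steps, not linearly, which is how a twenty-step run ends up consuming far more than twenty times the tokens of a single call.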

The increase in token usage might not be an issue if it results in problems being solved and tasks completed more effectively. But it collides with the loss-leader pricing of the blitzscalers; their willingness to operate at a loss to gain control of a market has limits. Regardless of whether the number of AI users is increasing, the amount of computation, and therefore cost, per user grows as the use of agents increases. Reasoning models increased token usage; agents compounded the problem; and that led to price increases. Microsoft/GitHub doesn’t want to pay Copilot customers’ AI bills. We haven’t yet seen across-the-board price increases from the AI providers themselves. But we have seen GitHub’s token credits, and we have seen Anthropic and OpenAI price more capable models significantly higher than older or less capable models. Fable is twice as expensive as Opus 4.8, and while some writers have called this pricing “incredible,” that’s probably because they were expecting an even greater increase. While Fable can delegate tasks to Anthropic’s less expensive models, most early users note that with Fable, token use goes up rather than down. Anthropic’s change to token-based billing for its agent SDK (currently on hold) is another signal that the days of inexpensive AI are coming to an end. OpenAI’s story is similar: GPT 5.5 costs twice as much as GPT 5.4 per million tokens.

It’s also important to take capacity into account. Massive data centers have been in the news, but these data centers haven’t been built yet. More important, the electrical infrastructure needed to support these data centers (transmission lines, generators) hasn’t been built either, and that’s not an investment over which AI companies have much control. They can build their own power generation facilities on a data center campus, but that’s a huge investment in technologies that they’re not familiar with. And even if you generate power locally, you need other kinds of infrastructure: rail for coal, pipelines for gas. This isn’t (yet) an essay about data center power consumption and its consequences, but it is another factor that limits increased token usage. We’ve seen Anthropic’s outages blamed on capacity, and Anthropic has responded by leasing unused data center capacity from SpaceX. But the other way to respond to increased demand that can’t be met by current capacity is to raise prices, limiting customers to those who can afford to pay. That increase is being noticed by managers, accountants, and independent developers.

Token optimization and accountability are the inevitable consequence of upward pressure on token price. One way to build accountability is through better governance, a route Bennie Haelen describes in “The Subsidy Ended: What Tool-Using Agents Actually Cost.” Better governance is achieved by building an observability layer that lets you see exactly what the agents and models are doing. With a well-designed observability layer, you can see whether the data sent to the model is growing with each invocation, whether the model is using appropriate tools, whether tools are being called repeatedly, and a variety of other information that can tell you whether your agent is running efficiently.
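As a rough illustration of what such a layer might record, here is a minimal sketch of an in-memory trace that answers two of the questions above: is the input growing with every call, and is a tool being called repeatedly? This is not the design from Haelen's article; the class names, the threshold, and the sample data are all hypothetical, and a production system would presumably export these records to a real tracing backend.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallRecord:
    """One model invocation made by the agent."""
    step: int
    model: str
    input_tokens: int
    output_tokens: int
    tool: Optional[str] = None  # tool the model asked for, if any


@dataclass
class AgentTrace:
    """Hypothetical in-memory trace of a single agent run."""
    records: list = field(default_factory=list)

    def log(self, model, input_tokens, output_tokens, tool=None):
        step = len(self.records) + 1
        self.records.append(CallRecord(step, model, input_tokens, output_tokens, tool))

    def input_growing(self):
        """True if every call sent more input tokens than the one before."""
        sizes = [r.input_tokens for r in self.records]
        return len(sizes) > 1 and all(b > a for a, b in zip(sizes, sizes[1:]))

    def repeated_tools(self, threshold=3):
        """Tools called at least `threshold` times (threshold is arbitrary)."""
        counts = Counter(r.tool for r in self.records if r.tool)
        return {tool: n for tool, n in counts.items() if n >= threshold}

    def total_tokens(self):
        return sum(r.input_tokens + r.output_tokens for r in self.records)


# Sample data, invented for illustration.
trace = AgentTrace()
trace.log("model-a", 2_000, 300, tool="search")
trace.log("model-a", 3_500, 250, tool="search")
trace.log("model-a", 5_200, 400, tool="search")
trace.log("model-a", 7_100, 350, tool="read_file")

print(trace.input_growing(), trace.repeated_tools(), trace.total_tokens())
```

Even a trace this simple is enough to flag a run in which the context balloons while the agent keeps calling the same tool.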

Another piece of token accountability is understanding which models are running your agent’s requests. General-purpose reasoning models range from expensive high-performance models like Claude Fable or Opus 4.8 to models like Gemma 4 26B that can run on a well-equipped laptop, and some models that are even smaller. While it’s tempting to say “I want the best; I’ll run Opus 4.8 or Fable with maximum reasoning,” most requests don’t require that level of reasoning or expense. Agents will be able to decide what model is best for processing each request. Fable can delegate, and we expect other frontier providers to follow as models incorporate agent capabilities. And there’s an active world of open models outside of the frontier AI providers. Vicki Boykis writes that models running locally now work almost as well as frontier models. Tools like OpenRouter give you a model-independent way of routing requests to different models, including open models that run locally. OpenRouter can be integrated with OpenClaw, Claude Code, Cursor, Codex, and other agents to provide intelligent routing.
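The routing idea can be sketched in a few lines. The example below is a toy: it does not use OpenRouter's API or describe how any real router decides, and the tier names, keyword lists, and word-count cutoff are all assumptions chosen for illustration. It shows only the principle of sending each request to the cheapest tier that is likely to handle it.

```python
# Hypothetical model tiers, ordered cheapest first: (name, capability level).
TIERS = [
    ("local-small", 1),
    ("mid-tier", 2),
    ("frontier", 3),
]

# Toy keyword heuristics; a real router would use something far better.
HARD_HINTS = ("architecture", "prove", "security audit")
MEDIUM_HINTS = ("refactor", "debug", "migrate")


def estimate_difficulty(request):
    """Guess how capable a model the request needs (1 = easy, 3 = hard)."""
    text = request.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 3
    if any(hint in text for hint in MEDIUM_HINTS) or len(text.split()) > 200:
        return 2
    return 1


def choose_model(request):
    """Return the cheapest tier whose capability meets the estimated need."""
    needed = estimate_difficulty(request)
    for name, capability in TIERS:
        if capability >= needed:
            return name
    return TIERS[-1][0]  # fall back to the most capable tier


for request in (
    "Rename this variable",
    "Refactor the billing module",
    "Propose an architecture for the new service",
):
    print(f"{request!r} -> {choose_model(request)}")
```

The design choice worth noticing is the ordering: the list is walked from cheapest to most expensive, so the expensive model is the exception that a request has to earn, not the default.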

Tokenmaxxing is dying. It will no doubt take time for its vestiges to die away, and there will always be developers who think they can game the path to a promotion, along with managers who insist on being “all in” with AI. But spending tokens responsibly is now the norm, whether you pay with your own checkbook or a company account. Token optimization will only become more important as per-token charges increase. They undoubtedly will.
