On Tokenmaxxing, Tokenomics, and Quality

The Information reported earlier this month that a Meta employee had built an internal leaderboard, on their own initiative, that ranks the company’s staff by how many AI tokens they personally consume. It is called Claudeonomics. The leaderboard is the public face of what Silicon Valley has decided to call tokenmaxxing (gag), the practice of using as many tokens as possible on the premise that doing so constitutes productivity. Some employees have reportedly been leaving AI agents running idle for hours to inflate their numbers. The top-ranked individual averaged 281 billion tokens in a thirty-day window, which on public Claude pricing would cost something north of a million bucks for one person’s leaderboard ambition. Meta took the dashboard down two days after the story broke, though, tellingly, the plan to fold AI usage metrics into performance reviews next year has not been withdrawn.
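The "north of a million bucks" figure can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope check, not a billing model: the token count is the one reported above, but the per-million-token rates are hypothetical blended rates chosen for illustration, since the real mix of input, output, and cached tokens is not given.

```python
# Back-of-envelope check on the leaderboard figure quoted above.
# The token count comes from the article; every rate below is a
# hypothetical blended price, NOT an actual published price list.

TOKENS_30_DAYS = 281_000_000_000  # reported thirty-day figure


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of `tokens` at a flat blended rate per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


def break_even_rate(tokens: int, target_usd: float) -> float:
    """Blended rate per million tokens at which `tokens` costs `target_usd`."""
    return target_usd / (tokens / 1_000_000)


# Rate at which the reported volume crosses one million dollars.
million_dollar_rate = break_even_rate(TOKENS_30_DAYS, 1_000_000)
print(f"Break-even blended rate: ${million_dollar_rate:.2f} per million tokens")

# Illustrative (assumed) blended rates, to show how sensitive the bill is.
for assumed_rate in (1.0, 5.0, 15.0):
    print(f"At ${assumed_rate:.2f}/M tokens: ${cost_usd(TOKENS_30_DAYS, assumed_rate):,.0f}")
```

In other words, the claim holds for any blended rate above roughly $3.56 per million tokens. Heavy use of cheaper cached tokens would pull the effective rate down, and a higher share of output tokens would push it up.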

This isn’t simply one of Meta’s many, many curiosities. Nvidia’s Jensen Huang told an audience at GTC that he would be “deeply alarmed” if an engineer earning $500,000 a year was not consuming at least $250,000 in tokens, which is the kind of public statement that exists only when a great deal of money has made it sound reasonable. Meta’s CTO Andrew Bosworth said at a tech conference back in February that his best engineer was spending the equivalent of his salary on tokens and was “5x to 10x more productive” as a result. That claim only stands up if the productivity metric in question is the token count itself. Of course, this framing has a commercial benefit on the supply side. Model providers benefit when their customers read consumption growth as productivity growth, because revenue depends on token volume going up. A customer who celebrates rising AI spend as evidence of progress renews next year with a bigger contract, which is exactly the outcome the business model is optimised to produce.

Meanwhile, in the same week’s reporting, Uber’s CTO told The Information that the Claude Code budget he thought he would need had already been blown away by internal usage. This is Uber, a company whose finance function has spent a decade building the discipline of metering compute against revenue at scale. If they were caught out by how much AI their engineers were consuming, every other large enterprise is in the same position. They just haven’t been asked the question yet. The pricing model sitting underneath all of this has also changed quite a bit in the last few weeks. Anthropic moved Claude Enterprise customers off flat per-seat licensing onto usage-based billing, and restricted Claude Code subscriptions from being used with third-party tools, pushing that traffic onto pay-as-you-go. Licensing analysts estimate that this could double or triple costs for heavy users. The productivity software pricing model that made AI feel like a line item is being replaced by something closer to cloud compute billing. Tokens are the new instance hours.

That analogy is worth digging into a bit, because the cloud transition offers a roadmap for what happens next and the roadmap is not flattering. Back in the day, cloud bills started arriving larger than anyone had modelled. Departments found themselves paying for resources nobody could clearly attribute. The response was FinOps, a whole new function whose job was to make cloud spend legible, attributable, and manageable. The discipline took years to mature into something like its current form, and during that maturation the bills kept getting worse. AI is running the same playbook faster and from a worse starting position. Tokens are harder to attribute to specific business units than cloud instances ever were, because agents produce work that crosses team boundaries mid-sentence.

Enterprises are responding to the token bills in the only way they know how, which is to build dashboards. Reports are going up that show tokens per engineer, dollars per sprint, model calls by business unit, cost per ticket closed. Claudeonomics, essentially, with a finance team behind it instead of a gamified intranet page. These numbers are financial measurement doing cosplay as productivity measurement. A token count tells you exactly what you spent. It tells you nothing about whether the work needed that volume, whether the output was fit for purpose, or whether you could have done the same job for a fraction of the cost. Those are three separate questions and the dashboards answer none of them.

Quality is also not a single question, which is part of why nobody has instrumented it properly. Some work benefits from vast volume at average quality. Triaging customer messages, tagging unstructured data, generating first drafts of routine documents, the tasks where throughput is genuinely the point. Other work needs low volume at high precision. Strategic analysis, system architecture decisions, legal drafting, the tasks where one bad output costs more than ten good ones save. A metric that reports tokens consumed treats both kinds of work identically. A leaderboard that ranks engineers by consumption will reward the wrong behaviour for whichever category is the bulk of what the team is actually being paid to do. The Meta employee running agents idle for hours to climb the Claudeonomics leaderboard is just optimising for the metric that was chosen.

There is an ownership mismatch here that is going to shape the next eighteen months. The cost of AI has a clear home. It sits on the CFO’s dashboard, it appears in the budget variance report, and it has a named owner who is accountable when it runs over. The quality of AI output, largely, does not have a home. There is no chief AI quality officer. There is no standard metric that goes on a board deck. When the cost has an owner and the quality does not, the organisational weight all falls on one side. The cost gets managed and the quality becomes assumed, and then eventually discovered to have been assumed. Owning quality is the work that actually needs doing, and nobody is currently doing it. Every organisation deploying AI at scale needs someone with the authority of a named executive, a budget, and a seat in the room when the spend gets approved, whose job is to own the question of whether the output is any good. That means defining what good looks like for each type of work, and building the instrumentation that would tell the organisation whether the bar is being met. And it means making sure that measurement sits on the same dashboard as the spend, so that no token accounting conversation happens without the quality question running alongside it.
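To make "the same dashboard" concrete, here is a minimal sketch of a report row that refuses to show spend without a quality figure next to it. Everything in it is hypothetical: the `WorkloadReport` type, its field names, the assumption that every output is judged against a per-workload quality bar, and the example numbers are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkloadReport:
    """One dashboard row pairing spend with quality (hypothetical schema)."""

    name: str
    spend_usd: float          # what finance already tracks
    outputs: int              # outputs produced in the period
    outputs_meeting_bar: int  # outputs judged to meet the workload's quality bar

    @property
    def pass_rate(self) -> Optional[float]:
        """Share of outputs that met the bar, or None if nothing was produced."""
        if self.outputs == 0:
            return None
        return self.outputs_meeting_bar / self.outputs

    @property
    def cost_per_accepted_output(self) -> Optional[float]:
        """Spend divided by outputs that were actually good enough to use."""
        if self.outputs_meeting_bar == 0:
            return None
        return self.spend_usd / self.outputs_meeting_bar


# Invented example figures: identical spend, very different stories.
triage = WorkloadReport("ticket triage", 1200.0, 40_000, 36_000)
architecture = WorkloadReport("architecture review", 1200.0, 20, 8)

for row in (triage, architecture):
    print(
        f"{row.name}: ${row.spend_usd:,.0f} spent, "
        f"pass rate {row.pass_rate:.0%}, "
        f"${row.cost_per_accepted_output:,.2f} per accepted output"
    )
```

A spend-only dashboard would show these two rows as equal. The hard part is not the division but deciding who defines the bar for each type of work, which is the ownership question raised above.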