Deep dive · 2026-06-07 · Can AI coding tools survive honest pricing — and was the flat rate ever real?
On June 1, 2026, two documents went out that look unrelated and are not. GitHub flipped 4.7 million paid Copilot seats from flat-ish “premium requests” to token-metered AI Credits, billed at published API rates down to the cached token. The same day, Anthropic confidentially filed a draft S-1 with the SEC. One company turned on the meter; another asked the public markets to grade its margins. These are the same event. The week the AI industry started preparing prospectuses is the week it stopped being able to lie about what your coding agent costs — and the meter, wherever it appeared this month, is best read not as a pricing model but as a confession.
The confession is this: flat-rate AI coding was never a price. It was venture-subsidized customer acquisition wearing a price’s clothes, and everyone selling it knew. The interesting questions — the ones this piece tries to answer — are why the confession came now, whether metering is the destination or a two-year interregnum, and what a working engineer should actually do about it. My thesis, stated up front: honest pricing is survivable, but metering is not the end state. The end state is vertical integration — vendors metering the frontier models they don’t own while steering everything else to cheap models they do. The subsidy isn’t ending; it’s being internalized. As noted in this week’s issue, the supply-side response arrived in the same news cycle as the meter — that’s not a coincidence either.
A short history of lying about the price
The founding datapoint of this whole story is from October 2023: the Wall Street Journal reported that Microsoft was losing an average of $20+ per user per month on Copilot’s $10 subscription, with some users costing up to $80. That’s the original sin, committed when “AI coding” meant autocomplete. Everything since has been the industry trying to crawl back from a price point set when the product consumed a thousandth of the tokens it consumes now — GitHub’s own announcement says the quiet part plainly: under flat pricing, “a quick chat question and a multi-hour autonomous coding session can cost the user the same amount.”
The crawl-back has a recognizable choreography, and we’ve now seen it performed four times. Cursor, June 2025: Pro silently changed from 500 fast requests plus unlimited slow to “$20 of included frontier usage” at API rates; users ran dry in a handful of prompts; three weeks later came the formal apology — “we didn’t handle this pricing rollout well, and we’re sorry” — with refunds. And then, crucially: the new model stayed. Replit, June 2025: “effort-based pricing” replaced flat per-checkpoint billing, the company explained that agent runs would otherwise cost “upwards of $10 for 20 minutes of autonomous work,” a billing bug forced refunds, the backlash re-erupted with Agent 3 — and the model stayed. Anthropic, August 2025: weekly rate limits on Claude Code’s heaviest subscribers, framed as touching fewer than 5% of users; the limits stayed, and the June 15, 2026 sequel moves all headless, CI, and SDK usage into separate credit pools metered at full API list prices (the per-tier amounts — $20/$100/$200 — are widely reported but I haven’t found Anthropic’s own doc, so hold them loosely). GitHub, June 2026: AI Credits, a 400-comment, ~900-downvote community thread, viral screenshots of $29 bills projecting to $750 — and, as of this writing, no walkback.
Backlash → apology (optional) → refunds (optional) → keep the new model anyway. Four vendors, two years, one choreography. When every player in a market performs the same retreat from the same pricing structure despite the same backlash, the structure was the problem, not the communication. The flat rate is dead because it was never alive.
The two curves
The honest version of the economics turns on two curves moving violently in opposite directions, and which one you stare at determines your whole worldview.
Curve one is the deflation. Epoch AI’s analysis of inference prices at constant capability found declines between 9x and 900x per year depending on the benchmark, with a median of 50x — and restricted to post-2024 data, a median of 200x per year. GPT-4-class tokens that cost $20–30 per million in early 2023 cost around $0.40 in 2026. This is the fastest price collapse of any input in the history of computing, including transistors.
Curve two is the consumption. A Microsoft Research and Stanford paper from April measured what agents actually burn: roughly 1,000x more tokens than a chat exchange, one to three and a half million tokens for a single SWE-bench-style task, and — the detail that should bother you most — up to 30x variance in cost for the same task across runs, with models systematically unable to predict their own spend. Per-developer token consumption grew ~18.6x in nine months. Enterprise AI budgets went from $1.2M to $7M on average in two years. Tokens got 98% cheaper and bills went up 320%. Uber reportedly exhausted its entire 2026 AI coding budget by April. One company allegedly hit a $500M Claude bill in a single month (single-sourced; treat as folklore with a true moral).
This is Jevons’ paradox running at software speed: every 10x cost drop has so far induced more than 10x more ambition. In 2023 the unit of consumption was a completion. In 2024, a chat. In 2025, an agent session. In 2026 it’s a fleet of agents running overnight, and the 30x same-task variance means a meaningful slice of that spend is waste nobody can see — which is precisely why a standards body (the Tokenomics Foundation, launching July under the Linux Foundation) now exists to define what a token bill even means. The deflation curve is real, but betting your budget on it is betting that engineers will stop finding new things for agents to do. Three years of evidence says they won’t.
Why now: the prospectus theory of pricing
If the unit economics have been broken since 2023, why did the meter arrive everywhere in the same week of June 2026? Because the audience changed.
Anthropic’s margin history, per reporting in The Information, runs from −94% gross margin in 2024 to a projected ~40% in 2025 — trimmed from 50% because cloud inference ran 23% over plan — with an internal target of 77% by 2028. You do not get from there to a credible October listing while your heaviest users consume API-list-price compute at a 90% discount. Cursor reached a reported $2B ARR while running negative gross margins until its in-house Composer model and cheap third-party routing recently nudged it positive — and it still reportedly loses money on individual developers. Cline’s founder put the whole dynamic in one sentence: “as soon as openai goes public i won’t be surprised if the codex reset button breaks. the era of free tokens is ending.”
But the prospectus theory only explains the metering. The more interesting move happened one day later, and it explains the destination. On June 2, Microsoft shipped MAI-Code-1-Flash — its first in-house coding model, a 5B-class system scoring 51.2% on SWE-Bench Pro (against Claude Haiku’s 35.2%) while using up to 60% fewer tokens — and rolled it directly into Copilot’s default auto-picker. Read the two announcements as one strategy: meter Claude Opus at 27x, remove the free fallback, and quietly make the default a Microsoft model running on Microsoft silicon in Microsoft datacenters. Cursor is running the identical play — its new Teams pricing literally maintains separate usage pools for its own models versus third-party ones, and its blog states the thesis outright: “model choice is the most important determinant of costs.”
This is the actual shape of the future, and it rhymes with every infrastructure business before it. The meter is not how these companies want to make money; the meter is how they make frontier models they don’t own unattractive, while the margin moves into vertical integration — own model, own inference stack, own default. The flat rate will come back. It will come back as “unlimited, on our models,” a loss-leader that costs the vendor almost nothing and locks you into their stack. The subsidy isn’t dying. It’s being internalized, and renamed “auto.”
The steelman: metering as a two-year panic
The strongest case against everything above deserves its full weight. It goes like this: the June backlash is a tail story — Futuresearch’s modeling puts the median Copilot seat at $19/month, unchanged, with the viral $750 screenshots coming from a handful of extreme agentic users. Meanwhile the price floor keeps collapsing: post-2024 capability-adjusted prices fall 200x a year, open-weight DeepSeek V4 prices output tokens at $0.28–0.87 per million as a ceiling under every metered vendor, and 128GB unified-memory boxes — soon every RTX Spark-class corporate laptop — run credible coding models at zero marginal cost. On this view, June 2026 was about one quarter’s optics (an S-1, an earnings call, a $50B raise), and once distilled models absorb the bulk of agentic traffic, flat rates return industry-wide — exactly as unlimited data returned to telecom after the metered-data panic of the early 2010s. Five-billion-parameter models beating last year’s mid-tier frontier is the proof the cavalry is already here.
It’s a good argument, and half of it is right — the half about where commodity traffic goes. Here’s what it misses. First, Jevons again: the 320%-bills-despite-98%-price-cuts paradox is three years old and accelerating; consumption ambition has outrun deflation every single year, and the same Microsoft Research data shows token spend doesn’t even buy reliability (30x variance). Second, the telecom analogy actually supports the vertical-integration thesis, not the restoration thesis: “unlimited” came back to telecom after carriers owned enough capacity that the marginal user was nearly free — which is precisely the own-model, own-silicon position Microsoft and Cursor are building toward, not a return to subsidizing someone else’s API. And third, the tail being small doesn’t make the tail unimportant — the extreme agentic users generating $750 bills are not anomalies, they’re the future median, because the entire industry’s roadmap (including the vendors’ own marketing) is to make every developer work the way the tail works today. Metering the tail while it’s small is how you avoid confessing again in 2028.
So: metering survives, but as a boundary, not a business. The meter marks the line between compute the vendor owns (bundled, “free,” default) and compute it rents (metered, premium, increasingly niche). The pricing war of 2027 won’t be fought over rates per token; it’ll be fought over where that line sits and how good the house model behind it really is.
What to do about it
For working engineers and the people who budget for them, the practical reading:
- Get a token budget before your CFO assigns you one. A Futuresearch survey-based forecast has 52% of the Fortune 500 implementing AI usage caps by end of 2026; Google Cloud shipped Spend Caps this month. The era of unmeasured agent usage is over either way; the only question is whether engineering or finance draws the lines.
- Measure cost per merged PR, not cost per token. The 30x same-task variance means raw token dashboards mislead. The number that matters is what a unit of shipped work costs across your routing mix — it’s also the number that justifies the spend when the cap conversation comes.
- Build the routing reflex now. Cheap/local/house model by default, frontier model on explicit escalation. The vendors are building this into their auto-pickers for their benefit; replicate it for yours. The skill of knowing which 20% of tasks genuinely need a frontier model is about to be a senior-engineer skill.
- Treat AI API keys as currency, not config. Metered tokens are spendable money, and IronWorm specifically harvesting OpenAI and Anthropic keys this month was the criminal market reaching the same conclusion. Scope keys, cap them, rotate them on a schedule.
- If you run Claude Code in CI (as this publication does), re-estimate against the June 15 credit pools this week — headless usage at API list prices is a different budget than the subscription you signed in January.
What would change my mind: a major vendor restoring genuinely unlimited frontier usage at a flat rate and holding it for two quarters (restoration thesis wins); house-model quality stalling so that frontier routing stays above ~50% of agentic traffic into 2027 (metering becomes the durable business, not the boundary); or token deflation finally outrunning ambition — a full year of flat per-developer consumption — which would mean Jevons has run out, and with it most of this essay. Watch the auto-picker traffic shares; that’s where the truth will show up first.