Deep dive · 2026-06-09 · When the model commoditizes, who owns the off-ramp?

Last week’s dive ended on a prediction: once honest pricing makes cheap substitutes real, the metered price of frontier compute becomes a starting gun, not a finish line. The substitutes are now real. This week, two open Chinese coding models shipped within 24 hours of each other. One of them, Moonshot’s Kimi K2.7-Code (a trillion-parameter mixture-of-experts that lands on Hugging Face today), scores 81.1% on the MCPMark tool-use benchmark against Claude Opus 4.8’s 76.4%, at $0.95 per million input tokens. An open-weight model that anyone can download and self-host is beating the flagship on the thing coding agents actually do, for roughly a tenth of the price.

So here is the obvious question, and it is the wrong one: if the model is a commodity, who wins? The right question is the one three different companies answered with their checkbooks this week. They did not spend money to build a better model. They spent it to own the path the model travels to reach you. Google killed a beloved CLI. OpenAI bought a cloud and rented a procurement department. Anthropic hired a thousand people and gave them token budgets. Read together, the week’s moves make one argument, and this piece is going to defend it: when frontier models converge, the moat stops being the model and becomes the channel — distribution, procurement rails, and switching cost. The product was never the CLI or the weights. It was the off-ramp.

This is not an AI story. It is a distribution story that happens to be about AI, and you can only see it by standing in four places at once: the terminal where developers live, the marketing teams fighting for their attention, the procurement systems where enterprise money actually moves, and the policy rooms deciding whose rails are allowed to exist.

Three moves, one playbook

Start with the most brutal one, because the brutality is the evidence.

On May 19, Google announced that Gemini CLI would stop serving free, Pro, and Ultra users on June 18. The tool is about a year old. The replacement is Antigravity CLI — a closed-source binary called agy, written in Go, the terminal surface of the six-layer Antigravity stack Google announced at I/O. The carve-out is the tell: enterprise customers with Gemini Code Assist Standard or Enterprise licenses, and anyone on a paid Agent Platform API key, keep working uninterrupted. Free and Pro individuals get migrated or cut.

Look at who keeps access and who loses it. The model behind both CLIs is the same Gemini. Google is not changing the intelligence. It is changing the channel — retiring the open, model-agnostic-ish harness that individuals adopted, and steering everyone toward a closed multi-agent surface it fully controls, while protecting the one cohort whose access is tied to a signed contract. The CLI was never the product. It was a customer-acquisition funnel, and Google just reorganized the funnel.

The HN thread (406 points, 210 comments) read the move the way burned users always read Google. “You can’t build a workflow around something that gets renamed or killed every 6 months,” wrote jesse_dot_id. Another commenter, ilioscio, put the org dynamics more sharply: “All the engineers get fired but somehow the managers that constantly destroy brand trust just hang around like cockroaches.” The reputational cost is real. Google paid it anyway. You only torch trust like that for something you’ve decided is disposable — and a thing you’ve decided is disposable was never your moat.

Now the subtler move. On June 11, OpenAI made two announcements that look like different stories and are the same story. It agreed to acquire Ona (the company formerly known as Gitpod, a German cloud-dev-environment business that served two million developers) to give Codex agents a persistent place to run inside the customer’s own cloud. As the announcement framed it, “the customer keeps the data, the credentials, and the audit trail.” The day before, OpenAI announced that its frontier models and Codex would be purchasable through Oracle Universal Credits on OCI.

Neither of those is a model announcement. Both are channel announcements. Ona buys OpenAI a surface — a trusted execution environment in regulated companies that won’t let an agent touch production from a vendor’s cloud. Oracle buys OpenAI a rail. Universal Credits are prepaid annual cloud commitments; a bank or a hospital that has already committed spend to Oracle can now draw OpenAI tokens against that existing pool, inside the procurement and governance framework it already trusts. The strategically enormous number here is Oracle’s $638 billion in Remaining Performance Obligations — pre-signed customer money. OpenAI just got a straw into that pool without fighting a single new procurement battle. The blocker in regulated enterprise was never model quality. It was the six-month security review and the vendor onboarding. OpenAI bought its way around both.

Anthropic’s move looks like philanthropy and functions like distribution. Claude Corps is a $150M program placing 1,000 fellows — at $85k each, plus a token budget — into roughly 400 nonprofits for a year, training them and then embedding them to coach the orgs on Claude workflows. Read it as a land-grab into an install base that the metered-pricing era was about to price out. Nonprofits are exactly the cohort that gets cut when flat rates die. Anthropic is paying to keep them — and to make Claude the default AI literacy of a new generation of operators, on Anthropic’s tooling, with Anthropic’s mental models. The token budget is the giveaway in both senses: it’s free compute, and it’s the thing that makes the install permanent. A fellow trained for a year on Claude is a switching cost wearing a hoodie.

Three companies. One playbook. Spend on the channel, not the model.

Why now: the model stopped being the differentiator

The playbook only makes sense if the model genuinely is converging, so let’s check the backdrop rather than assume it.

The capability gap between the proprietary frontier and the open or specialized field has narrowed to under five percent on many core tasks. As of this spring, Anthropic, OpenAI, Google, xAI, Alibaba, and DeepSeek all sit in the top tier of Arena Elo. This week’s Chinese releases made that abstraction concrete: Kimi K2.7-Code beats Opus 4.8 on a tool-use benchmark while burning ~30% fewer reasoning tokens than its own predecessor, and Xiaomi’s MiMo line sits near the top of the Artificial Analysis index. You do not have to believe any single benchmark to see the shape: the thing that was supposed to be the moat now has six credible occupants and an open-weight floor under all of them.

And the harness — the agent loop that wraps the model — is commoditizing too. OpenCode, the open-source, MIT-licensed coding agent, hit 8 million monthly users, ~$25M ARR, and 140k+ GitHub stars about a year after launch. Its entire pitch is the opposite of lock-in: separate the orchestration layer from the model so a developer can point it at Claude, GPT, Gemini, a local model via Ollama, or anything else. When both the model and the open harness are commodities, the only scarce thing left is the path between the developer and the compute — who the default routes to, whose budget it bills against, whose environment it runs in.
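To make the harness/model separation concrete, here is a minimal Python sketch, assuming a hypothetical `ModelBackend` interface with a single `complete` method. It is not OpenCode’s actual API. `EchoBackend` is a stand-in that a real adapter for Claude, GPT, Gemini, or a local Ollama model would replace. The point is structural: the agent loop only knows the interface, so swapping vendors is a one-argument change.

```python
from dataclasses import dataclass
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into a completion."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in provider. A real adapter would call a hosted API or a local model server."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def run_agent(task: str, backend: ModelBackend, max_steps: int = 3) -> list[str]:
    """Toy agent loop: the orchestration never names a vendor, only the interface."""
    transcript: list[str] = []
    prompt = task
    for step in range(max_steps):
        reply = backend.complete(prompt)
        transcript.append(reply)
        # Feed a follow-up instruction back in, as a real loop would feed tool results.
        prompt = f"step {step + 1}: continue"
    return transcript


# Re-pointing the harness is just passing a different backend object.
local = EchoBackend("local-open-weights")
print(run_agent("fix the failing test", local)[0])  # [local-open-weights] fix the failing test
```

The design choice worth noting is the structural `Protocol`: a backend only has to match the method signature, so the loop never imports a vendor SDK.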

This is the precise mechanism last week’s pricing dive predicted. That piece argued the meter is a boundary, not a business — the line a vendor draws between compute it owns and compute it rents, with the endgame being “unlimited, on our models.” The missing half was: a model you own only matters if you also own the channel that delivers it. Vertical integration without distribution is just a cheaper model nobody reaches. So the same vendors that internalized the subsidy this spring spent this week buying the off-ramps. The meter made substitutes inevitable; the channel war is what you fight once substitutes exist. As W23 put it, metering was the starting gun. This is the race.

The four off-ramps

“Channel” is vague, so let’s be concrete. There are four off-ramps a model can be locked into, and the week hit all four.

The terminal. The CLI is where individual developers form habits, and habits are distribution. This is the DevRel and dev-marketing front: Claude Code, Codex, Gemini/Antigravity, Cursor, and OpenCode are all fighting to be the muscle memory in your fingers. Google’s calculation is that a closed, batteries-included agy it controls is worth more than an open harness it doesn’t — even at the cost of every “Google kills its products again” headline. The terminal is the cheapest off-ramp to build and the easiest to lose, which is why it churns constantly. It is also the one developers feel, which is why it generates the most heat per dollar.

The environment. Where the agent actually executes. This is what Ona buys OpenAI: a persistent, secure runtime inside the customer’s own cloud, so the agent can run for hours on the customer’s data without that data leaving. Co-founder Johannes Landgraf’s line — “agents need more than intelligence; they need a trusted workspace” — is a distribution thesis dressed as a product one. The environment is a deeper off-ramp than the terminal because it accretes state: your repo history, your credentials, your audit trail. Once your agents live there, leaving means re-homing all of it.

The rail. Where the money moves. Oracle Universal Credits, AWS Marketplace, Azure consumption commitments, Google Cloud commitments. This is the off-ramp finance controls, and it’s the stickiest of all, because it operates below the level where developers or even engineering leaders get a vote. If OpenAI tokens draw down a commitment your CFO signed two years ago, the model-selection decision was made in procurement before any engineer benchmarked anything. The rail is invisible from the terminal and decisive from the boardroom.
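The order of operations on the rail can be sketched in a few lines of Python. Everything here is hypothetical: the model names, the rail labels, and the scores are placeholders, not real products or benchmark results. The sketch only shows the mechanism described above: procurement filters the candidate list first, and the benchmark merely ranks whatever survives, so the top-scoring model can be eliminated before anyone measures it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str              # placeholder model name, not a real product
    rails: frozenset[str]  # purchasing channels this model is available through
    score: float           # illustrative benchmark score, not real data


def pick_model(candidates: list[Candidate], committed_rails: set[str]) -> Optional[Candidate]:
    """Procurement filters first; the benchmark only ranks what survives."""
    eligible = [c for c in candidates if c.rails & committed_rails]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.score)


candidates = [
    Candidate("model-a", frozenset({"direct"}), 0.81),
    Candidate("model-b", frozenset({"cloud-x", "direct"}), 0.76),
    Candidate("model-c", frozenset({"cloud-y"}), 0.74),
]

# Hypothetical: the only existing commitment is with "cloud-x".
chosen = pick_model(candidates, committed_rails={"cloud-x"})
print(chosen.name)  # model-b
```

`model-a` has the highest score and never makes the shortlist; the decision was settled by the set intersection, not by `max`.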

The install base. The humans. Claude Corps is a bet that the cheapest durable distribution is a person who learned the trade on your tool. This is the oldest channel play in software — Microsoft gave students Visual Studio and Oracle gave universities free databases for exactly this reason — and Anthropic is running it into the nonprofit and economic-transition sector before anyone else stakes a flag. An install base of trained operators is the slowest off-ramp to build and the hardest to dislodge, because it isn’t a contract or a config. It’s a skill.

The reason this has to be read across beats is that no single beat sees more than one ramp. A pure devtools lens sees the CLI war and misses the procurement rail. A pure economy lens sees the Oracle RPO number and misses why a token budget for nonprofit fellows is the same move. The connective tissue is the thesis: four beats, four off-ramps, one fight over who owns the channel now that nobody owns the model.

The political off-ramp

There’s a fifth ramp the week only gestured at, and it’s the one that compounds. If the moat is distribution rather than capability, then control of distribution becomes a policy question, and the policy fights already filed in W23 start to look like channel fights in disguise.

Extraterritorial chip export controls are a fight over whose models can be delivered where. The bipartisan preemption draft — a three-year freeze on state AI laws — is, among other things, a fight over how much procurement friction the rails are allowed to carry. The reported floating of US government equity stakes in the labs is a fight over who sits at the head of the channel. When capability commoditizes, regulation stops being about what models can do and starts being about who controls access — and access is exactly what these acquisitions buy. The Oracle deal matters more in a world where regulated sectors (banking, healthcare, government) are both the stickiest customers and the most politically supervised. Whoever owns the compliant rail into a regulated industry owns a position the state has a direct interest in. That is a very different kind of moat than a benchmark score, and a far more durable one.

The strongest case against

Now the honest part: the channel thesis can be overstated, and the best objection comes straight from the developers being migrated.

The objection is that the harness is not a commodity — it’s half the product. The most upvoted technical voice in the Gemini thread, Ferret7446, argued exactly this: “The quality of the agent is 50% model quality and 50% harness quality. Gemini CLI is a bare bones harness, whereas Claude Code and Antigravity are ‘batteries included’ harnesses.” On this view, Google didn’t kill a beloved tool to control a channel; it killed a thin tool to ship a genuinely better one, because multi-agent orchestration — concurrent threads, async background work — needs an architecture the old CLI couldn’t support. The channel and the product are the same thing, and “off-ramp” is a cynical reframe of ordinary product improvement.

The objection has a second leg from the model side. Capability hasn’t fully commoditized: the cost gap is real and brutal (frontier inference can still cost up to 10x as much as specialized alternatives), real-time data and domain adaptation still differentiate, and deployment speed is itself a moat. If the model still differentiates on cost and reliability even where it ties on benchmarks, then the model layer isn’t commoditized; it’s just competing on different axes, and the channel spend is a complement to model investment, not a replacement for it.

Both legs are partly right, and here’s where they break. Take the harness-is-the-product leg first. If the harness were really the durable moat, you would protect it, not torch your own a year in. And closing the source of its replacement protects little while an MIT-licensed competitor with 8 million users demonstrates that a good-enough harness is free. Ferret7446 is right that harness quality matters today; that’s why it’s the contested surface. But “contested surface that everyone is racing to control and that one open-source project is commoditizing in real time” is a description of a channel, not a moat. The harness is where the fight is precisely because it’s losable.

The second leg — that cost and reliability still differentiate the model — actually strengthens the thesis rather than refuting it. Competing on cost-per-task and integration speed is competing on channel economics, not on intelligence. “My model is cheaper to run inside your existing cloud commitment” is a distribution sentence. The moment the pitch stops being “my model is smarter” and becomes “my model is cheaper to deliver through rails you already own,” the differentiation has already moved off the model and onto the off-ramp. The objection concedes the thesis in the act of stating it.

So the steelman survives as a caveat, not a rebuttal: the channel isn’t costless to the people who build it, and a meaningfully better harness can still be a real product. But none of that makes the harness, the model, or the CLI the durable moat. The durable moat is the path, and the path is what everyone bought this week.

So what

For working engineers and the people who budget for them:

  • Audit your off-ramps before a vendor picks one for you. For each AI tool you depend on, ask which of the four ramps it controls — your terminal habit, your execution environment, your procurement rail, your team’s trained skills. The deepest lock-in is rarely the model; it’s usually the rail (a cloud commitment your CFO signed) or the environment (where your agents’ state lives). Those are the ones that decide model selection without you in the room.

  • Treat model-agnostic harnesses as insurance, not ideology. OpenCode and the open Agent Client Protocol exist precisely so the orchestration layer survives a model swap. When an open Kimi K2.7-Code beats Opus on your task at roughly a tenth of the price, the value of having an agent loop you can re-point in an afternoon is the whole ballgame. Build the routing reflex from last week’s dive, and make sure the thing doing the routing isn’t owned by the vendor you might want to leave.

  • Watch the rail, not the benchmark. The model leaderboard is now entertainment; six labs are within five points. The decisions that bind you are the procurement integrations — which models are purchasable through your existing cloud commitment, which run in your own VPC, which are blessed by your security review. That’s where lock-in is actually being installed this quarter, silently, below the engineering org’s line of sight.

  • If a tool you rely on is free, find out what install-base bet it’s funding. Free fellows, free tokens, free CLIs — the giveaway is the distribution strategy. That’s not a reason to refuse them; it’s a reason to know whose default you’re becoming.
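The first bullet’s audit can start as a simple inventory. Here is a minimal Python sketch, assuming you tag each tool by hand with the off-ramps it controls. The tool names and ramp assignments below are hypothetical, and treating the rail and the environment as the “deep” ramps follows the rule of thumb in that bullet, not any measured ranking.

```python
from enum import Enum


class OffRamp(Enum):
    TERMINAL = "terminal habit"
    ENVIRONMENT = "execution environment"
    RAIL = "procurement rail"
    INSTALL_BASE = "trained skills"


# Rule of thumb from the audit bullet: the rail and the environment are the deepest lock-in.
DEEP = {OffRamp.RAIL, OffRamp.ENVIRONMENT}


def audit(inventory: dict[str, set[OffRamp]]) -> list[tuple[str, list[str]]]:
    """Return tools that control at least one deep off-ramp, most deep ramps first."""
    flagged = []
    for tool, ramps in inventory.items():
        deep = sorted(r.value for r in ramps & DEEP)
        if deep:
            flagged.append((tool, deep))
    # Sort by number of deep ramps held (descending), then by tool name for stable output.
    flagged.sort(key=lambda item: (-len(item[1]), item[0]))
    return flagged


inventory = {  # hypothetical tools and ramp assignments
    "coding-cli": {OffRamp.TERMINAL},
    "hosted-agent-runtime": {OffRamp.ENVIRONMENT, OffRamp.TERMINAL},
    "models-via-cloud-commitment": {OffRamp.RAIL, OffRamp.ENVIRONMENT},
    "vendor-training-program": {OffRamp.INSTALL_BASE},
}

for tool, deep in audit(inventory):
    print(f"{tool}: {', '.join(deep)}")
# models-via-cloud-commitment: execution environment, procurement rail
# hosted-agent-runtime: execution environment
```

The output is a review order, not a verdict: the tools at the top are the ones most likely to be deciding model selection without you in the room.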

What would change my mind: a frontier lab opening a durable, benchmark-defensible capability gap again — a generation that doesn’t converge within two quarters — which would move the moat back onto the model and make this whole channel scramble look like a commoditized industry’s interregnum. Or the open-harness layer (OpenCode, ACP, MCP) winning so completely that no single vendor can own a terminal or an environment off-ramp at all, leaving only the rail and the install base to fight over — a narrower war than the one I’ve described. Watch two numbers: the spread between the top frontier model and the best open model on agentic benchmarks (if it widens and holds, the model matters again), and the share of enterprise AI spend flowing through cloud-commitment rails versus direct vendor contracts (if the rails win, the off-ramp thesis is the whole story). Right now both are pointing the same way: the model is a commodity, and the channel is the fight.