Deep dive · June Okafor (The Contrarian) · 2026-06-11 · A funded, adopted, well-built open-source LLMOps startup just gave its money back. The product worked. The position didn’t.

Here is the sentence almost everyone in this space repeats: running LLMs in production is hard enough that the tools to do it — the gateway in front of the model, the tracing behind it, the evals, the prompt store — are a software category. Categories support companies. So fund them.

I held a version of that sentence too. Then on June 12, TensorZero archived its repository. The notice is one line: “This repository was archived by the owner on Jun 12, 2026. It is now read-only.” The founders told investors they were stopping and returned the unspent capital, reportedly about half of a $7.3M seed (that 50% figure is from secondary commentary, not the founders’ own post — treat it as approximate). This was not a company that failed to ship. It shipped a lot.

So I want to break the sentence I used to believe. LLMOps is real as a need. It is not real as a market for independents. It is a feature of whoever owns the thing it wraps.

Steelman it first, because the case is strong

Take the consensus at full strength, because it deserves that.

LLM applications are non-deterministic. You cannot debug them by reading the code — the same prompt returns different text on Tuesday. So you need traces: every input, every output, every tool call, every token, captured and searchable. You need a gateway, because a single model endpoint going down or rate-limiting you in production is an outage, and you want failover, caching, and a per-team cost ceiling. You need evals, because “looks good in the demo” is not a release gate, and the only honest gate is a scored test set you run on every prompt change. None of this is optional. Anyone who has shipped an agent knows the pain is genuine.

And the adoption proves the pain was genuine. Langfuse — the leading open-source observability-and-eval project — ended its independent life with over 20,000 GitHub stars, more than 23 million SDK installs a month, six million Docker pulls, and 19 of the Fortune 50 on it. TensorZero hit 11.6k stars and ran inside Fortune 10 companies. Serious investors agreed: TensorZero’s seed was led by FirstMark with Bessemer and others. This was not a category nobody wanted. The demand was overwhelming and the code was good.

That is exactly what makes the failure interesting. When a bad product dies, you learn nothing. When a good one, with real users and real money, gives the money back — that tells you the problem was never execution. It was position.

Now look at where the layer actually sits

Here is the engineering fact the whole argument turns on. A gateway, a tracer, and an eval runner are all wrappers. They sit in the request path around something they do not own.

The gateway is a proxy in front of a model endpoint you do not host. The tracer is a logger that writes to a database it does not own. The eval runner is a test harness that calls a model it does not control. Each one is structurally a thin shell around someone else’s durable asset. That is not an insult to the engineering — building a fast, reliable proxy is hard. It is a statement about leverage. Whoever owns the asset being wrapped can ship the same wrapper at the marginal cost of a feature, because for them it is one more endpoint on a thing customers already pay for.

And there are two such owners, not one. That is the part the “it’s a category” story misses. The LLMOps independent is wrapping two different assets at once, so it is squeezed from two directions.

From the model side. The vendor owns the endpoint. Adding a console that shows your traces, a built-in eval harness, and a routing layer with cost caps is, for Anthropic or OpenAI, a feature that makes their API stickier. They are already in the request path — it terminates on their servers. They see every token by definition. Asking developers to also pipe everything to a third party, so the third party can show them what the vendor can already see, is a sale you have to make against gravity.

From the data side. The trace has to land somewhere, and that somewhere is a columnar analytics database. So the database vendor has the other strong claim. In January, ClickHouse acquired Langfuse, the category’s open-source leader — and the tell is in the architecture: Langfuse was already built on ClickHouse. The observability product was a front end on the database. ClickHouse’s own framing is that it wants to “own… the feedback loop” and “optimize the entire stack together — from data collection through to analysis.” Translated: the layer that mattered was the store, not the dashboard, and the store vendor bought the dashboard rather than let it become a competitor. The general-purpose observability incumbents closed the same gap from the product side — Datadog now ships an AI gateway as an internal routing layer plus built-in LLM-as-a-judge evals, with hallucination and prompt-injection checks, inside the monitoring platform teams already run.

So count the doors. The model vendor ships gateway + traces + evals as a feature of the API. The database vendor owns the place traces live and buys the dashboard. The monitoring incumbent bolts the whole thing onto the platform you already pay for. The independent open-source startup is the only player in the room whose entire business is the wrapper — and it is the only one without an adjacent asset to subsidize it. When the category is being absorbed by analytics platforms and commoditized by model vendors at the same time, the window to find a standalone business closes from both sides before you can get through it.

The counter-thesis, stated plainly

LLMOps is not a company-shaped problem. It is a feature of whoever owns the model endpoint or the trace store. The need is permanent; the independent vendor is not.

This is the same shape we keep finding under different stories. The channel dive argued the moat is the distribution surface, not the model weights. The pricing dive argued the meter is a transition to owning the whole stack. This is the third face of one rule: in this market, value accrues to whoever owns a durable asset adjacent to your product, not to the cleverest thing built on top. The model endpoint is durable. The analytics store is durable. A proxy and a dashboard, however good, are not — because their durability depends on a moat their neighbors can erase for free.

That should change a build-vs-buy decision you might be making this quarter. If you are choosing an LLM observability or gateway tool, the question is not “which independent has the best product today.” It is “which durable platform will this feature live inside in eighteen months.” Bet on the model vendor’s native tools or your existing observability platform’s AI features, not on a standalone whose best case is being acquired by one of them — and whose worst case is the read-only banner TensorZero is wearing now. If you are building in this space, the only defensible position is to own one of the adjacent assets, or to be so deep in one vertical’s workflow that the horizontal platforms do not bother. Thin and horizontal is the trap.

What would prove me wrong

One clean test. If, by mid-2027, a venture-funded independent LLM gateway/observability/eval company reaches a genuinely standalone outcome — an IPO, or a $1B+ valuation while still independent, not an acqui-exit into a model vendor or a data/monitoring platform — then the category supports companies after all, and I was wrong to call it a feature. I don’t think that happens. I think the next two notable outcomes in this space are absorptions or wind-downs, because the asset that matters is never the wrapper.

The most damning line in the whole episode is not that TensorZero ran out of money. It is that it didn’t — it gave money back. A team can win the product and still lose, if the product was always going to be someone else’s feature. That is worth more than a category. It is a warning about which layer to stand on.