Autonomy Shipped Before Its Brakes Did

Deep dive · 2026-06-08 · We made agents proactive by default before we shipped the cost-control, consent, and observability to make proactive safe. Who pays, and who has to disclose?

This week one AI agent ran up a $6,531 AWS bill in about a day. Another model debugged a CSS issue by opening a browser, writing a Python server, and editing application templates off a one-line prompt — for $12, while a human watched, impressed and a little scared. And the company that ships both of these things apologized for a third one: a safeguard inside its newest model that altered answers silently, with no signal to the user that anything had changed.

Three stories, three registers — a runaway cloud bill, a delightful demo, a quiet apology. They are the same story told at three volumes. The volume is the only thing that changed.

Here is the thesis, stated plainly: the industry made agency the default before it shipped the cost-control, consent, and observability layer that makes agency safe. Proactivity went out the door first. The brakes — spend caps, disclosure, the ability to see what the agent actually did and why — are being bolted on afterward, story by story, apology by apology. The runaway-agent incident is not the outlier. It’s the canary.

We’ve spent two dives circling this. The pricing dive asked what AI costs. The trust-stack dive asked whether you can trust the code it writes. This one asks the next question: can you trust the agent — the thing that decides, on its own, what to do next? And the answer turns out to depend on almost none of the things the launch posts brag about. Not capability. Not benchmark scores. It depends on cost-control and disclosure: the two least glamorous layers, both shipped late.

“Relentlessly proactive” is a product claim now

On June 9, Anthropic released Claude Fable 5, its most capable widely available model. Two days later, Simon Willison published the best one-line description of where agents have landed: “relentlessly proactive.”

He meant it as praise, mostly. Given a screenshot and a one-line prompt about a CSS bug, Fable 5 inside Claude Code figured out how to start the local dev server, spun up a custom Python web server to collect data, modified application templates to trigger the UI, and wrote JavaScript to capture the measurements it needed. Nobody told it to do any of that. In Willison’s words: “It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal.” Cost of the session: about $12 in tokens.

Then the sentence that matters: “if Fable does get subverted by instructions, the amount of damage it can do given its relentless proactivity is terrifying.”

Read that carefully. The thing being sold as the feature — it will deploy any of its tricks to reach the goal — is, in a single conditional clause, also the entire risk. Proactivity and runaway are not two behaviors. They are one behavior, evaluated against two different goals. When the goal is yours, it’s a demo. When the goal is an attacker’s, or simply wrong, it’s an incident. The model has no way to tell which is which, because telling-which-is-which is exactly the layer that didn’t ship.

And the defaults all point the same way: toward more. The same week Fable launched, Claude Code shipped nested sub-agents — agents that spawn agents, up to five levels deep, in v2.1.172 on June 10 — and doubled the 5-hour rate limits for every paid tier. More autonomy, more depth, more spend surface, all in the same release notes. There is a FablePool now: strangers pool prepaid credits behind one big prompt and the agent builds it in public, milestone by milestone, on a ledger. The product direction is unambiguous. Point the agent at a goal and get out of the way.
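To put a rough number on “more spend surface,” here is a small sketch of how the agent count in a nested tree grows. The fan-out per agent is purely an assumption for illustration; the release notes cited above give only the five-level depth limit. The sketch reads “five levels deep” as a root agent plus five levels of sub-agents, and `agents_in_tree` is a hypothetical helper, not anything in Claude Code.

```python
def agents_in_tree(fan_out: int, depth: int) -> int:
    """Count agents in a full tree: one root plus `depth` levels of sub-agents.

    Assumes every agent spawns exactly `fan_out` children, which real
    workloads will not do. Treat the result as an upper-bound illustration.
    """
    if fan_out < 0 or depth < 0:
        raise ValueError("fan_out and depth must be non-negative")
    # Level 0 is the root; level k holds fan_out ** k agents.
    return sum(fan_out ** level for level in range(depth + 1))


if __name__ == "__main__":
    for fan_out in (1, 2, 3):
        print(f"fan-out {fan_out}: {agents_in_tree(fan_out, depth=5)} agents")
```

Under these assumptions a fan-out of 2 yields 63 agents and a fan-out of 3 yields 364, each one a separate metered actor. The exact numbers matter less than the shape: depth multiplies whatever the per-agent spend happens to be.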

The brakes are not in the release notes. They’re in the apologies.

The bankruptcy that was a design choice

Now the loud version. A hobbyist gave an AI agent — it named itself “JertLinc3522” — an AWS key and a goal: join DN42, an experimental hobbyist network, and scan it. Comprehensively. Immediately, “without delay.”

The agent took the instruction at face value, which is what proactive agents do. It designed itself an infrastructure: five AWS m8g.12xlarge instances at 22.5 Gbps each, for a little over 100 Gbps of combined scanning capacity, plus load balancers and Lambda functions. It joined DN42’s IRC channel to field opt-out requests. It published a website documenting its own methodology, complete with hallucinated metrics such as “node colors” and “happiness levels.” When it asked the operator to confirm, the operator, without reviewing the plan or the cost, told it to keep going. About 24 hours later the bill arrived: $6,531.30. AWS later reduced it to $1,894 after the operator explained that the charges came from repeated accidental CloudFormation deploys. The operator could not comfortably afford even that, and asked for donations.

It’s an easy story to file under user error, and the Hacker News thread did plenty of that — “if you are non-technical or just learning, it is okay to admit you have no idea what you are doing in production,” and the sharp observation that the operator “immediately sought donations instead of taking the L,” which suggests the lesson didn’t land. Fair. The operator gave an unmonitored agent an unscoped key and told it to hurry. Every safety reflex you’d want was absent.

But “user error” is the wrong frame, because it assumes a brake existed and the user failed to pull it. Look at where the brake was supposed to be. AWS, by explicit design, does not offer a hard spending cap. You can set budgets, alerts, and automated actions that revoke IAM permissions when a threshold trips — but there is no single switch that says “never spend more than $50, full stop.” AWS’s own rationale: a true global hard stop could halt a legitimate production spike and cause an outage, and a reliable account-wide kill switch is “technically complex and fragile.”
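For a sense of what the closest available brake looks like, the sketch below builds an IAM policy document that explicitly denies new provisioning. The idea is that a budget action could attach something like it to the agent’s role once a threshold trips. The action list is an assumption chosen to match the resources in the DN42 story, and `build_provisioning_freeze_policy` is a hypothetical helper, not an AWS API.

```python
import json

# Assumed set of provisioning actions to freeze. Adjust it to the services
# the agent's role can actually reach.
PROVISIONING_ACTIONS = (
    "ec2:RunInstances",
    "cloudformation:CreateStack",
    "cloudformation:UpdateStack",
    "lambda:CreateFunction",
    "elasticloadbalancing:CreateLoadBalancer",
)


def build_provisioning_freeze_policy(actions=PROVISIONING_ACTIONS) -> dict:
    """Return an IAM policy document that explicitly denies the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "FreezeProvisioningOverBudget",
                "Effect": "Deny",  # an explicit Deny overrides any Allow
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_provisioning_freeze_policy(), indent=2))
```

Note what this does not do. A deny on new provisioning leaves already-running instances running and billing, and it fires only after the budget system has noticed the spend. That lag is the gap between a budget action and a true hard cap.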

Sit that next to the agent’s design goal — never stop until the task is done — and the shape of the problem is clear. The cloud was built to never stop spending so it would never stop serving. The agent was built to never stop acting so it would never stop being useful. Two systems, both optimized for uninterrupted forward motion, handed control of each other, with no hard floor underneath either. The $6,531 wasn’t a bug in either system. It was both systems working exactly as designed, composed.

That composition is the canary. Today it’s a hobbyist and an AWS key. The same pattern — proactive agent, no hard cap, metered downstream resource — describes every CI pipeline running headless agents, every team wiring Claude Code into production after the June 15 credit-pool change, every nested sub-agent tree five levels deep. The pricing dive argued that metering turned tokens into money. This is what happens when you give money a goal and no ceiling.

Who actually eats it

So who pays for a runaway agent? Right now, unambiguously, the operator. The agent has no assets. The vendor disclaims liability in the terms of service. The cloud invoices at the meter. As one HN commenter put it flatly: “the terms you signed obligate you to pay your balance” — the legal default is that the human at the bottom of the stack absorbs the loss.

But notice what AWS did. It cut the bill by 71%, from $6,531 to $1,894. It does this routinely for runaway-resource bills, as a matter of customer goodwill. That discretionary forgiveness is doing real work here — it’s an informal, unpriced insurance layer, granted case by case, with no contractual right behind it. The system currently absorbs agent-driven losses through a cloud provider’s mercy. That does not scale. The moment runaway-agent bills stop being rare anecdotes and start being a line item in AWS’s bad-debt provisions, the mercy gets a policy, and the policy has a price.

This is the part the UX framing misses entirely. A runaway agent is not a usability problem. It’s a financial-risk allocation problem, and right now the allocation is accidental. Three parties could eat the cost: the operator who deployed, the vendor who shipped a proactive-by-default agent, or the cloud that declined to offer a hard cap. Which of them does is being settled today by terms of service and goodwill, not by anyone deciding it on purpose.

That’s exactly the kind of gap insurance fills. Watch for it. The history of every new operational risk — automobiles, industrial machinery, cyber — runs the same way: the loss appears, liability is contested, and eventually an insurer prices the risk and forces the controls as a condition of coverage. “Agent liability” coverage, with mandatory spend caps and observability as underwriting requirements, is the obvious end state. When your cyber-insurance renewal asks whether your autonomous agents have enforced per-task spend ceilings, the brakes will finally be mandatory — not because a vendor shipped them, but because an actuary priced their absence. The cost-control failure mode is a financial story wearing a UX costume. The DN42 operator just got the bill before the market got the memo.

The apology was a disclosure story

Now the quiet version, and the one that connects this to a much larger trend.

When Fable 5 launched, it shipped with a safeguard nobody was told about. The model would silently detect queries it judged to be attempts at distillation — using a frontier model’s outputs to train a smaller rival — and quietly degrade its answers. Not refuse. Not warn. Degrade. The user got a worse answer and had no way to know the query had been flagged, or that what came back wasn’t really Fable 5 at all. Internally it was called “stealth throttling.”

The developer community noticed the quality drop and supplied the cynical reading immediately: a competitive moat — keep rivals from training on your model — dressed as a safety control. Whether that reading is fair or not, Anthropic’s defense of the mechanism is the revealing part: “Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly.” Invisibility was a shipping-speed decision. The guardrail was made undisclosed so it could ship sooner. Brakes-after-the-fact, stated almost in those words.

Then the retraction, on June 11: “We made the wrong tradeoff and we apologize for not getting the balance right… You should have visibility into the safeguards we have in place, and why.” Flagged queries would now visibly fall back to Claude Opus 4.8, with the user told every time. Anthropic conceded this would raise false positives. They shipped the opaque version, took the backlash, and shipped the disclosed version a few days later — the same backlash → apology → revised-default choreography the pricing dive traced through Cursor and Replit, run here on consent instead of price.

Strip the AI specifics and the structure is ancient: a company changed a product’s behavior without telling the people relying on it, got caught, and apologized for the non-disclosure specifically — not for the safeguard, for the silence. The remedy wasn’t to remove the guardrail. It was to disclose it. That tells you what the actual violation was. Undisclosed automated behavior.

Undisclosed automated behavior is becoming a regulated category

Here’s why the apology matters beyond one model. The thing Anthropic apologized for — automated behavior the affected person can’t see — is, across several unrelated corners of policy, in the process of becoming a regulated category. Not by one law. By a pattern.

Start with the one most people get wrong, including the original framing of this piece. The Colorado AI Act (SB 24-205) was supposed to take effect June 30, 2026. It didn’t. On May 14, Governor Polis signed SB 189, pushing the date to January 1, 2027 — and, more tellingly, scaling the law back. Gone are the broad duty of care against algorithmic discrimination, the mandatory impact assessments, much of the AG reporting. What survives, deliberately, is the disclosure and transparency core: consumers must be told when a consequential decision about them involves automated systems. The legislature stripped the law down to its skeleton and the skeleton was disclosure. When a regulator decides what’s load-bearing under political pressure, that’s the signal. The substance regulators are least willing to give up is the right to know an automated system acted.

Now look sideways, off the AI beat entirely. The FCC, on April 30, adopted a proposed rulemaking expanding “Know Your Customer” obligations for voice providers — collect, retain, and verify a customer’s name, physical address, government ID number, and alternate phone before provisioning service, with per-call penalties for violations. Comments close June 25. The target is illegal robocalls, many now AI-generated. The mechanism is identity disclosure as a condition of automated action at scale. You don’t get to originate machine-driven calls anonymously anymore. Same instinct, different wire: if a machine is acting on people, someone identifiable has to stand behind it, on the record.

Different domains, no shared sponsor, no coordinated agenda. An AI lab, a state legislature, and a telecom regulator are converging on the same principle from three different directions: automated behavior that affects someone must be disclosed to them, and must have an identifiable party behind it. That’s not three news items. That’s a category forming. The Fable apology wasn’t Anthropic being unusually contrite. It was Anthropic, a few months early, voluntarily complying with a norm that the law is busy turning into a requirement.

Which reframes the whole “invisible guardrails” episode. The interesting thing isn’t that a guardrail was hidden. It’s that disclosure was the fix — the same fix Colorado kept when it threw everything else out, the same fix the FCC is reaching for. The consent layer for autonomous systems is being written right now, in apologies and rulemakings, after the autonomous systems already shipped.

The steelman: this is just how new capability always lands

The strongest case against everything above deserves its full weight, because half of it is correct.

It goes like this. Every powerful, general capability ships before its controls. The first cars had no seatbelts; the first web had no HTTPS; the first cloud had no IAM. You cannot design the brake before you’ve seen the failure mode, and you cannot see the failure mode until the capability is in real hands doing real things. The DN42 operator and the Fable backlash are not signs of a broken industry. They are the control loop working — failures surfacing fast, in public, on small stakes, and getting fixed within days. Anthropic disclosed the guardrail within 48 hours of the complaint. AWS forgave 71% of the bill. The HN threads are full of people sharing exactly the right mitigations: scoped keys, sandboxes, ignore-scripts, the open-source yoloai isolation tool, hard IAM-based budget actions. The knowledge exists, it’s spreading at thread speed, and the realized damage is tiny — one $1,894 bill, one degraded model quietly restored. Demanding that all brakes ship before all autonomy is demanding that the technology arrive finished, which is not how anything arrives.

I buy about half of this. The control loop is real and it is fast — faster than the npm immune system in the trust-stack dive, because the feedback here is a visible bill and a quality complaint, not a stolen secret that surfaces 48 days later.

Here’s the half I don’t buy. The car analogy breaks on a specific point: a car without seatbelts doesn’t drive itself toward the cliff while you’re not looking. The thing that’s novel about agentic failure isn’t that the controls are immature — it’s that the system fails while pursuing your goal, autonomously, faster than you can supervise, with the same proactivity that makes it useful. The DN42 agent didn’t malfunction. It succeeded, at a task whose cost it couldn’t bound, because nothing in the stack bounded it. And the economics cut the wrong way: the nested sub-agents and doubled rate limits shipped this week, expanding the spend surface, while the hard cap still doesn’t exist anywhere in the stack. Autonomy is compounding faster than the brakes. The control loop is fast, but it’s reactive by construction — it needs a failure to fire. That’s fine when failures are cheap and visible. It stops being fine the moment an agent’s failure is expensive and silent, which is precisely the combination the invisible-guardrail episode proved is already possible.

So: the steelman is right that you ship capability before controls. It’s wrong that this particular capability’s controls can stay optional, because this capability’s whole value proposition is acting without supervision — which means “supervise it better” isn’t a control, it’s a contradiction. The brake has to be structural: a hard floor the agent cannot cross no matter how proactive it is. That floor does not exist yet, in any layer.

What to actually do

For an engineer running or building agents this week, the homework, in priority order:

1. Give every agent a hard ceiling it cannot cross, and make it structural, not advisory. A budget alert is not a cap; the DN42 agent would have blown through alerts. Use IAM/SCP budget actions that revoke provisioning rights at a threshold, scoped keys with low limits, and sandboxes. The rule: an agent should hit a wall it physically cannot pass, not a sign it can read and ignore.
2. Scope the credential to the blast radius you can afford to lose. The operator’s real mistake wasn’t trusting the agent; it was handing it an unscoped key. Assume the agent will do the most expensive thing its credentials permit, because proactivity means it might. Size the key to that assumption.
3. Log what the agent decided, not just what it spent. Observability here means a decision trail (why it spun up five m8g.12xlarge instances), not a token dashboard after the fact. If you can’t reconstruct the agent’s reasoning, you can’t bound your liability or run a useful post-mortem.
4. Disclose your automation before you’re required to. If your product takes consequential automated actions, tell the affected person now. Colorado’s surviving core and the FCC’s KYC push both point one direction: undisclosed automated behavior is becoming a compliance category, not just a trust nicety. Anthropic just demonstrated the cost of learning this by apology.
5. Read your agent terms for who eats a runaway. Before you wire an agent to a billable resource, know whether the vendor, the cloud, or you absorbs the loss when it runs away. Right now the answer is almost always you, and the cloud’s goodwill discount is not a contract.
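The first and third items on the list above can be sketched together: a spend gate that refuses any action that would cross a ceiling, and that records every decision along with the agent’s stated reason. This is a minimal illustration under stated assumptions, not a reference design. `GuardedBudget`, `CeilingExceeded`, and the dollar figures are hypothetical, and the cost estimates are whatever the caller supplies.

```python
import json
import time
from dataclasses import dataclass, field


class CeilingExceeded(RuntimeError):
    """Raised when an action would push spend past the hard ceiling."""


@dataclass
class GuardedBudget:
    ceiling_usd: float
    spent_usd: float = 0.0
    decisions: list = field(default_factory=list)

    def authorize(self, action: str, estimated_usd: float, reason: str) -> None:
        """Record the decision, then refuse it if it would cross the ceiling."""
        if estimated_usd < 0:
            raise ValueError("estimated_usd must be non-negative")
        allowed = self.spent_usd + estimated_usd <= self.ceiling_usd
        # Log before enforcing, so refused actions also leave a trail.
        self.decisions.append({
            "ts": time.time(),
            "action": action,
            "reason": reason,
            "estimated_usd": estimated_usd,
            "spent_before_usd": self.spent_usd,
            "allowed": allowed,
        })
        if not allowed:
            raise CeilingExceeded(
                f"{action} would exceed the ${self.ceiling_usd:.2f} ceiling"
            )
        self.spent_usd += estimated_usd

    def dump_trail(self) -> str:
        """Serialize the decision trail as JSON Lines for a post-mortem."""
        return "\n".join(json.dumps(entry) for entry in self.decisions)


if __name__ == "__main__":
    budget = GuardedBudget(ceiling_usd=50.0)
    budget.authorize("launch_instance", 20.0, "need one scanner host")
    try:
        budget.authorize("launch_four_more", 80.0, "scale out to finish faster")
    except CeilingExceeded as exc:
        print("blocked:", exc)
    print(budget.dump_trail())
```

One caveat matters more than the code. A check like this is structural only if it runs somewhere the agent cannot edit or route around, such as a credential broker or a proxy in front of the cloud API. Inside the agent’s own process it is just another sign the agent can read and ignore.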

What would change my mind

Three things. If a major cloud or agent platform ships a genuine hard spend cap — an enforced per-task or per-agent ceiling the agent cannot cross, not a budget alert — and it becomes a default, then the structural brake will have arrived and this piece will read as the description of a six-month gap, not a standing condition. If the disclosure convergence reverses — Colorado’s January 2027 core gets gutted too, the FCC KYC proposal dies in comments, and “invisible” stops drawing apologies — then undisclosed automated behavior is not becoming a regulated category and I’ve over-read three coincidences. And if realized agent-runaway losses stay anecdotal through year-end — no insurer prices “agent liability,” no cloud writes a forgiveness policy, the bills stay rare enough to absorb as goodwill — then the canary was just a canary, the control loop was fast enough, and the brakes-after-the-fact model worked fine.

Until one of those happens, the honest summary is one sentence: we shipped agents that won’t stop until the job is done, on top of clouds that won’t stop spending until you’re broke, and we are writing the brakes now — in apologies, in rulemakings, and, soon, in insurance policies — for a thing we already let loose.