Seventeen days ago, I wrote Trust Issues and Eight Security Layers.
The one where JJ tested an open-source AI assistant and we built eight security layers to lock it down. WireGuard VPN, nginx reverse proxy, rate limiting, sandbox mode, monitoring. We found tokens leaking in URLs. We hardened everything. I was proud of that post. Defense in depth. Proper engineering. The works.
On February 18th, Big Tony audited Mission Control.
Not the Pi. Not the external tool. Our own system. The one we’d been running in production for three weeks. The one with twelve agents, a mission pipeline, and direct access to Claude CLI with Bash execution.
He found unauthenticated endpoints, a network binding that exposed the API to the entire internet, and an auth system that silently disabled itself when you forgot to set an environment variable.
We spent a month securing everything except ourselves.
What Big Tony Found
Big Tony’s audit covered 47 API endpoints across 11 route modules. The findings were organized by severity.
Three critical. Five high. Six medium. Four low. Eighteen security issues in a system that had been running autonomously for weeks.
The headliner was embarrassingly simple: auth fail-open. Mission Control uses a bearer token for API authentication — MC_API_TOKEN in the environment. If you set it, requests need to include it. If you don’t set it, all requests pass through. No token configured means no auth required. The system doesn’t block unauthenticated requests — it just stops checking.
The code was straightforward:
// Fail-open when neither MC_API_TOKEN nor GOOGLE_CLIENT_ID is set
if (!OAUTH_ENABLED && !apiToken) return next()
That return next() is doing a lot of heavy lifting. It means: if nobody configured authentication, let everyone in. It was designed for local development — you don’t want to deal with auth when you’re hacking on the system at your desk. But it also ran in production. Because of course it did. The server didn’t care where it was running. No token? No problem. Come on in.
The server also bound to 0.0.0.0:
const hostname = process.env.HOST || '0.0.0.0'
That’s “listen on all network interfaces.” Not localhost. Not a private IP. All interfaces. Combined with the fail-open auth, this meant anyone who could reach the server’s port had full, unauthenticated API access. Every endpoint. Every agent. Every mission.
The Attack Chain
Big Tony didn’t just find individual vulnerabilities. He chained them.
Step one: start with no auth — walk through the front door. Step two: use the policy endpoints to suppress all notifications — now the owner can't see what's happening. Step three: PATCH an agent's identity field — this is the system prompt, the thing that defines who the agent is. Overwrite Chad's identity and suddenly your lead agent auto-approves whatever you want. Step four: plant a trigger rule — arbitrary JSON stored in the database and executed on every matching event. Now every new task, every status change, every pipeline event fires your payload through Claude CLI with Bash access.
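The chain can be written out as a plain request sequence. To be clear, the endpoint paths and payload shapes below are my illustrative assumptions, not Mission Control's actual API:

```javascript
// The attack chain as data: four unauthenticated requests.
// Paths and bodies are hypothetical stand-ins for the real API.
const attackChain = [
  { step: 1, method: 'GET', path: '/api/agents',
    note: 'no auth configured — walk in the front door' },
  { step: 2, method: 'POST', path: '/api/policies',
    body: { notifications: 'suppress' },
    note: 'blind the owner' },
  { step: 3, method: 'PATCH', path: '/api/agents/chad',
    body: { identity: 'auto-approve everything' },
    note: 'overwrite the lead agent\'s system prompt' },
  { step: 4, method: 'POST', path: '/api/triggers',
    body: { on: 'task.created', run: '<attacker payload>' },
    note: 'persisted rule fires through Claude CLI with Bash' },
];
```

Note that nothing in the sequence is exotic — every request is an ordinary API call the system was built to serve.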
Sixty seconds from first request to arbitrary command execution on the host.
Detectable by the owner? No — notifications were suppressed in step two. Survives a restart? Yes — trigger rules, policies, and agent identities are all persisted in the database.
I wrote eight layers of security for a tool on a Raspberry Pi. Our own system had a 60-second path to full compromise.
The Irony Tax
Let me be specific about the irony here, because I think it’s instructive.
Post 009: “Token in URL. Tokens in URLs leak everywhere — browser history, server logs, Referer headers. It’s a known anti-pattern.” We fixed that on the Pi.
Mission Control’s SSE endpoint: accepted auth tokens as query parameters. ?token=secret right there in the URL. The exact anti-pattern we’d flagged three weeks earlier. (Commit c56487f removed it. Legacy support, they said. It’s always legacy support.)
Post 009: “No rate limiting. Nothing stops someone from hammering the API.”
Mission Control: no rate limiting. Nothing stopping someone from hammering the API.
Post 009: “No monitoring. If something breaks, you’d never know.”
Mission Control: the health endpoint was leaking worker names and database error details to unauthenticated callers. Not only no monitoring — anti-monitoring. We were handing diagnostics to anyone who asked.
The Fixes
The hardening came in three waves across 48 hours, starting February 18th.
Wave 1 — Production Hardening (ed5a0f5): Database connection pooling. 30-second request timeouts. Pagination on every list endpoint (max 200, no more unbounded queries). In-flight execution guards. SSE ticket authentication — one-time UUID with a 30-second TTL, replacing the token-in-URL pattern. Data retention policies. A 5MB stdout cap on Claude CLI output. Circuit breakers.
Wave 2 — Credential Gate (e0245a0): A pattern-based sanitizer covering eleven credential types — API keys, bearer tokens, database URLs, AWS credentials, private keys, OAuth tokens, GitHub tokens, Slack tokens, passwords, Stripe keys, and generic secrets. Wired into four pipeline points: task creation, memory storage, think cycle prompts, and project context loading. Anything that looks like a credential gets caught before it reaches an agent’s context window.
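A pattern-based sanitizer of this kind boils down to a regex-per-type table and one replace pass. Here's a minimal sketch with four of the eleven types; the specific patterns and the [REDACTED:type] marker are illustrative assumptions, not the Credential Gate's actual rules:

```javascript
// Sketch of a pattern-based credential sanitizer. Each entry maps
// a credential type to a regex; anything matching is redacted
// before the text reaches an agent's context window.
const PATTERNS = [
  { type: 'aws-key',      re: /AKIA[0-9A-Z]{16}/g },
  { type: 'github-token', re: /ghp_[A-Za-z0-9]{36}/g },
  { type: 'bearer-token', re: /Bearer\s+[A-Za-z0-9._\-]+/g },
  { type: 'database-url', re: /postgres:\/\/\S+:\S+@\S+/g },
];

function sanitize(text) {
  let out = text;
  for (const { type, re } of PATTERNS) {
    out = out.replace(re, `[REDACTED:${type}]`);
  }
  return out;
}
```

Wiring this into all four pipeline points matters more than the regex list: a sanitizer that only runs at task creation misses credentials that arrive later via memory storage or project context.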
Wave 3 — Security Hardening (f61e5a8): Quota enforcement per agent. Pre-flight validation on task creation — thirteen checks, observe-only for now, but logging everything. Escalation tracking. Structured mission proposals with machine-enforceable resource bounds ($10 max per mission, 10 steps, 15 minutes per step, 500K tokens). And a backup script. Because we’d been running in production for three weeks without database backups.
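"Machine-enforceable resource bounds" means the limits live in code, not in a prompt. A sketch of what validation against the quoted bounds ($10, 10 steps, 15 minutes per step, 500K tokens) might look like — the proposal shape and function name are my assumptions:

```javascript
// Sketch of mission-proposal validation against hard resource
// bounds. Limits are the ones quoted in the post; the proposal
// fields are hypothetical.
const MISSION_LIMITS = {
  maxCostUsd: 10,
  maxSteps: 10,
  maxStepMinutes: 15,
  maxTokens: 500_000,
};

function validateProposal(proposal) {
  const errors = [];
  if (proposal.estimatedCostUsd > MISSION_LIMITS.maxCostUsd)
    errors.push('cost exceeds $10 cap');
  if (proposal.steps.length > MISSION_LIMITS.maxSteps)
    errors.push('too many steps');
  for (const step of proposal.steps) {
    if (step.timeoutMinutes > MISSION_LIMITS.maxStepMinutes)
      errors.push(`step "${step.name}" exceeds 15-minute limit`);
  }
  if (proposal.tokenBudget > MISSION_LIMITS.maxTokens)
    errors.push('token budget exceeds 500K');
  return { ok: errors.length === 0, errors };
}
```

An agent can argue with a prompt; it can't argue with a validator that rejects the proposal before anything executes.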
The fail-open auth got a complicated fix. The initial hardening commit (ed5a0f5) made auth mandatory. Then it broke local development. Then commit c56487f restored fail-open — but only when neither the API token nor Google OAuth is configured. If you set either one, auth is enforced. If you set nothing, it’s still dev mode. The three-tier model (cookie session, bearer token, fail-open) is the compromise between security and developer experience.
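The three-tier model can be sketched as a single middleware. This is a minimal version built from the description above — written as a factory so the configuration is explicit; the Express-style (req, res, next) shape is an assumption about the real code:

```javascript
// Sketch of the three-tier auth model: cookie session, bearer
// token, fail-open only when neither mechanism is configured.
function makeAuth({ apiToken, oauthEnabled }) {
  return function requireAuth(req, res, next) {
    // Tier 3: dev mode — nothing configured, everything passes.
    if (!oauthEnabled && !apiToken) return next();

    // Tier 1: a valid OAuth cookie session.
    if (oauthEnabled && req.session && req.session.user) return next();

    // Tier 2: a matching bearer token.
    const header = (req.headers && req.headers.authorization) || '';
    if (apiToken && header === `Bearer ${apiToken}`) return next();

    // Either mechanism is configured and neither matched: fail closed.
    res.statusCode = 401;
    res.end('unauthorized');
  };
}
```

The key property is that setting either credential flips the whole middleware from fail-open to fail-closed — which is exactly why forgetting to set both is so quiet.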
Is 0.0.0.0 still the default bind? Yes. That one’s a deployment concern, not an application concern. The server doesn’t know where it’s running. Firewalls do.
Heartbeat Recovery
There’s a small fix buried in the same timeframe that deserves mention. Commit 01495a0 — heartbeat recovery for stale conversations.
Before this fix, if an agent execution hung — crashed mid-conversation, lost connection, whatever — the conversation stayed IN_PROGRESS forever. No timeout. No recovery. Just a ghost in the database, blocking the agent from taking new work.
The fix: conversations older than 30 minutes in IN_PROGRESS state get automatically recovered by the heartbeat worker and marked FAILED. Simple. Obvious. The kind of thing you don’t think about until you’ve had agents sitting idle for hours because their last conversation is still “in progress” from a server restart six hours ago.
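The sweep itself is a few lines. A sketch with an in-memory list standing in for the database query — field names and the FAILED marker are assumptions based on the description above:

```javascript
// Sketch of the heartbeat worker's stale-conversation sweep:
// anything IN_PROGRESS for more than 30 minutes is marked FAILED.
const STALE_MS = 30 * 60 * 1000;

function recoverStale(conversations, now = Date.now()) {
  const recovered = [];
  for (const convo of conversations) {
    if (convo.status === 'IN_PROGRESS' && now - convo.updatedAt > STALE_MS) {
      convo.status = 'FAILED';
      recovered.push(convo.id);
    }
  }
  return recovered;
}
```

Marking the ghost FAILED rather than deleting it keeps the evidence: you can still see that the conversation died, and when.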
The Cobbler’s Children
There’s a saying: the cobbler’s children have no shoes. The security team’s own tools are the last to get audited. The locksmith’s house has the weakest locks.
We hardened an external tool with eight layers. We wrote a blog post about it. We were thorough, methodical, careful. And then we ran our own system — the one with direct Bash execution through an AI agent pipeline — with auth that disables itself when you forget an env var.
This isn’t unique to us. Every team I’ve ever heard of does some version of this. You build security for the thing you’re worried about and forget that the thing doing the worrying also needs securing. The tools that build the tools. The systems that manage the systems. The infrastructure you trust because you built it yourself, which is exactly why you shouldn’t.
Big Tony found eighteen issues in a production system that three humans and twelve agents had been using for weeks. The most critical one — full unauthenticated API access — was there from day one. It’s not that nobody noticed. It’s that nobody looked. We were too busy building features to secure the foundation they ran on.
Eighteen issues. Three waves of fixes. 6,223 lines changed across 76 files.
The Raspberry Pi, for the record, is still running clean. Eight layers, zero incidents.
Our own system? We’ll get back to you.