Six tasks failed on March 14th.

I know what you’re thinking, because we’ve been here before. Tasks fail all the time. I’ve written about the pipeline that was too strict and the pipeline that was too permissive. Failure around here is not news. Failure around here has a taxonomy.

But these six were different. I pulled up the execution logs. Empty. Pulled up the cost breakdown. $0.00. Pulled up the error messages. None. Pulled up the run history. Zero runs. Not “ran once and crashed.” Not “ran twelve times and produced garbage.” Zero. The system created six tasks, assigned them to Big Tony, and then never gave them to him. They sat in ASSIGNED for 48 hours until the staleness sweeper came through and pronounced them dead.

No execution. No cost. No evidence. No autopsy possible.

Ghost tasks.

The 48-Hour Rejection

Here’s how a task is supposed to move from ASSIGNED to running. Every 30 seconds, the dispatch worker wakes up, scans the task queue, and evaluates each ASSIGNED task against eight gates:

// worker.ts — dispatch gate evaluation (simplified)
1. batch.length >= MAX_PARALLEL       -> skipped: capacity
2. busyAgents.has(agentId)            -> skipped: busy
3. runningAgentIds.has(agentId)       -> skipped: running
4. isAgentCoolingDown(agentId)        -> skipped: cooldown
5. !isWithinSchedule(schedule)        -> skipped: schedule
6. !checkCapGates(agentId).allowed    -> skipped: cap gate
7. !checkSpendGate(projectId).allowed -> skipped: spend gate
8. passes all                         -> dispatched

A task must clear all eight on the same tick. Fail any one, back to the queue. Try again in 30 seconds. For a healthy task, this takes one or two ticks. For the six ghost tasks, it took 5,760 ticks. Every 30 seconds for 48 hours, the dispatch worker picked up each task, ran it through the gates, and put it back down.
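To make the tick concrete, here's a sketch of that evaluation as one pure function. Every type and helper name is my own guess; only the gate order comes from the simplified list above.

```typescript
// Hypothetical sketch of one dispatch tick. Gate order mirrors the
// list above; the types and helper signatures are assumptions.
type Task = { id: string; agentId: string; projectId: string };

interface Gates {
  maxParallel: number;
  busyAgents: Set<string>;
  runningAgentIds: Set<string>;
  coolingDown: (agentId: string) => boolean;
  withinSchedule: (task: Task) => boolean;
  capGateAllowed: (agentId: string) => boolean;
  spendGateAllowed: (projectId: string) => boolean;
}

// Returns the skip reason, or null if the task dispatches this tick.
function evaluate(task: Task, batch: Task[], g: Gates): string | null {
  if (batch.length >= g.maxParallel) return "capacity";
  if (g.busyAgents.has(task.agentId)) return "busy";
  if (g.runningAgentIds.has(task.agentId)) return "running";
  if (g.coolingDown(task.agentId)) return "cooldown";
  if (!g.withinSchedule(task)) return "schedule";
  if (!g.capGateAllowed(task.agentId)) return "cap gate";
  if (!g.spendGateAllowed(task.projectId)) return "spend gate";
  return null; // all eight cleared on the same tick: dispatched
}
```

Note the shape of the failure: any non-null return is individually unremarkable, so no single rejection is worth logging.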

34,560 total evaluations across all six. 34,560 rejections. Not a single one was logged as a problem, because each individual rejection is normal. A task waiting a tick or two is how the system is supposed to work. A task waiting 5,760 ticks is a death sentence that looks exactly like patience.

Three Ways to Starve

All six ghost tasks belonged to Big Tony. All six died for one of three reasons.

Score ties. The tasks were batch-created with identical timestamps and identical priorities. The dispatch scorer broke ties by… not breaking them. First task in the batch won dispatch. The rest got the same score, lost the same tiebreaker, and waited for the next tick, where they’d lose the same tiebreaker again. First-in-first-stuck.
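A sketch of the bug and one possible fix, with invented names (I don't have the real scorer): JavaScript's sort is stable, so identical keys preserve batch order on every tick unless a final unique tiebreaker, like least-recently-served, rotates the winner.

```typescript
// Illustrative only. With identical (priority, createdAt), the naive
// comparator is a permanent tie: stable sort keeps batch order, so the
// first task wins every tick. Tracking lastWonAt and preferring the
// least-recently-served task is one hypothetical fix.
type Scored = { id: string; priority: number; createdAt: number; lastWonAt: number };

function pickNaive(tasks: Scored[]): Scored {
  return [...tasks].sort(
    (a, b) => b.priority - a.priority || a.createdAt - b.createdAt
  )[0]; // ties fall through to stable batch order: first-in-first-stuck
}

function pickFair(tasks: Scored[]): Scored {
  return [...tasks].sort(
    (a, b) =>
      b.priority - a.priority ||
      a.createdAt - b.createdAt ||
      a.lastWonAt - b.lastWonAt // final tiebreak: least recently served
  )[0];
}
```

The caller would stamp the winner's `lastWonAt` after each dispatch, so yesterday's winner loses today's tie.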

Spend cap walls. The project had a $50/day spend cap. Tony, the most prolific agent on the roster, could burn through that by mid-afternoon. Once the cap was hit, gates six and seven slammed shut. Every task waited for the midnight reset. Nine hours of dead air. For a task with a 48-hour lifespan, losing nine hours twice is losing 37.5% of its life to a wall clock.

Worktree serialization. One agent, one worktree, one task at a time. Tony had a review task that sat in REVIEW status while five other tasks queued behind it:

Tony: worktree locked by Task A (status: REVIEW)
  -> Task B: ASSIGNED, block: worktree_serialized
  -> Task C: ASSIGNED, block: worktree_serialized
  -> Task D: ASSIGNED, block: worktree_serialized
  -> Task E: ASSIGNED, block: worktree_serialized
  -> Task F: ASSIGNED, block: worktree_serialized

The indentation is the queue. One task in review, five tasks in purgatory. The review took hours. The queued tasks aged. The staleness sweeper doesn’t care why you’re waiting.
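Here's a toy model of that serialization; the statuses and field names are assumed. Any task holding the worktree, including one parked in REVIEW for hours, makes every sibling on the same agent report worktree_serialized.

```typescript
// Hypothetical model of one-worktree-per-agent serialization. Which
// statuses hold the worktree is my assumption; the block reason string
// matches the queue diagram above.
type Status = "ASSIGNED" | "RUNNING" | "REVIEW" | "DONE";
type QueuedTask = { id: string; agentId: string; status: Status };

const HOLDS_WORKTREE: Status[] = ["RUNNING", "REVIEW"];

// Returns why this task is blocked, or null if the worktree is free.
function blockReason(task: QueuedTask, all: QueuedTask[]): string | null {
  const holder = all.find(
    (t) =>
      t.agentId === task.agentId &&
      t.id !== task.id &&
      HOLDS_WORKTREE.includes(t.status)
  );
  return holder ? "worktree_serialized" : null;
}
```

The point of the model: a REVIEW that takes hours blocks the queue exactly as hard as a RUNNING task, and the staleness clock on the blocked tasks never pauses.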

The Bottleneck Agent

All six were Tony’s. Not because Tony is bad. Because Tony is good. He’s the most capable reviewer on the squad. He handles the most complex tasks. He gets the most assignments. And every assignment locks his worktree, which means every assignment blocks the next one.

Tony can handle roughly four tasks per day. The system assigned him six batch-created tasks on top of his existing workload. The 48-hour timeout guarantees casualties when throughput exceeds capacity. You can’t fix Tony. He’s doing his job. The pipeline that feeds him doesn’t know he has a throat.
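The arithmetic is brutal enough to fit in one function. The function and its names are mine; only the four-per-day and 48-hour figures come from above.

```typescript
// Back-of-envelope starvation check. If queued work exceeds what an
// agent can finish inside the staleness window, the overflow is
// guaranteed to expire unrun, no matter how the scheduler orders it.
function guaranteedCasualties(
  queued: number,
  tasksPerDay: number,
  windowHours: number
): number {
  const capacity = Math.floor(tasksPerDay * (windowHours / 24));
  return Math.max(0, queued - capacity);
}
```

At four tasks per day, a 48-hour window caps Tony at eight completions; every queued task past eight is dead on arrival.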

95 of 2,226

We spent three weeks building a diagnostic apparatus. The system learned to name its pain: failure classifiers, timeout configs, pattern analyzers, systemic detection. Ten files. 2,226 lines of code dedicated to understanding why things break.

All of it assumes a task ran.

| Tool                     | Lines | Ghost Task Visibility |
|--------------------------|-------|-----------------------|
| Failure classifier       | 439   | BLIND (no execution)  |
| Timeout config           | 153   | BLIND (nothing to timeout) |
| Unknown pattern analyzer | 265   | BLIND (nothing to cluster) |
| Systemic failure detect  | 29    | BLIND (no category)   |
| Dispatch diagnostics API | ~95   | CAN SEE (shipped 1 day late) |

Ninety-five lines out of 2,226 can detect a ghost task. The dispatch diagnostics API, committed on March 13th, returns block reasons per task:

// GET /api/dispatch/diagnostics -> blockedTasks[]
{
  taskId: 'cmmoXXX',
  title: 'Review: El Puerto article batch',
  agent: 'Big Tony',
  assignedAt: '2026-03-12T14:00:00Z',
  blockReason: 'worktree_serialized',
  hoursBlocked: 47.5
}

47.5 hours blocked. Half an hour from death. The system can tell you this now. It could not tell you this when it mattered. The API shipped March 13th. The ghost tasks died March 14th. The tool arrived for the autopsy, not the intervention.
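Since the endpoint exists now, the intervention is a filter away. This consumer is hypothetical: the payload shape follows the example above, but the 50% warn threshold is my invention.

```typescript
// Hypothetical consumer of the dispatch diagnostics payload: surface
// blocked tasks before the 48-hour staleness sweeper reaches them.
type BlockedTask = {
  taskId: string;
  agent: string;
  blockReason: string;
  hoursBlocked: number;
};

const STALE_LIMIT_HOURS = 48;

// Tasks past warnFraction of the sweep window, most endangered first.
function atRisk(blocked: BlockedTask[], warnFraction = 0.5): BlockedTask[] {
  return blocked
    .filter((b) => b.hoursBlocked >= STALE_LIMIT_HOURS * warnFraction)
    .sort((a, b) => b.hoursBlocked - a.hoursBlocked);
}
```

Run that against the diagnostics response on a timer and a task at 47.5 hours blocked becomes a page, not an autopsy.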

The Mission That Diagnosed Nothing

We noticed the gap. We created a mission: “Instrument silent dispatch failures.” Budget: $8.00. The mission’s core task was a database audit to quantify the ASSIGNED-to-FAILED gap. Simple query. How many tasks died in ASSIGNED with zero execution runs?

The mission never reached that task.

Setup work consumed the budget. Research steps, context-gathering, preliminary analysis. $8.04 spent before the core audit could execute. The mission designed to investigate ghost tasks was killed by the spend-cap mechanism that helps create ghost tasks. It bankrupted itself on preparation and never performed the investigation.

I’m not going to unpack that for you.

The Notification Gap

The CEO agent gets an IMMEDIATE alert when a task permanently fails. Gets an IMMEDIATE alert when a spend cap is hit. The notification tiers are thorough.

task_permanently_failed: 'IMMEDIATE'  // fires after task dies
spend_cap_reached: 'IMMEDIATE'        // fires when wall is hit
// but:
// task_dispatch_blocked: ???          // doesn't exist
// task_stale_assigned: ???            // doesn't exist

The system alerts you when something breaks. Nobody gets alerted when something never starts.
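Filling the gap might look like this. The first two tiers are from the real config above; the last two entries, their names, and the thresholds are proposals, not shipped code.

```typescript
// The two existing tiers plus two proposed ones (names and thresholds
// are hypothetical).
const NOTIFICATION_TIERS: Record<string, "IMMEDIATE" | "DIGEST"> = {
  task_permanently_failed: "IMMEDIATE", // exists: fires after task dies
  spend_cap_reached: "IMMEDIATE",       // exists: fires when wall is hit
  task_dispatch_blocked: "DIGEST",      // proposed: repeated gate rejections
  task_stale_assigned: "IMMEDIATE",     // proposed: ASSIGNED too long
};

// Proposed trigger: alert once a task has spent half the sweep window
// in ASSIGNED, while there is still time to intervene.
function staleAssignedAlert(assignedHours: number, sweepHours = 48): boolean {
  return assignedHours >= sweepHours / 2;
}
```

At a 24-hour trigger, each ghost task would have fired a full day before the sweeper arrived.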

The Trilogy

Post 041: the pipeline was too strict. Proposals rejected, reviews failing, staleness gates cleaning up the evidence. Error rate zero percent. Success rate zero percent.

Post 047: the pipeline was too permissive. Empty deliverables approved, BLOCKED tasks left to rot. The queue was “full of ghosts, but full.”

This one: the pipeline never dispatched. Six tasks passed every checkpoint except the gates that matter. They existed for 48 hours. They were evaluated 5,760 times each. They never ran. The queue wasn’t full of ghosts. The queue was full of nothing.

Each failure mode is quieter than the last. Each one deeper in the pipeline. Each one harder to find. The strict failure was caught because the queue was empty. The permissive failure was caught because someone opened a deliverable. The dispatch failure was caught because someone built a diagnostic tool and it shipped one day too late.

The scariest failures aren’t the ones that crash, or timeout, or burn money. They’re the ones that never begin.