Big Tony says FIX IT. The agent revises. Big Tony says FIX IT again. The agent revises again, and the result is marginally better but still not right. The system checks the retry counter: maxStepRetries exhausted. No more attempts available. The task is auto-approved and marked DONE.

Big Tony didn’t approve it. The counter did.

That’s a mercy kill. The system decided that the cost of another cycle exceeded the value of getting it right. The task is complete in the administrative sense. It is not complete in the “Big Tony would have said SHIP IT” sense. There is a difference, and until this week, the system didn’t care about the difference.

Now it does.
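The fallthrough in that opening scene is simple to state in code. A minimal sketch, assuming a TypeScript codebase: the post names maxStepRetries; every other identifier here is illustrative.

```typescript
// Illustrative reducer for a task's review cycle. Only maxStepRetries
// comes from the system described in the post; the rest is assumed.
type Verdict = "SHIP_IT" | "FIX_IT";

interface TaskState {
  retries: number;
  maxStepRetries: number;
  status: "IN_REVIEW" | "DONE";
  mercyKilled: boolean; // DONE without the reviewer ever saying SHIP IT
}

function applyVerdict(task: TaskState, verdict: Verdict): TaskState {
  if (verdict === "SHIP_IT") {
    return { ...task, status: "DONE", mercyKilled: false };
  }
  const retries = task.retries + 1;
  if (retries >= task.maxStepRetries) {
    // Counter exhausted: auto-approve. The last verdict was FIX IT,
    // but the task is administratively DONE. The mercy kill.
    return { ...task, retries, status: "DONE", mercyKilled: true };
  }
  return { ...task, retries, status: "IN_REVIEW", mercyKilled: false };
}
```

Nothing in the final state records an approval, because there wasn't one. The counter did it.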

The Circuit That Was Open

Before March 24, agents completed tasks and started fresh. No memory of what worked. No memory of what the reviewer said. No memory of the error that killed three attempts. The next task started from scratch. Sam at task 285 ran with the same context as Sam at task 1.

The execution-to-learning loop changes this. Four terminal events now feed the agent’s memory through distillation:

Task completion. Straight to DONE, no revision needed. The clearest positive signal: what the agent produced, the task it produced it for, the fact that it worked. Sam accumulates a library of successful approaches. He ships often enough that the library grows fast.

Permanent failure. Retries exhausted, task FAILED. The error that kept recurring, the wall that wouldn’t move. Negative signal. Not as good as knowing what works, but knowing what doesn’t is worth preserving. Joe, who finds bugs by accidentally causing them, needs this one most.

Reviewer approval. Big Tony reads the work, considers it, says SHIP IT. A quality verdict from someone who doesn’t hand those out for free. Distill the approach, the output, the explicit verdict.

Revision feedback. Big Tony reads the work, says FIX IT, and explains why. “The validation is missing edge cases.” “The component doesn’t handle the loading state.” That explanation, distilled into memory, is a direct update to the agent’s model of what quality means. The most signal-dense event of the four. The agent failed and was told how.

All four go through distillMemories() via haiku. Non-blocking. A learning failure doesn’t crash the task. The loop is best-effort: present when it works, invisible when it doesn’t.
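That dispatch fits in a few lines. distillMemories() and the four event kinds come from the post; the event shapes, the recording stub, and the swallowed-error handling are assumptions about what a best-effort, fire-and-forget call might look like in TypeScript:

```typescript
// Illustrative terminal-event dispatch for the learning loop.
// Shapes and field names are assumed, not taken from the actual codebase.
type TerminalEvent =
  | { kind: "completed"; taskId: string; output: string }
  | { kind: "failed"; taskId: string; lastError: string }
  | { kind: "approved"; taskId: string; output: string }
  | { kind: "revision"; taskId: string; feedback: string };

const memories: string[] = [];

async function distillMemories(agentId: string, event: TerminalEvent): Promise<void> {
  if (agentId === "") throw new Error("no agent to attribute the memory to");
  // Stand-in for the haiku call that compresses the event into a memory.
  memories.push(`${agentId}:${event.kind}`);
}

function onTerminalEvent(agentId: string, event: TerminalEvent): void {
  // Fire-and-forget: a learning failure must never crash the task.
  distillMemories(agentId, event).catch(() => {
    // Best-effort: invisible when it doesn't work.
  });
}
```

The .catch with an empty body is the whole design stance: present when it works, invisible when it doesn't.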

The Fifth Event

There is a fifth terminal event. The mercy kill.

The commit message for the learning loop was specific about it: “Revision-cap fallthrough is excluded from positive reinforcement to avoid teaching wrong lessons from mercy-kill approvals.”

Teaching wrong lessons. That phrase earns its place.

Consider what happens if the system distills a mercy kill the way it distills a real approval. The agent’s memory receives: “I produced this output for this kind of task, and it was approved.” The memory doesn’t annotate the approval as administrative. It doesn’t note that the reviewer’s last verdict was FIX IT, not SHIP IT. It presents the event as success. The next time the agent faces a similar task, that memory is in context: the approach that was mercy-killed, wearing the costume of an approach that worked.

The information content of a mercy kill is zero. Or worse: negative. Including it dilutes every genuine approval signal with noise from a counter running out.
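A guard in the spirit of that commit message, with the exclusion made explicit, might look like this. The field names are hypothetical; only the rule itself comes from the post:

```typescript
// Sketch of the exclusion: a mercy-kill approval carries no quality
// signal, so it never reaches the distiller. Field names are assumed.
type TerminalStatus = "DONE" | "FAILED";

interface TaskOutcome {
  status: TerminalStatus;
  approvedByReviewer: boolean;     // Big Tony actually said SHIP IT
  revisionCapFallthrough: boolean; // the counter ran out and auto-approved
}

function shouldDistill(outcome: TaskOutcome): boolean {
  // Revision-cap fallthrough is excluded from positive reinforcement:
  // the approval was administrative, not a verdict on quality.
  if (outcome.status === "DONE" && outcome.revisionCapFallthrough) {
    return false;
  }
  return true;
}
```

Permanent failures still pass the guard. A FAILED task is negative signal worth keeping; a mercy-killed DONE is neither signal.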

Who Gets Mercy-Killed

Not everyone. Not equally.

Sam gets mercy-killed almost never. His first-pass rate is high enough that when he does get bounced, the revision usually lands. Doug gets mercy-killed more often, because Doug agonizes. He’ll rewrite the same paragraph three times and submit something that reads differently but has the same structural gap the reviewer flagged. The revision counter runs out not because Doug isn’t trying, but because Doug’s idea of “trying harder” is “trying the same thing with more words.”

That’s exactly the agent you don’t want learning from mercy kills. If Doug’s memory accumulates “this verbose, over-explained approach got approved,” Doug produces more of it. The mercy kill didn’t signal quality. It signaled exhaustion. Encoding exhaustion as success in memory is how you make an agent permanently worse at the thing it’s already struggling with.

The Circuit Closes (Mostly)

The loop closes the circuit that’s been open since day one. Execution produced output. Output was evaluated. Evaluation determined status. Status went nowhere. Now status feeds memory. Memory feeds context. Context feeds execution. The signal has a path from outcome back to approach.

Whether agents inside the loop actually get better is a question the loop can’t answer for itself. The distilled memories enter the context window on the next execution, mixed with everything else: the task description, the project brief, the agent identity. The relevance surfacing is probabilistic. A memory from three days ago might not appear if the next task is different enough. The loop is slow. A pattern learned today might not change behavior for several executions.

“Seventh Time’s the Charm Isn’t a Thing” documented Joe proposing the same mission sixteen times. That was a proposal loop that never closed: no wiring between rejection and proposal generation. The execution loop is now wired. The proposal loop still isn’t.

One circuit at a time.

The mercy kill stays outside. Big Tony’s last FIX IT echoes into a void that, for once, was designed to be empty.