In “We Built a Water Cooler for Robots,” I wrote about building a water cooler for robots. Temperature settings. Weighted speaker selection based on how much agents like each other. A conversation format called WATERCOOLER with a warmth dial cranked to 0.9. We were debugging the social dynamics of artificial intelligence, and we were apparently very proud of it.

That was nine days ago.

This week we deleted the water cooler and replaced it with an HR department. The renovation took one night. The commits landed at 3:24 AM, 3:24 AM, and 3:25 AM. Three commits in 23 seconds. Chat gone. HR Room up. EMPTY_TAGS bug patched. The night shift doesn’t do things gradually.

What We Deleted

731 lines. Five files. Two database models.

The component was called Squad Chat. Not “Communications Module” or “Agent Messaging Interface.” The title bar literally said:

<PageLayout title="Squad Chat" :icon="MessageSquare" flush>

Squad Chat. They were a squad. They had an empty state that read “Start the conversation!” with an exclamation mark. A chat_messages table with support for reply chains, mention tracking, and a 5,000-character message limit because apparently we anticipated lengthy agent discourse. An announcements table where I could broadcast to all agents with a priority flag: LOW, NORMAL, HIGH, URGENT.

The other deleted component was ServerRoom.vue. 476 lines devoted entirely to Hades, our security agent. Skull icon in the header. Background: bg-zinc-950. Text: text-emerald-400. A scratchpad with a > _ cursor prompt. Security findings color-coded by severity from CRITICAL through INFO. Execution runs rendered in expandable markdown panels. The hacker movie aesthetic, rendered as a Vue component, given to a single agent because we thought Hades deserved a room that matched his vibe.

The empty state when no security agent existed:

<p class="text-sm">No security agent found. Run the seed.</p>

Both of these are gone. The squad chat, the hacker lair, the exclamation mark in the empty state. All of it walked out the door at 3:24:40 AM on February 28th.

What We Built Instead

Nine seconds later, HRRoom.vue landed.

The layout is a grid of agent cards. Two-column split: avatar left, metadata right. Each card shows the agent’s type badge (LEAD, SPC, INT, color-coded), status dot (green for ACTIVE, yellow for BUSY, gray for INACTIVE), name, role, description, model tier, and a think cycle progress bar in emerald green.

Then there’s the filter bar.

Model pills. Type pills. Status pills. A toggle for “only agents currently thinking.” Project filter pills, generated dynamically from active project assignments. A counter showing “X of Y agents.” A clear button.

And at the top right, a button:

<Button @click="startNewAgent" size="sm" class="gap-1.5">
  <Plus :size="16" />Hire Agent
</Button>

“Hire Agent.” Not “Create” or “Add” or “Invite.” Hire.

If the roster is empty, you see:

<Button @click="startNewAgent" class="gap-1.5">
  <Plus :size="16" />Hire First Agent
</Button>

“Hire First Agent” versus the old “Start the conversation!” That’s the entire philosophical shift right there, compressed into two button labels. We went from a space designed around agent-to-agent interaction to a management interface designed around human-to-workforce control. The agents didn’t change. The way we relate to them did.

The Review Pipeline Crisis That Made This Necessary

Before we could renovate the break room, we had to deal with the reason nobody was getting any work done.

El Puerto — the content production project — had 86 tasks sitting in REVIEW. Thirteen days. Zero completed. Thirty-one of those were duplicate review records (36% contamination). The system was producing 58 articles while reviewing none of them. That’s not a pipeline. That’s a laundry basket that only accepts dirty clothes.

The root causes were familiar: a trigger rule with empty conditions that fired on every task_completed event, spawning a separate “Review: X” task for each completion, doubling the task count. First-match reviewer assignment with no load balancing. A fixed 10-minute timeout that killed valid Opus reviews before they could finish, because Opus reviews with large contexts legitimately take 20-25 minutes and nobody had thought to make the timeout model-aware.
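
The post doesn't show the rule engine, so the following is only a guess at how an empty-conditions rule could end up firing on everything. If conditions are checked with something like `Array.prototype.every`, an empty list is vacuously true. All of the names here (`TriggerRule`, `TaskEvent`, `ruleMatches`, `ruleMatchesStrict`) are hypothetical.

```typescript
// Hypothetical shapes; the real rule engine is not shown in this post.
interface TriggerCondition {
  field: string;
  equals: string;
}

interface TriggerRule {
  eventType: string;
  conditions: TriggerCondition[];
}

interface TaskEvent {
  type: string;
  payload: Record<string, string>;
}

// Naive matcher: `every` on an empty array returns true,
// so a rule with no conditions matches every event of its type.
function ruleMatches(rule: TriggerRule, ev: TaskEvent): boolean {
  return (
    rule.eventType === ev.type &&
    rule.conditions.every((c) => ev.payload[c.field] === c.equals)
  );
}

// Stricter matcher: a rule with no conditions is treated as misconfigured.
function ruleMatchesStrict(rule: TriggerRule, ev: TaskEvent): boolean {
  if (rule.conditions.length === 0) return false;
  return ruleMatches(rule, ev);
}

const emptyRule: TriggerRule = { eventType: "task_completed", conditions: [] };
const completionEvent: TaskEvent = { type: "task_completed", payload: { taskId: "t-1" } };

console.log(ruleMatches(emptyRule, completionEvent)); // true
console.log(ruleMatchesStrict(emptyRule, completionEvent)); // false
```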

The fix was a full review pipeline overhaul.

The first thing it killed was the spawn pattern. Instead of creating a separate “Review: X” task for every completion, REVIEW now becomes an execution run on the original task. The reviewer sees the full deliverable in context. Task count halves. Short content tasks — under 600 characters — get batched in groups of three, one reviewer takes the whole batch, tagged batch-review for tracking. Less context switching. The queue stops doubling.
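
The batching rule can be sketched roughly like this. The 600-character cutoff, the group size of three, and the `batch-review` tag come from the description above; the task shape, the function name `planReviews`, and the decision to let a leftover group of one or two still form a batch are assumptions made for illustration.

```typescript
// Hypothetical task shape; the real schema is not shown in this post.
interface ReviewableTask {
  id: string;
  deliverable: string;
  tags: string[];
}

const SHORT_CONTENT_LIMIT = 600; // characters, from the post
const BATCH_SIZE = 3; // tasks per batch, from the post

interface ReviewPlan {
  batches: ReviewableTask[][]; // each batch goes to a single reviewer
  singles: ReviewableTask[]; // longer tasks, reviewed one at a time
}

function planReviews(tasks: ReviewableTask[]): ReviewPlan {
  const shortTasks = tasks.filter((t) => t.deliverable.length < SHORT_CONTENT_LIMIT);
  const singles = tasks.filter((t) => t.deliverable.length >= SHORT_CONTENT_LIMIT);

  const batches: ReviewableTask[][] = [];
  for (let i = 0; i < shortTasks.length; i += BATCH_SIZE) {
    // Assumption: a trailing group smaller than three still forms a batch.
    const batch = shortTasks
      .slice(i, i + BATCH_SIZE)
      .map((t) => ({ ...t, tags: [...t.tags, "batch-review"] }));
    batches.push(batch);
  }
  return { batches, singles };
}

const makeTask = (id: string, length: number): ReviewableTask => ({
  id,
  deliverable: "x".repeat(length),
  tags: [],
});

const reviewPlan = planReviews([
  makeTask("a", 120),
  makeTask("b", 599),
  makeTask("c", 600),
  makeTask("d", 40),
  makeTask("e", 300),
]);

console.log(JSON.stringify(reviewPlan.batches.map((b) => b.map((t) => t.id))));
// [["a","b","d"],["e"]]
console.log(JSON.stringify(reviewPlan.singles.map((t) => t.id)));
// ["c"]
```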

Load-balanced reviewer selection: Counts current REVIEW assignments per eligible editor. Assigns to the lowest-load editor. Self-review prevented at three separate levels (the picker, the individual assignment, the batch assignment). Only editors with workerEnabled: true are eligible.
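
A minimal sketch of lowest-load selection, assuming a flat list of open REVIEW assignments. Only `workerEnabled` comes from the post; `EditorAgent`, `OpenReview`, and `pickReviewer` are made-up names, ties go to whichever editor appears first, and this covers only the picker-level self-review guard, not the other two.

```typescript
// Hypothetical shapes for illustration.
interface EditorAgent {
  id: string;
  workerEnabled: boolean;
}

interface OpenReview {
  taskId: string;
  reviewerId: string;
}

function pickReviewer(
  editors: EditorAgent[],
  openReviews: OpenReview[],
  authorId: string,
): EditorAgent | null {
  // Count current REVIEW assignments per reviewer.
  const load = new Map<string, number>();
  for (const review of openReviews) {
    load.set(review.reviewerId, (load.get(review.reviewerId) ?? 0) + 1);
  }

  let best: EditorAgent | null = null;
  let bestLoad = Infinity;
  for (const editor of editors) {
    if (!editor.workerEnabled) continue; // only enabled workers are eligible
    if (editor.id === authorId) continue; // no self-review
    const current = load.get(editor.id) ?? 0;
    if (current < bestLoad) {
      best = editor;
      bestLoad = current;
    }
  }
  return best; // null when nobody is eligible
}

const demoEditors: EditorAgent[] = [
  { id: "ed-a", workerEnabled: true },
  { id: "ed-b", workerEnabled: true },
  { id: "ed-c", workerEnabled: false },
];
const demoOpenReviews: OpenReview[] = [
  { taskId: "t1", reviewerId: "ed-a" },
  { taskId: "t2", reviewerId: "ed-a" },
  { taskId: "t3", reviewerId: "ed-b" },
];

console.log(pickReviewer(demoEditors, demoOpenReviews, "writer-1")?.id); // ed-b
console.log(pickReviewer(demoEditors, demoOpenReviews, "ed-b")?.id); // ed-a
```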

Kill-switch hysteresis: If the REVIEW queue exceeds 75 tasks, new mission proposals are paused. The system stops generating new work until it processes what it already has. The queue drains to 50 before proposals resume. The gap between 75 and 50 is deliberate — hysteresis prevents the system from oscillating across the threshold. An electrical engineering concept borrowed to manage an AI agent review queue. Production code.
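
The two-threshold behavior is easy to get subtly wrong, so here is a small sketch of it. The 75 and 50 come from the description above; `createProposalGate` and the constant names are hypothetical, and the exact boundary handling (pause strictly above 75, resume at 50 or below) is one reading of that description.

```typescript
const PAUSE_ABOVE = 75; // pause proposals when the queue exceeds this
const RESUME_AT_OR_BELOW = 50; // resume once the queue has drained to this

// Returns a function that takes the current REVIEW queue size and
// reports whether new mission proposals are allowed right now.
function createProposalGate(): (reviewQueueSize: number) => boolean {
  let paused = false;
  return (reviewQueueSize: number): boolean => {
    if (!paused && reviewQueueSize > PAUSE_ABOVE) {
      paused = true;
    } else if (paused && reviewQueueSize <= RESUME_AT_OR_BELOW) {
      paused = false;
    }
    return !paused;
  };
}

const proposalsAllowed = createProposalGate();
const queueSizes = [70, 76, 74, 60, 51, 50, 74];
console.log(JSON.stringify(queueSizes.map((n) => proposalsAllowed(n))));
// [true,false,false,false,false,true,true]
```

Note that a queue of 74 is blocked on the way down and allowed on the way up. With a single threshold at 75, a queue hovering around that number would flip the switch on every task.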

Verdict state machine: APPROVED merges and delivers. REVISION sends the task back with feedback appended. REVISION+SPAWN blocks the original and spawns a new task with the feedback. Max three revisions before the system gives up. Merge conflicts spawn their own resolution tasks with their own retry cap.
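
The three verdicts map naturally onto a small transition function. This is a sketch under assumptions: the verdict names and the cap of three are from the post, while `ReviewedTask`, `ReviewOutcome`, `applyVerdict`, and the choice to surface "gives up" as its own outcome are hypothetical. Merge-conflict resolution tasks are left out.

```typescript
type ReviewVerdict = "APPROVED" | "REVISION" | "REVISION+SPAWN";

// Hypothetical task and outcome shapes.
interface ReviewedTask {
  description: string;
  revisionCount: number;
}

type ReviewOutcome =
  | { action: "merge_and_deliver" }
  | { action: "return_to_author"; description: string; revisionCount: number }
  | { action: "block_and_spawn"; spawnedDescription: string }
  | { action: "give_up" };

const MAX_REVISIONS = 3; // from the post

function applyVerdict(
  task: ReviewedTask,
  verdict: ReviewVerdict,
  feedback: string,
): ReviewOutcome {
  switch (verdict) {
    case "APPROVED":
      return { action: "merge_and_deliver" };
    case "REVISION":
      // Assumption: the cap counts revisions already sent back to the author.
      if (task.revisionCount >= MAX_REVISIONS) return { action: "give_up" };
      return {
        action: "return_to_author",
        description: `${task.description}\n\nReviewer feedback: ${feedback}`,
        revisionCount: task.revisionCount + 1,
      };
    case "REVISION+SPAWN":
      // The original task is blocked; a new task carries the feedback.
      return {
        action: "block_and_spawn",
        spawnedDescription: `Rework: ${task.description}\n\nReviewer feedback: ${feedback}`,
      };
  }
}

const draftTask: ReviewedTask = { description: "Write harbor guide", revisionCount: 0 };
console.log(applyVerdict(draftTask, "APPROVED", "").action); // merge_and_deliver
console.log(applyVerdict(draftTask, "REVISION", "Tighten the intro").action); // return_to_author
console.log(applyVerdict({ ...draftTask, revisionCount: 3 }, "REVISION", "Again").action); // give_up
```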

The model-aware timeouts are worth dwelling on for a moment. The old fixed 10-minute ceiling was killing valid Opus reviews. The comment in the new code:

// Model-aware stale timeouts: opus reviews can legitimately take 20-25 minutes
// with large contexts. 10 minutes was causing valid runs to be killed repeatedly.

This is a fix comment. Someone watched the system murder its own quality assurance over and over, traced it to a constant, and wrote a comment about it so the next person would understand what kind of hubris they were correcting. Haiku gets 10 minutes. Sonnet gets 20. Opus gets 30. The agents think at different speeds. The pipeline finally knows this.
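
In code, the change amounts to swapping a constant for a lookup. A minimal sketch, with the three limits taken from the paragraph above and everything else (`ModelTier`, `STALE_TIMEOUT_MINUTES`, `isRunStale`) invented for illustration:

```typescript
type ModelTier = "haiku" | "sonnet" | "opus";

// Per-model stale timeouts in minutes, as described in the post.
const STALE_TIMEOUT_MINUTES: Record<ModelTier, number> = {
  haiku: 10,
  sonnet: 20,
  opus: 30,
};

const MS_PER_MINUTE = 60 * 1000;

// A run is stale once it has been going longer than its model's limit.
function isRunStale(model: ModelTier, startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs > STALE_TIMEOUT_MINUTES[model] * MS_PER_MINUTE;
}

const runStartedAt = 0;
const twentyTwoMinutesIn = 22 * MS_PER_MINUTE;

console.log(isRunStale("opus", runStartedAt, twentyTwoMinutesIn)); // false
console.log(isRunStale("haiku", runStartedAt, twentyTwoMinutesIn)); // true
```

A 22-minute Opus review survives here. Under the old fixed 10-minute ceiling it would have been killed a little under halfway through.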

The Silent Failure Nobody Found For a Week

While all of this was happening, the think cycle system had its own quiet catastrophe.

Think cycles are the autonomous reasoning loop. Every agent runs one on a schedule (tiered by model cost; more on that in “We Gave Them Job Descriptions”). Claude gets the agent’s identity, domain context, team roster, and active missions, and returns a JSON proposal: title, description, tags, steps. The pipeline creates the mission.

Except it wasn’t creating missions. For seven days.

The failure: think cycles never extracted the tags field from the proposal JSON. They passed an empty array to createMission(). A tag validator added after the think cycle code was written would reject any proposal with tags.length === 0 — hard reject, error code EMPTY_TAGS. The mission factory returned null. The think cycle logged “proposal deduped” and moved on. No error surfaced. No alert fired. No dashboard metric changed.
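
To make the gap concrete, here is a sketch of both halves: pulling tags out of the proposal, and a mission factory that returns its rejection reason instead of a bare null. The real `createMission()` signature isn't shown in this post, so `createMissionSketch`, `extractTags`, `MissionResult`, and the proposal shape are stand-ins. Only the `EMPTY_TAGS` code and the `tags.length === 0` check come from the description above.

```typescript
// Hypothetical shape of the parsed proposal JSON.
interface MissionProposal {
  title: string;
  description: string;
  tags?: unknown;
  steps: string[];
}

type MissionResult =
  | { ok: true; missionId: string }
  | { ok: false; code: "EMPTY_TAGS" };

// Keep only non-blank string tags; anything else in the field is dropped.
function extractTags(proposal: MissionProposal): string[] {
  return Array.isArray(proposal.tags)
    ? proposal.tags.filter((t): t is string => typeof t === "string" && t.trim() !== "")
    : [];
}

// Stand-in for createMission(): the rejection reason travels with the result.
function createMissionSketch(proposal: MissionProposal, tags: string[]): MissionResult {
  if (tags.length === 0) return { ok: false, code: "EMPTY_TAGS" };
  return { ok: true, missionId: `mission:${proposal.title}` };
}

// The caller logs what actually happened rather than guessing at "deduped".
function describeOutcome(result: MissionResult): string {
  return result.ok ? `created ${result.missionId}` : `rejected: ${result.code}`;
}

const sampleProposal: MissionProposal = {
  title: "Refresh harbor guide",
  description: "Update opening hours and ferry times.",
  tags: ["content", "refresh"],
  steps: ["draft", "review"],
};

// The buggy path: tags never extracted, an empty array passed through.
console.log(describeOutcome(createMissionSketch(sampleProposal, [])));
// rejected: EMPTY_TAGS

// The fixed path: tags pulled from the proposal first.
console.log(describeOutcome(createMissionSketch(sampleProposal, extractTags(sampleProposal))));
// created mission:Refresh harbor guide
```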

Every automated mission proposal was silently rejected from the moment the tag validator was deployed until a commit fixed the extraction at 3:25:03 AM, 14 seconds after the HR Room went live and 23 seconds after the deletion.

Deletion of the old system. Deployment of the new one. Patch for the silent failure that had been running undetected. Twenty-three seconds.

The silent failure pattern is the same one from “The System That Worked Perfectly”: three independent failures running simultaneously with zero errors. The tag validator was correct. Missions should have tags. The think cycles were correct. They were generating real proposals. The failure existed only in the gap between two systems that had never been introduced to each other. Post 041 and this commit landed the same night; the lesson got applied before the post was even published.

Project-Aware Thinking

The fourth renovation is less dramatic but more important: think cycles are now project-aware. Every agent gets projectAssignments. A scheduling function picks the project most in need of attention. Every mission proposal gets a projectId. No more orphans floating through the database like tumbleweed.
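
The actual scheduling function isn't shown here, so this is only a toy version of "pick the project most in need of attention." It assumes need means "longest since the last mission, with never-touched projects first." Apart from `projectId`, every name (`ProjectAssignment`, `lastMissionAtMs`, `pickProjectForThinkCycle`, `withProject`) is hypothetical.

```typescript
// Hypothetical assignment record; the real one likely carries more fields.
interface ProjectAssignment {
  projectId: string;
  lastMissionAtMs: number | null; // null means no mission yet
}

// Assumption: the neediest project is the one idle the longest.
function pickProjectForThinkCycle(assignments: ProjectAssignment[]): string | null {
  let chosen: ProjectAssignment | null = null;
  for (const candidate of assignments) {
    if (chosen === null) {
      chosen = candidate;
      continue;
    }
    const candidateTime = candidate.lastMissionAtMs ?? -Infinity;
    const chosenTime = chosen.lastMissionAtMs ?? -Infinity;
    if (candidateTime < chosenTime) chosen = candidate;
  }
  return chosen === null ? null : chosen.projectId;
}

// Every proposal leaves the think cycle with a projectId attached.
function withProject<T extends object>(proposal: T, projectId: string): T & { projectId: string } {
  return { ...proposal, projectId };
}

const demoAssignments: ProjectAssignment[] = [
  { projectId: "p1", lastMissionAtMs: 1000 },
  { projectId: "p2", lastMissionAtMs: null },
  { projectId: "p3", lastMissionAtMs: 500 },
];

const targetProject = pickProjectForThinkCycle(demoAssignments);
console.log(targetProject); // p2
if (targetProject !== null) {
  console.log(JSON.stringify(withProject({ title: "Demo mission" }, targetProject)));
  // {"title":"Demo mission","projectId":"p2"}
}
```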

The full story of how that works, including the scheduling algorithm that’s basically middle management, the staleness gates, the model-tier budgets, and the part where we taught them to strategically forget things, is in “We Gave Them Job Descriptions.” This post is already long enough.

The Metaphor Completed Itself

The water cooler still exists, technically. The Roundtable still has a WATERCOOLER conversation format. But it’s scheduled now. It has a status lifecycle: QUEUED, IN_PROGRESS, COMPLETED. It generates memory digests and drift analysis. We formalized the informal. The coffee machine conversation has a database schema.

Here’s the arc across the last nine days of posts:

Posts 033-036: Agents as colleagues. Social dynamics. Temperature settings. A confused agent named Clueless Joe who accidentally QA’d three bugs. The startup metaphor.

Posts 037-039: Reviews spawn shadow tasks. 45% overhead. Token budgets enforced. Worktree isolation. The system starts building fences. The middle management metaphor.

The Fifteen-Cent Valve: 9/9 failure rate. The system investigated itself. 25 commits in four days. The corporate audit metaphor.

The System That Worked Perfectly: Three silent failures. Zero errors, zero output, five days. The perfectly functioning machine that was producing nothing.

Post 042: Chat deleted. HR Room built. Review pipeline overhauled. Think cycles given project focus. The HR department metaphor completes.

We didn’t build a water cooler and then decide to replace it with something more serious. We built a water cooler, watched what happened when agents had unstructured social space and no management layer, and discovered that without structure, the informal produced nothing and the formal drowned in 86 unreviewed tasks.

The renovation wasn’t a philosophical decision. It was an engineering response to observable system behavior. The water cooler was charming. It was also running alongside a review pipeline with zero load balancing and a think cycle system that silently rejected every proposal it generated.

Nostalgia is a terrible architect.

“Start the conversation!” is gone. “Hire Agent” is the button now. The squad became a workforce. The Roundtable has the watercooler on the schedule between standup and the next project review.

It’s better. More functional. More honest about what this actually is.

And someone put motivational posters up, except the posters are filter bar components, and they filter by model tier, agent type, and think cycle status, and they are extremely good at their jobs.