Post 044 ended with “a fully assembled engine in a garage with the ignition off.”

Twelve agents. Eight projects. Zero tasks. Zero missions. A clean database. Memories intact on the filesystem but operational history completely gone. The question was what happens when you turn the ignition.

We turned it on. One project at a time. And then we did something we should have done forty-four posts ago.

The Clean Boot

The temptation was to do what we did in The Fifteen-Cent Valve: kick all the workers on, point them at eight projects, and see what happens. Controlled chaos. The operating philosophy of the first three months.

We didn’t do that.

One project. A design-system library. One mission, proposed by hand through a curl command, approved manually. Three tasks. One agent. Start the worker. Watch it execute. Watch it review. Watch it merge.

It worked. The worktree isolation from We Deployed Agents to a Server is still solid. The review pipeline with the configurable budget from the last post is still solid. The merge-on-approval path ran clean. Then a second project. Then a third. No surprises.

But the whole time we were doing this, we were doing it with curl.

curl -X POST localhost:4011/api/missions \
  -H "Content-Type: application/json" \
  -d '{"title":"...","projectId":"...","proposerId":"..."}'

For 44 posts, the human communicated with the agents through HTTP requests and seed files. PATCH /api/agents/:id. POST /api/tasks. Configuration blobs in JSON. The agents had memory. They had scheduled think cycles. They had an execution pipeline and a review pipeline and a whole mission factory. They could talk to each other through roundtable conversations.

They could not talk to the human.

The Chat Engine

The thing about building a chat interface for an agent system is that you have two options. Option one: build a dumb chat widget. Slap a text box on the UI, pipe messages to the LLM with a generic system prompt, get responses. Every SaaS product shipped this in 2024. It works. It’s hollow. The agent in chat mode is a different entity than the agent that executes tasks, because the chat version doesn’t have access to the execution context.

Option two: same agent. Same prompt. Same memory. Same tools. A briefing is not a separate mode. It’s the same agent, pointed at you instead of at a task.

We went with option two. chat-engine.ts, 589 lines.

The system prompt is built by the same buildSystemPromptWithMemory() that the executor uses. The agent’s identity, its personality, its accumulated memories, its project knowledge. All of it. When you open a chat with Scalpel Rita, you get Scalpel Rita. Not a chatbot wearing her name tag.

On top of the base prompt, a CHAT_CAPABILITIES block tells the agent what it can do in this mode:

const ACTION_REGEX = /\[ACTION:(TASK|MISSION)\s+(.*?)\]/gs

That regex is the bridge between conversation and work. The agent can embed action blocks in its responses. [ACTION:TASK title="Fix the broken import" description="..." priority="HIGH"]. [ACTION:MISSION title="Refactor the auth module" steps="Step 1 | Step 2 | Step 3"]. The chat engine parses these out of the response, executes them through the same createTaskWithPolicies() and createMission() pipelines that everything else uses, and strips the raw action blocks from the displayed message.
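A sketch of what that parse-and-strip step plausibly looks like. The `parseActions` name, the `ParsedAction` shape, and the attribute regex are illustrative assumptions, not the actual chat-engine internals; only `ACTION_REGEX` and the overall behavior come from the code above.

```typescript
// Illustrative sketch: extract [ACTION:...] blocks, then strip them from
// the displayed message. Only ACTION_REGEX is from the real chat engine.
const ACTION_REGEX = /\[ACTION:(TASK|MISSION)\s+(.*?)\]/gs;
const ATTR_REGEX = /(\w+)="([^"]*)"/g; // assumed key="value" attribute syntax

interface ParsedAction {
  kind: 'TASK' | 'MISSION';
  attrs: Record<string, string>;
}

function parseActions(response: string): { actions: ParsedAction[]; display: string } {
  const actions: ParsedAction[] = [];
  for (const match of response.matchAll(ACTION_REGEX)) {
    const attrs: Record<string, string> = {};
    for (const [, key, value] of match[2].matchAll(ATTR_REGEX)) {
      attrs[key] = value;
    }
    actions.push({ kind: match[1] as 'TASK' | 'MISSION', attrs });
  }
  // The human never sees the raw action blocks.
  const display = response.replace(ACTION_REGEX, '').replace(/\n{3,}/g, '\n\n').trim();
  return { actions, display };
}
```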

The agent doesn’t need a separate API. It doesn’t need special permissions. It talks. If it decides something needs to happen, it says so, in the same breath as explaining why. The policies still apply. Dedup still runs. The task factory gates still fire. The agent can be told “no” the same way it’s told “no” in every other context.

Task-Linked Chat

The general briefing mode is useful but obvious. You talk to an agent about a project, maybe it creates some tasks. Fine.

The one that actually changes the workflow is task-linked chat. You open a failed task in the dashboard, click into the chat panel, and talk to the agent who failed it.

ALTER TABLE "conversations"
ADD COLUMN "taskId" TEXT;

One column. Nullable, with ON DELETE SET NULL so the conversation survives if the task gets cleaned up. But what that column enables is context injection that makes the conversation actually productive.

buildTaskContext() pulls the task title, status, priority, tags, description, objective, the last 3 execution runs with their error messages and durations, and if the task has a worktree, the actual git diff. All of it gets injected into the conversation before the human’s first message.
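The assembly step can be sketched roughly like this. The `Task` and `Run` shapes are assumptions made up for the example; only the injected fields and the last-3-runs behavior come from the description above.

```typescript
// Rough sketch of assembling the injected task context. Field names follow
// the post; the data shapes are illustrative assumptions.
interface Run { status: string; model: string; durationS: number; error?: string }
interface Task { title: string; status: string; priority: string; runs: Run[] }

function buildTaskContext(task: Task): string {
  const lines = [
    `## Task: ${task.title}`,
    `**Status:** ${task.status}`,
    `**Priority:** ${task.priority}`,
    '### Recent Executions',
  ];
  // Only the last three runs, with their errors and durations.
  for (const run of task.runs.slice(-3)) {
    lines.push(`- [${run.status}] EXECUTION (${run.model}, ${run.durationS}s)`);
    if (run.error) lines.push(`  Error: ${run.error}`);
  }
  return lines.join('\n');
}
```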

So when you ask Scalpel Rita “what went wrong with this task?”, she’s not guessing. She’s looking at:

**Status:** FAILED
**Why stuck:** task has FAILED
### Recent Executions
- [FAILED] EXECUTION (opus, 45.2s) 2026-03-02T14:23
  Error: ENOENT: no such file or directory, open '/src/components/missing.tsx'
- [FAILED] EXECUTION (opus, 38.1s) 2026-03-02T13:45
  Error: ENOENT: no such file or directory, open '/src/components/missing.tsx'

Two identical failures. Same error. Same file. The agent sees the pattern before you ask about it. The gap between “tell me what happened” followed by “let me check the logs” and “tell me what happened” followed by an actual answer is the gap between a junior developer and a useful one.

Read-Only Filesystem Access

The agents in chat mode get tools. Not write tools. Read tools.

allowedTools: ['Read', 'Glob', 'Grep'],
...(project?.path ? { cwd: project.path } : {}),

Three tools. Read a file. Find files by pattern. Search file contents by regex. The cwd is set to the project path so the agent can reference files relative to the project root. No Write. No Bash. No Edit. The agent can look at the codebase. It cannot touch it.

This is the right boundary. A briefing is a conversation, not an execution. If the conversation produces a task, the task goes through the normal pipeline: dispatch, worktree, execution, review. The human doesn’t lose the review gate because they happened to ask a question.

Streaming

The non-streaming path calls invokeLLM(), gets back the full response, parses actions, saves turns, returns. Simple. Also terrible UX. You stare at a loading spinner for 15-40 seconds depending on the model, then get a wall of text.

The streaming path uses AsyncGenerator<StreamEvent>:

export async function* streamBriefingMessage(
  conversationId: string,
  content: string,
  projectId?: string,
  think?: boolean,
): AsyncGenerator<StreamEvent> {

Three event types. token for each chunk as it arrives. done with the final response and parsed actions. error if something breaks. The frontend opens an SSE connection, the server yields tokens as they arrive from the provider’s stream() method, and the action parsing happens after the full response is assembled.
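The three event shapes and the drain loop can be sketched as follows. The union type mirrors the post's description; the `drainToSSE` helper and its Express-style `write` callback are assumptions for illustration.

```typescript
// The three event shapes described above, plus an illustrative SSE drain
// loop. drainToSSE is a stand-in, not the actual route handler.
type StreamEvent =
  | { type: 'token'; text: string }
  | { type: 'done'; response: string; actions: unknown[] }
  | { type: 'error'; message: string };

async function drainToSSE(
  events: AsyncGenerator<StreamEvent>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const event of events) {
    // Each event becomes one SSE frame.
    write(`data: ${JSON.stringify(event)}\n\n`);
    if (event.type !== 'token') break; // done or error terminates the stream
  }
}
```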

The think parameter is a pass-through to the provider layer. Some providers support a reasoning mode where the model shows its chain of thought. Most don’t. The chat engine doesn’t care. It passes the flag and lets the provider decide.

The Port Change

Small thing. The server was running on port 3001. The crash loop from We Lost Everything and Kept It That Way, the one that restarted 7,903 times, was an EADDRINUSE on port 3001. We moved to 4011.

3001 is the default port for everything. Every tutorial, every boilerplate, every “hello world” Express server. If something else grabs 3001, we’re back in the crash loop. 4011 is obscure enough that nothing else will claim it.

The port change also meant updating the launchd plist, the environment variables, the health check probe, and every curl command in every runbook. An afternoon of find-and-replace to guard against a failure mode that the six-line error handler, already applied but never committed, would have prevented. We committed it this time.

The Self-Probe

While we were moving ports, we added a startup self-probe. The server boots, binds the port, then immediately hits its own /health endpoint. If it can’t reach itself, it logs the failure and exits clean. No crash loop. No 7,903 restarts. One attempt, one failure, one clean exit, one log line that says what actually went wrong.
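Something like the following, assuming Node 18+ global `fetch` and the existing `/health` endpoint; the real probe's timeout and logging almost certainly differ.

```typescript
// Minimal self-probe sketch: one attempt against our own /health endpoint.
// Assumes Node 18+ fetch; not the actual implementation.
async function selfProbe(port: number): Promise<boolean> {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/health`);
    return res.ok;
  } catch {
    return false; // can't reach ourselves: report it, don't loop
  }
}

// On boot, after binding the port:
//   if (!(await selfProbe(4011))) {
//     console.error('self-probe failed: server cannot reach its own /health');
//     process.exit(1); // one attempt, one clean exit
//   }
```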

This is the kind of thing that feels obvious after a crisis and invisible before one.

Cap Gates

The capacity gates from earlier posts were scattered. A concurrent-run check here, a daily-limit check there, a token-budget check somewhere else. Three separate database queries in three separate files, three separate failure modes.

cap-gates.ts pulled them into three composable functions:

export async function checkConcurrentCap(agentId: string, max: number): Promise<CapGateResult>
export async function checkDailyAgentCap(agentId: string, max: number): Promise<CapGateResult>
export async function checkTokenBudget(maxDailyTokens: number): Promise<CapGateResult>

Each returns { allowed: boolean; gate?: string; reason?: string }. The combined checkCapGates() runs them in sequence, short-circuits on the first rejection, and reads limits from the execution_limits policy. The token budget query uses a 30-second cache to avoid hammering an aggregate query on every dispatch check.
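The composition itself is small enough to sketch. The individual gates here are stand-ins; only the `CapGateResult` shape and the short-circuit behavior come from the text.

```typescript
// Short-circuiting gate composition, per the post. Gate implementations
// are illustrative stand-ins for the real database-backed checks.
interface CapGateResult { allowed: boolean; gate?: string; reason?: string }

type Gate = () => Promise<CapGateResult>;

async function checkCapGates(gates: Gate[]): Promise<CapGateResult> {
  for (const gate of gates) {
    const result = await gate();
    if (!result.allowed) return result; // stop at the first rejection
  }
  return { allowed: true };
}
```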

Not exciting. The kind of refactoring that makes the next bug easier to find and the next feature easier to add. The kind of work that never gets a blog post unless you’re padding a post about something else.

Cost Tracking

Every execution run now records its cost. Not an estimate, not a budget ceiling. The actual cost, calculated from the model’s token counts and a rate table:

opus:   { inputPer1M: 5,    outputPer1M: 25   }
sonnet: { inputPer1M: 3,    outputPer1M: 15   }
haiku:  { inputPer1M: 1,    outputPer1M: 5    }

Ollama models resolve to zero. They run locally. The electricity bill is the operator’s problem.

The cost calculator does model alias resolution (because Claude model names change every six months and nobody updates their config), and if it encounters a model it doesn’t recognize, it returns zero rather than guessing. An unknown model at zero cost is less dangerous than an unknown model at an assumed cost that might trigger or bypass budget gates.
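The shape of that calculation, sketched under assumptions: the substring-based alias resolution is a guess at how model names map to rate families, and `costUsd` is an illustrative name. The per-million rates are the ones quoted above.

```typescript
// Sketch of the cost calculation. Alias resolution by substring match is an
// assumption; the rates are the ones from the post's rate table.
const RATES: Record<string, { inputPer1M: number; outputPer1M: number }> = {
  opus:   { inputPer1M: 5, outputPer1M: 25 },
  sonnet: { inputPer1M: 3, outputPer1M: 15 },
  haiku:  { inputPer1M: 1, outputPer1M: 5 },
};

function resolveAlias(model: string): string {
  // e.g. a dated "claude-opus-..." release name still resolves to "opus"
  for (const family of Object.keys(RATES)) {
    if (model.includes(family)) return family;
  }
  return 'unknown';
}

function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES[resolveAlias(model)];
  if (!rate) return 0; // unknown model: zero, never a guess
  return (inputTokens * rate.inputPer1M + outputTokens * rate.outputPer1M) / 1_000_000;
}
```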

This matters because the chat interface itself has a budget. maxBudgetUsd: 0.50 per message. A briefing conversation with Big Tony on Opus could easily blow through $2 of tokens in a five-turn exchange. The budget cap means a runaway conversation kills the message, not the monthly bill.

The Topology Change

Here’s what actually changed.

Before the chat engine, the communication was unidirectional. The human configured agents through the API. Agents proposed work through think cycles. The human approved or rejected through the dashboard. Agents executed. The human reviewed results.

Human → API → Agent → Work → Review → Human

One loop. One direction. The human could steer the system by adjusting policies, changing assignments, approving or rejecting missions. But they could never just say “hey, what are you working on?” and get an answer.

Now:

Human ↔ Agent → Work → Review → Human

That double arrow is the chat engine. It doesn’t replace the autonomous pipeline. The think cycles still run. The workers still dispatch. The review pipeline still gates. But now there’s a parallel channel. The human can ask questions, give context, point at a failed task and say “this broke because the file moved, look in /src/v2/ instead.”

The agent hears that. Remembers it (via the same memory system). And the next execution run starts with context the autonomous pipeline could never have produced on its own.

This is not a move from autonomy to control. The agents are exactly as autonomous as they were before. They still decide what to propose. They still execute in isolation. They still get reviewed.

What changed is that the human can participate without reaching for curl. The communication channel exists. And the agents, for the first time in 44 posts, can listen.

What We Actually Shipped

One day. March 2nd. The clean boot, the chat engine, the port migration, the self-probe, the cap-gate refactor, the cost tracking. All of it deployed to the same server that crashed 7,903 times six days earlier.

The dashboard went from “fully assembled engine with the ignition off” to “engine running, instruments lit, and for the first time, an intercom.”

We celebrated by opening a briefing chat with Chad (the lead) and asking him what he thought the top priority was for a design-system library. He gave a six-paragraph answer about component documentation coverage, created a mission with four steps, and signed off with a joke about how nobody reads docs. He was right about the priorities.

The agents can listen now. That’s either the most important feature we’ve built or the beginning of a much more complicated problem. Probably both.