The Boss in Your Pocket

“We Built a CEO” introduced Miles Carter as a chat interface: a row in the agents table, seventeen slash commands, and a voice layer that cost two cents per rewrite. Thirteen days later, the CEO module is fourteen files and three thousand three hundred thirty-five lines of code. The original six files and eight hundred lines are still in there. They’re just the skeleton now. The rest is everything the skeleton couldn’t do: remember what you talked about, decide what’s worth saying, know when to shut up, look at things before you ask, and handle a photo sent at 2am without losing its mind.

The CEO didn’t get smarter. The CEO got infrastructure.

The Digest That Nobody Needed

The first problem was noise. Three-tier notification routing works: IMMEDIATE for fires, DIGEST for batched updates, SILENT for things nobody needs to know. What that post didn’t cover is what happens when a DIGEST batch contains one low-signal event and nothing else.

The system dutifully collected one task completion, buffered it for five minutes, ran it through the voice layer, and sent a Telegram message that amounted to “one task completed, progress continues.” The operator read it, learned nothing, and lost a small amount of faith in the system.

assessDigest() is the fix. Before any digest ships, the system evaluates the batch:

if (events.length === 1 && lowSignalOnly) {
  return {
    shouldSend: false,
    reason: 'weak_singleton',
  }
}

if (lowSignalOnly && events.length < 3 && uniqueEntities.size < 2) {
  return {
    shouldSend: false,
    reason: 'thin_low_signal_batch',
  }
}

Two categories of silence. weak_singleton: one event, not interesting. thin_low_signal_batch: fewer than three events, all low-signal, fewer than two distinct entities. In both cases, the digest gets suppressed. The voice layer never fires. The operator’s phone stays quiet.

More engineering went into deciding what NOT to send than into sending it.
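
For readers who want to see the two checks in context, here is a minimal, self-contained sketch of what a function like assessDigest() could look like. The two suppression rules come straight from the excerpt above. Everything else is an assumption for illustration: the DigestEvent shape, the LOW_SIGNAL_TYPES set, and the 'empty_batch' and 'ok' reasons do not appear in the original code.

```typescript
// Hypothetical event shape; the real fields are not shown in the post.
interface DigestEvent {
  type: string;
  entityId: string;
}

interface DigestAssessment {
  shouldSend: boolean;
  reason: string;
}

// Assumption: which event types count as low-signal.
const LOW_SIGNAL_TYPES = new Set<string>(["task_completed", "task_started"]);

function assessDigest(events: DigestEvent[]): DigestAssessment {
  // Assumption: an empty batch is never worth sending.
  if (events.length === 0) {
    return { shouldSend: false, reason: "empty_batch" };
  }

  const lowSignalOnly = events.every((e) => LOW_SIGNAL_TYPES.has(e.type));
  const uniqueEntities = new Set(events.map((e) => e.entityId));

  // Rule 1 from the post: one event, not interesting.
  if (events.length === 1 && lowSignalOnly) {
    return { shouldSend: false, reason: "weak_singleton" };
  }

  // Rule 2 from the post: fewer than three low-signal events about
  // fewer than two distinct entities.
  if (lowSignalOnly && events.length < 3 && uniqueEntities.size < 2) {
    return { shouldSend: false, reason: "thin_low_signal_batch" };
  }

  // Placeholder reason for the "send it" path.
  return { shouldSend: true, reason: "ok" };
}
```

Under these assumptions, a single high-signal event still ships, and so do two low-signal events that touch different entities. Only the thin, repetitive batches get dropped.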

Four Layers of “Already Said That”

The original dedup was a timestamp check. If the event happened before the last flush, skip it. That worked until it didn’t. Overlapping flushes, server restarts mid-cycle, the same underlying event appearing as two different event types.

The current system has four layers.

Layer one: immediate dedup. IMMEDIATE notifications matching a recent send (same event type, same entity) get suppressed.

Layer two: digest fingerprinting. Each batch gets a fingerprint from its event types and entity IDs. If the fingerprint matches the last sent digest, suppress.

Layer three: ops scan string matching. The proactive monitoring system generates its own notifications. If the ops scan result matches something recently sent, suppress. This prevents the CEO from reporting a dispatch bottleneck he already reported three minutes ago because the proactive scan re-detected the same condition.

Layer four: proactive stack guard. A three-minute minimum gap between proactive messages. Even if four subsystems all have something to say, they queue up.

Four layers. Because the first three weren’t enough.
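
Layer two is the easiest one to picture in code. The sketch below shows one way digest fingerprinting could work, assuming the fingerprint is an order-independent join of "type:entityId" pairs. The post only says the fingerprint is built from event types and entity IDs, so the BatchEvent shape, the sort-and-join format, and the DigestDeduper class are all illustrative names, not the real implementation.

```typescript
// Hypothetical event shape for illustration.
interface BatchEvent {
  type: string;
  entityId: string;
}

// Build an order-independent fingerprint from event types and entity IDs.
// Assumption: duplicate pairs collapse and ordering does not matter.
function fingerprintBatch(events: BatchEvent[]): string {
  const keys = events.map((e) => `${e.type}:${e.entityId}`);
  return Array.from(new Set(keys)).sort().join("|");
}

class DigestDeduper {
  private lastFingerprint: string | null = null;

  // Returns true if the batch differs from the last sent digest.
  // Simplification: the batch is recorded as "sent" as soon as it passes.
  shouldSend(events: BatchEvent[]): boolean {
    const fingerprint = fingerprintBatch(events);
    if (fingerprint === this.lastFingerprint) {
      return false;
    }
    this.lastFingerprint = fingerprint;
    return true;
  }
}
```

Sorting before joining means the same events arriving in a different order after an overlapping flush still produce the same fingerprint, which is the failure the timestamp check could not catch.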

The CEO Remembers What You Talked About

CeoScopeState is the part that changed the relationship.

Every conversation with the operator gets analyzed after it ends. The system extracts three things: decisions the operator made (“approve the security audit”), preferences the operator expressed (“don’t wake me up for task completions”), and open loops (“let me think about that and get back to you”).

This gets persisted. Next conversation, the continuity context builder pulls the last six conversations, their extracted decisions, and any unresolved open loops. The CEO starts the next conversation already knowing what happened in the previous ones.

The operator says “what happened with the thing we discussed yesterday?” and the CEO actually knows. Not because it has infinite context. Because someone built a 221-line state machine that extracts, persists, and re-injects conversational context on every session.
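
The re-injection step can be sketched in a few lines. This is not the 221-line state machine, just a rough illustration of the "last six conversations, their decisions, unresolved open loops" rule. The ConversationRecord and ContinuityContext shapes are assumptions, and so is the choice to look for open loops only inside those six conversations.

```typescript
// Hypothetical persisted record of one analyzed conversation.
interface ConversationRecord {
  endedAt: number; // epoch milliseconds
  decisions: string[];
  openLoops: { text: string; resolved: boolean }[];
}

interface ContinuityContext {
  decisions: string[];
  openLoops: string[];
}

const MAX_CONVERSATIONS = 6; // "the last six conversations", per the post

function buildContinuityContext(history: ConversationRecord[]): ContinuityContext {
  // Newest first, then keep only the most recent six.
  const recent = history
    .slice()
    .sort((a, b) => b.endedAt - a.endedAt)
    .slice(0, MAX_CONVERSATIONS);

  const decisions: string[] = [];
  const openLoops: string[] = [];
  for (const conversation of recent) {
    decisions.push(...conversation.decisions);
    for (const loop of conversation.openLoops) {
      // Only unresolved loops are worth bringing up again.
      if (!loop.resolved) {
        openLoops.push(loop.text);
      }
    }
  }
  return { decisions, openLoops };
}
```

The result would be serialized into the prompt at the start of the next session, which is what lets “the thing we discussed yesterday” resolve to something concrete.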

It’s the feature that makes the CEO feel less like a chat interface and more like a person you work with. Which is either impressive or unsettling, depending on how you feel about your tools having opinions about your preferences.

The CEO Looks Without Being Asked

Eight triggers. Hourly. The proactive ops scanner runs a diagnostic cycle and decides whether the operator needs to know something.

Dispatch drift: are tasks sitting in ASSIGNED longer than usual? Worker lag: is the tick loop falling behind? Budget headroom: are we approaching spend caps? Queue depth: is the review queue backing up? Agent health: is anyone in cooldown? Worktree congestion: are projects blocked on occupied worktrees? Failure rate: has the system-wide failure rate spiked? Staleness: are tasks approaching the 48-hour auto-fail threshold?

Eight conditions. If any cross a threshold, the CEO generates a proactive notification. Not because you asked. Because it looked.
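
Structurally, a scan like this can be a table of named checks run against one snapshot of system metrics. The sketch below assumes exactly that. The eight trigger names mirror the list above and the 48-hour auto-fail figure comes from the post, but the OpsSnapshot fields and every other threshold are placeholders invented for illustration.

```typescript
// Hypothetical metrics snapshot; field names are illustrative.
interface OpsSnapshot {
  oldestAssignedMinutes: number;
  tickLagMs: number;
  budgetUsedFraction: number;
  reviewQueueDepth: number;
  agentsInCooldown: number;
  projectsBlockedOnWorktrees: number;
  failureRate: number;
  oldestTaskAgeHours: number;
}

interface Trigger {
  name: string;
  crossed: (snapshot: OpsSnapshot) => boolean;
}

const AUTO_FAIL_HOURS = 48; // from the post

// All numeric thresholds other than AUTO_FAIL_HOURS are placeholders.
const TRIGGERS: Trigger[] = [
  { name: "dispatch_drift", crossed: (s) => s.oldestAssignedMinutes > 30 },
  { name: "worker_lag", crossed: (s) => s.tickLagMs > 5000 },
  { name: "budget_headroom", crossed: (s) => s.budgetUsedFraction > 0.9 },
  { name: "queue_depth", crossed: (s) => s.reviewQueueDepth > 10 },
  { name: "agent_health", crossed: (s) => s.agentsInCooldown > 0 },
  { name: "worktree_congestion", crossed: (s) => s.projectsBlockedOnWorktrees > 0 },
  { name: "failure_rate", crossed: (s) => s.failureRate > 0.2 },
  // Warn once a task is three quarters of the way to auto-fail.
  { name: "staleness", crossed: (s) => s.oldestTaskAgeHours > AUTO_FAIL_HOURS * 0.75 },
];

// Returns the names of every trigger whose threshold was crossed.
function scanOps(snapshot: OpsSnapshot): string[] {
  return TRIGGERS.filter((t) => t.crossed(snapshot)).map((t) => t.name);
}
```

An empty result means the hourly cycle ends without a message. Anything else becomes a candidate notification that still has to get past the guards.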

The interactive mode guards prevent this from being annoying. The CEO won’t interrupt a live conversation (90-second grace after the last message). Won’t interrupt if a previous proactive message failed (5-minute cooldown). Won’t stack proactive messages closer than three minutes apart.

Four return false before one return true. The function defaults to silence and has to be argued into speaking.
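
Here is a sketch of that default-to-silence shape, covering only the three guards described above: the 90-second grace, the 5-minute failure cooldown, and the three-minute gap. The real function reportedly has four early returns, and the fourth is not described here, so it is left out rather than guessed. ProactiveState, withinWindow, and canSendProactive are hypothetical names.

```typescript
// Hypothetical timestamps (epoch milliseconds); null means "never happened".
interface ProactiveState {
  lastOperatorMessageAt: number | null;
  lastProactiveFailureAt: number | null;
  lastProactiveSentAt: number | null;
}

const CONVERSATION_GRACE_MS = 90 * 1000; // 90-second grace after the last message
const FAILURE_COOLDOWN_MS = 5 * 60 * 1000; // 5-minute cooldown after a failed send
const MIN_PROACTIVE_GAP_MS = 3 * 60 * 1000; // three minutes between proactive messages

// True if `since` is set and falls inside the window ending at `now`.
function withinWindow(since: number | null, now: number, windowMs: number): boolean {
  return since !== null && now - since < windowMs;
}

function canSendProactive(state: ProactiveState, now: number): boolean {
  // Don't interrupt a live conversation.
  if (withinWindow(state.lastOperatorMessageAt, now, CONVERSATION_GRACE_MS)) return false;
  // Back off after a failed proactive message.
  if (withinWindow(state.lastProactiveFailureAt, now, FAILURE_COOLDOWN_MS)) return false;
  // Don't stack proactive messages.
  if (withinWindow(state.lastProactiveSentAt, now, MIN_PROACTIVE_GAP_MS)) return false;
  return true;
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps each guard trivial to test.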

The CEO Learned to See

Multimodal was the quiet addition. The operator sends a screenshot, a voice note, a document. The old system would choke. The new system detects the attachment type, generates an appropriate prompt, and processes it through the LLM with the attachment included.

Photos become context. Voice notes become transcripts. Documents become summaries. The CEO handles all of it in the same conversation flow, same personality, same voice.
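
The detect-then-prompt step could look roughly like this. It is a sketch only: classifying by MIME type is an assumption about how detection works, and the prompt strings are invented to match the photo, voice note, and document behaviors described above.

```typescript
type AttachmentKind = "photo" | "voice" | "document" | "unknown";

// Assumption: attachments are classified by MIME type.
function detectAttachmentKind(mimeType: string): AttachmentKind {
  if (mimeType.startsWith("image/")) return "photo";
  if (mimeType.startsWith("audio/")) return "voice";
  if (mimeType === "application/pdf" || mimeType.startsWith("text/")) return "document";
  return "unknown";
}

// Illustrative prompts; the real wording is not shown in the post.
const ATTACHMENT_PROMPTS: Record<AttachmentKind, string> = {
  photo: "Describe what this image shows and treat it as context for the conversation.",
  voice: "Transcribe this voice note, then respond to what was said.",
  document: "Summarize this document and flag anything that needs a decision.",
  unknown: "An attachment of an unrecognized type arrived. Ask the operator what it is.",
};

function promptForAttachment(mimeType: string): string {
  return ATTACHMENT_PROMPTS[detectAttachmentKind(mimeType)];
}
```

Typing the prompt table as Record<AttachmentKind, string> means adding a new attachment kind without a prompt becomes a compile error instead of a 2am surprise.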

Then there’s the web chat. SSE streaming, scoped per user, running the same engine that powers Telegram. Two faces for the same brain. The web interface exists for the moments when tapping on a phone isn’t enough and you need to see the full response render in real time on a proper screen.
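
For the curious, “SSE streaming, scoped per user” can be pictured as a registry of per-user sinks plus the standard Server-Sent Events framing. The sketch below assumes that design. UserStreams, the Send callback, and the "token" event name are hypothetical. The wire format itself (event: and data: fields, with a blank line ending each event) is the standard SSE format.

```typescript
// A sink that writes one SSE frame to an open HTTP response (hypothetical).
type Send = (frame: string) => void;

// Format one SSE event. Multi-line payloads become multiple data: lines,
// and a blank line terminates the event.
function formatSseEvent(event: string, data: string): string {
  const dataLines = data
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${event}\n${dataLines}\n\n`;
}

class UserStreams {
  private readonly sinks = new Map<string, Send[]>();

  // Register an open connection for one user.
  subscribe(userId: string, send: Send): void {
    const existing = this.sinks.get(userId) || [];
    existing.push(send);
    this.sinks.set(userId, existing);
  }

  // Deliver an event only to that user's connections.
  // Returns how many connections received it.
  publish(userId: string, event: string, data: string): number {
    const targets = this.sinks.get(userId) || [];
    const frame = formatSseEvent(event, data);
    for (const send of targets) {
      send(frame);
    }
    return targets.length;
  }
}
```

A production version would also need to remove sinks when connections close. That part is omitted here.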

What 3,335 Lines Means

Thirteen days. 2,535 new lines. Four major subsystems: quality gating, state persistence, proactive monitoring, multimodal processing. Each one solves a specific failure mode that the 800-line version had.

The 800-line version could talk. The 3,335-line version knows when to talk, what to say, what not to say, what it said last time, and what you told it to remember.

“We Built a CEO” gave the CEO a voice. Thirteen days later, the voice has opinions about when to use itself. That’s either growth or scope creep. The line count suggests both.