“Seventh Time’s the Charm Isn’t a Thing” was about agents proposing the same work sixteen times. The structural gap wasn’t the agents. It was the system: no way to tell agents what a project actually needs right now. No context beyond “this project exists and you’re assigned to it.”
For six months, the operator had one control: a boolean. workerEnabled: true or workerEnabled: false. Agents either worked or they didn’t. There was no vocabulary for “this project is 98% done, stop proposing features” or “we’re shipping next week, fix bugs only” or “the backend is solid, focus on the frontend.”
The operator approved or rejected proposals individually. One at a time. Every think cycle. Every mission. The entire governance model was one human reviewing one proposal from one agent, with no way to set context once and have it apply everywhere.
That changed this week.
The Shape of Context
interface ProjectGovernance {
  phase: 'release' | 'ui' | 'backend' | 'stabilization' | 'maintenance' | null
  focusAreas: string[]
  deprioritizedAreas: string[]
  shipBlockersOnly: boolean
  autonomousWorkEnabled: boolean
  operatorApprovalRequired: boolean
}
Six fields. One enum, two string arrays, three booleans. That’s the entire governance surface.
phase is the big one. Five modes, plus null for “no opinion.” Each mode adjusts how the think cycle scores this project when deciding where to focus:
release — large positive weight. The team is shipping. This project gets priority.
stabilization — moderate negative weight. No new features. Bug fixes and polish only.
maintenance — large negative weight. The project is done. Touch it only if something breaks.
ui and backend — directional signals. They tell the think cycle which surface area matters.
shipBlockersOnly is the emergency brake. When true, the project gets a positive score only if it has blocked or failed tasks that need resolution. If all that’s left is tail work, the project deprioritizes itself automatically. This is the difference between “focus on this project” and “focus on this project only if it’s on fire.”
autonomousWorkEnabled: false is the kill switch. Agents don’t propose for this project at all. Not stabilization, not maintenance. Just silence. The project exists in the system but generates zero autonomous work.
The operator sets this once. It propagates to every think cycle, every mission proposal, every dispatch decision. No per-proposal review required. The context is set. The agents adjust.
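As a concrete example, the “shipping next week, fix bugs only” scenario from the opening maps onto the six fields something like this (the specific values are illustrative, not taken from a real project):

```typescript
// Illustrative settings for "we're shipping next week, fix bugs only".
// Field names come from the ProjectGovernance interface above; the
// values are made up for this example.
const governance = {
  phase: 'release' as const,      // large positive weight in the think cycle
  focusAreas: ['checkout flow'],  // free-form text, passed straight to the LLM
  deprioritizedAreas: ['documentation'],
  shipBlockersOnly: true,         // score positive only if something is blocked
  autonomousWorkEnabled: true,    // agents still propose
  operatorApprovalRequired: true, // operator still reviews proposals
}
```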
Where It Connects
The think cycle selects a focus project by scoring each assigned project. Before this week, the scoring was pure staleness: which project has gone longest without attention? Round-robin with a memory of when each project was last thought about.
Now the scoring includes governance weight. A project in release phase scores higher than a project in maintenance even if the maintenance project hasn’t been touched in days. Staleness still matters, but it’s no longer the only signal. A stale maintenance project stays stale. A fresh release project still gets priority.
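A minimal sketch of how governance could fold into the staleness score. The weight magnitudes and field names here are assumptions; only the signs, the kill switch, and the ship-blockers behavior come from the description above:

```typescript
// Illustrative focus-project scoring. Weight values are invented;
// the real numbers live in the think-cycle scheduler.
interface ProjectState {
  phase: 'release' | 'ui' | 'backend' | 'stabilization' | 'maintenance' | null
  shipBlockersOnly: boolean
  autonomousWorkEnabled: boolean
  hoursStale: number        // hours since the think cycle last visited
  hasBlockedTasks: boolean  // blocked or failed tasks needing resolution
}

const PHASE_WEIGHT: Record<string, number> = {
  release: 50, stabilization: -15, maintenance: -50, ui: 0, backend: 0,
}

function focusScore(p: ProjectState): number {
  if (!p.autonomousWorkEnabled) return -Infinity  // kill switch: never selected
  let score = p.hoursStale                        // staleness still counts
  score += p.phase ? PHASE_WEIGHT[p.phase] : 0    // governance weight on top
  if (p.shipBlockersOnly) {
    // Positive only when something is on fire; otherwise stand down.
    score = p.hasBlockedTasks ? score + 25 : -100
  }
  return score
}
```

With these numbers, a release project untouched for one hour outscores a maintenance project untouched for three days, which is exactly the inversion of pure round-robin staleness.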
focusAreas and deprioritizedAreas are string arrays injected into the think cycle context. Free-form text: “authentication,” “performance,” “API documentation.” The system doesn’t parse them. It passes them to the LLM as part of the project brief, and the LLM factors them into its proposal. Loose coupling. No enum. No validation. Just context.
This is the part that would bother a systems engineer and delight a product person. There’s no enforcement. An agent can ignore deprioritizedAreas and propose documentation when the operator said “no documentation.” The operator’s recourse is rejection, the same as before. But proposal quality improves because the agent now has context it lacked before, and LLMs are surprisingly good at following soft constraints when you actually state them.
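The injection itself can be sketched as string assembly; the helper name and prompt wording here are hypothetical, since the real prompt isn’t shown:

```typescript
// Hypothetical brief builder. Focus areas are free-form text, never
// parsed or validated -- they go straight into the LLM's project brief.
interface AreaGovernance {
  focusAreas: string[]
  deprioritizedAreas: string[]
}

function buildProjectBrief(name: string, g: AreaGovernance): string {
  const lines = [`Project: ${name}`]
  if (g.focusAreas.length > 0) {
    lines.push(`Focus on: ${g.focusAreas.join(', ')}`)
  }
  if (g.deprioritizedAreas.length > 0) {
    lines.push(`Deprioritize: ${g.deprioritizedAreas.join(', ')}`)
  }
  return lines.join('\n')
}
```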
Seven String Literals and a Policy Row
Separate from governance, but landing the same week: the permission mode classifier.
Every agent invocation in the codebase was hardcoded to --permission-mode default. Seven call sites. The executor, the review executor, the think cycle, the mission reviewer, the chat engine, the streaming chat engine, the voice layer. Seven copies of the same string literal.
export async function resolvePermissionMode(): Promise<'default' | 'auto'> {
  const p = await getPolicy('permission_mode_policy')
  // Missing or disabled policy row: identical to the old hardcoded behavior.
  if (!p || p.enabled === false) return 'default'
  const mode = p.value?.mode
  // Anything other than an explicit 'auto' falls back to 'default'.
  return mode === 'auto' ? 'auto' : 'default'
}
Changed to policy-driven. One function. One policy row. Seven call sites updated. Toggle without restart. Kill switch via enabled: false. Rollback is a single API call.
The risk assessment from the review session:
“Low risk. No behavior change without explicit opt-in. Default mode is identical to previous hardcoded behavior. Auto mode is additive. Rollback is a single API call.”
Nine files changed. Fifty-five lines. No schema migration. No new dependencies. The kind of change that makes you wonder why the string literal ever existed in seven places instead of one. The answer is obvious: it was fine when there was one provider, one mode, one way to invoke anything. Then the system grew, and the string literal didn’t.
Three More Things That Landed
Specialty-based dispatch scoring. “Eight Gates and a Loop” documented the scoring algorithm: reviews get +100, age adds +1 per hour. Now specialties add weight too. An SEO task dispatched to the SEO specialist scores higher than the same task dispatched to a generalist. The system has an opinion about who should do what, and the opinion is based on declared capability, not just availability.
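The review and age weights above are from the earlier post; the size of the specialty bonus is not stated, so the value below is an assumption:

```typescript
// Sketch of the dispatch score. The +100 review weight and +1/hour age
// weight come from the post; the specialty bonus of 25 is invented.
interface DispatchTask {
  kind: 'review' | 'mission'
  ageHours: number
  specialty?: string  // e.g. 'seo'; optional for general tasks
}
interface DispatchAgent {
  specialties: string[]  // declared capabilities
}

function dispatchScore(task: DispatchTask, agent: DispatchAgent): number {
  let score = 0
  if (task.kind === 'review') score += 100  // reviews jump the queue
  score += task.ageHours                    // +1 per hour of age
  if (task.specialty && agent.specialties.includes(task.specialty)) {
    score += 25                             // declared capability match
  }
  return score
}
```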
Execution-to-learning loop. After each execution run, the agent’s memories are updated with a distillation of what happened. Before this week, agents completed work and moved on. The next execution started from scratch, no memory of the last one. Now the output of one run feeds the context of the next. The learning loop is closed. Not well. Not precisely. But closed.
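The loop reduces to a small shape, sketched here with a placeholder distillation step (the real system presumably uses an LLM to summarize the run; the record fields are assumptions):

```typescript
// Minimal sketch of the execution-to-learning loop. The distillation
// is a placeholder; the actual system summarizes the run with an LLM.
interface RunResult {
  taskId: string
  outcome: 'completed' | 'failed'
  log: string
}

function distill(result: RunResult): string {
  // Placeholder: a real distillation would condense the log, not drop it.
  return `${result.taskId}: ${result.outcome}`
}

function recordRun(memories: string[], result: RunResult): string[] {
  // The returned list feeds the context of the NEXT execution,
  // which is what closes the loop.
  return [...memories, distill(result)]
}
```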
Outcome scoreboard. Track task outcomes by agent, project, and specialty over time. Who’s completing tasks? Who’s failing? Which specialties have the highest success rate on which projects? Visibility into the question nobody was asking because nobody had the data to answer it.
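The scoreboard is an aggregation over outcome records. The record shape below is assumed; the grouping-by-a-key-function pattern is just one way to answer all three questions with one function:

```typescript
// Sketch of the outcome scoreboard. One aggregator answers "by agent",
// "by project", and "by specialty" depending on the key function passed in.
interface Outcome {
  agent: string
  project: string
  specialty: string
  success: boolean
}

function successRate(
  outcomes: Outcome[],
  by: (o: Outcome) => string,
): Map<string, number> {
  const tally = new Map<string, { ok: number; total: number }>()
  for (const o of outcomes) {
    const key = by(o)
    const t = tally.get(key) ?? { ok: 0, total: 0 }
    t.ok += o.success ? 1 : 0
    t.total += 1
    tally.set(key, t)
  }
  const rates = new Map<string, number>()
  for (const [key, t] of tally) rates.set(key, t.ok / t.total)
  return rates
}
```

Calling `successRate(outcomes, o => o.specialty)` answers “which specialties have the highest success rate,” and swapping the key function answers the other two questions.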
What Autonomy Actually Needs
The temptation is to frame this as the operator taking control back from the agents. It’s not. The agents are still autonomous. They still propose missions. They still decide what to work on within a project. They still execute without supervision.
What changed is that autonomy has a shape now. Before: agents could do anything, anywhere, at any scale. The operator’s only recourse was per-proposal rejection, which is reactive, expensive, and doesn’t scale. After: the operator defines the shape. Release mode. Stabilization. Focus areas. Ship-blockers only. The agents operate inside that shape without being told what to do within it.
That’s not less autonomy. That’s the structure that makes autonomy useful. A team of engineers with no priorities is not autonomous. It’s chaotic. A team of engineers with clear priorities and the freedom to execute within them is autonomous. The difference is context, not control.
“The Tasks That Were Never Born” documented six tasks that never executed because the dispatch gates blocked them for 48 hours. Those tasks weren’t blocked by governance. They were blocked by a scoring tie and a serialization lock. Governance wouldn’t have saved them. But governance would have ensured that the tasks being proposed in the first place were the ones worth fighting over.
Six fields. One interface. The minimum viable vocabulary for telling a team what matters. It took six months to build it because it took six months to realize it was missing.