There are twelve agents. I can tell you each of their names, their roles, their specialties, the exact model they run on. Chad leads. Big Tony enforces. Scalpel Rita cuts. Loose Lips Rose talks too much. Clueless Joe tries his best. They were handmade in a seed file, each one a bespoke paragraph of identity text and a set of hand-tuned parameters.
Nobody has ever asked what happens when you want fifty.
The answer, until last week, was: you open seed.ts, copy-paste an agent block, tweak the identity, assign it a model and a project, run the seed, and hope you didn’t fat-finger a field name. Then you do it forty-nine more times. Each one requires understanding the full agent schema: 25 fields, several JSON blobs, model-specific quirks. The barrier to creating an agent was intimate knowledge of the system internals.
This is the story of how we lowered that barrier to a dropdown menu. And why that’s more unsettling than it sounds.
The Archetype Model
An archetype is a blueprint for an agent. Not a running agent. A description of what an agent should be, with holes where the specifics go.
model AgentArchetype {
id String @id @default(cuid())
slug String @unique
name String @unique
version Int @default(1)
status ArchetypeStatus // DRAFT → PUBLISHED → DEPRECATED
identityTemplate String // System prompt with {{VAR}} placeholders
requiredVariables Json? // What the deployer needs to fill in
knowledgeSeeds Json? // Memories the agent is born with
// ... 25 fields total
}
The identityTemplate is the core. It’s the agent’s identity text, the system prompt that defines who they are, except with template variables where the specifics should be. A customer support archetype might have {{COMPANY_NAME}} and {{PRODUCT_LINE}} and {{ESCALATION_POLICY}}. A content writer might have {{PUBLICATION_NAME}} and {{STYLE_GUIDE_URL}} and {{TARGET_AUDIENCE}}.
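The resolution step itself is mechanical: find every {{VAR}} and substitute the deployer's value. A minimal sketch of what that looks like (the function name and regex are mine, not the system's):

```typescript
// Replace every {{VAR}} placeholder with its supplied value.
// Hypothetical helper; the real resolver lives inside instantiateArchetype().
function resolveTemplate(
  template: string,
  variables: Record<string, string>
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key: string) =>
    key in variables ? variables[key] : match // leave unknown placeholders intact
  )
}
```

So `resolveTemplate('You support {{COMPANY_NAME}} customers.', { COMPANY_NAME: 'Acme' })` yields a concrete identity string, while an unfilled placeholder survives untouched for validation to catch.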
requiredVariables defines the form. Each variable has a key, a label, a description, a type (text, secret, or url), and an optional default. The type matters because secrets get redacted before storage:
function redactSecrets(
vars: Record<string, string>,
requiredVariables: TemplateVariable[]
): Record<string, string> {
const redacted = { ...vars }
for (const v of requiredVariables) {
if (v.type === 'secret' && redacted[v.key]) {
redacted[v.key] = '***'
}
}
return redacted
}
The API key goes into the agent’s identity at instantiation time. The variables JSON stored on the instance record has *** where the key was. If someone dumps the database, they get asterisks. The agent itself has the real value baked into its identity text, because that’s what gets sent to the LLM. But the audit trail is clean.
knowledgeSeeds is the part that surprised me. Each archetype can define up to 20 memories that the agent is born with. A tax specialist comes pre-loaded with knowledge about filing deadlines. A hotel receptionist knows the standard check-in flow before a single guest arrives. These aren’t retrieved from a knowledge base at runtime. They’re inserted as AgentMemory rows at instantiation, with a confidence score and tags, indistinguishable from memories the agent built through experience.
The agent doesn’t know which memories it earned and which it was given at birth. A freshly deployed agent from the store can be useful on its first execution, instead of spending its first ten runs building context from scratch.
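The seed-to-memory mapping is a straight transform. A sketch of what it might look like (row shape and the 0.8 default confidence are assumptions; the 20-seed cap comes from the archetype model above):

```typescript
interface KnowledgeSeed {
  content: string
  confidence?: number
  tags?: string[]
}

interface AgentMemoryRow {
  agentId: string
  content: string
  confidence: number
  tags: string[]
}

// Map archetype knowledge seeds to AgentMemory rows, capped at 20.
// Field names are illustrative, based on the schema described above.
function seedsToMemoryRows(agentId: string, seeds: KnowledgeSeed[]): AgentMemoryRow[] {
  return seeds.slice(0, 20).map((s) => ({
    agentId,
    content: s.content,
    confidence: s.confidence ?? 0.8, // assumed default for birth memories
    tags: s.tags ?? [],
  }))
}
```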
instantiateArchetype()
The function that turns a blueprint into a running agent is 232 lines. The interesting parts:
export async function instantiateArchetype(
archetypeId: string,
variables: Record<string, string>,
overrides?: { name?: string; model?: string; projectId?: string }
): Promise<{ agent: Agent; instance: AgentInstance }>
It validates the archetype is PUBLISHED (drafts and deprecated archetypes can’t be instantiated). It checks that all required variables are provided. It resolves the identity template by replacing {{VAR}} with the supplied values. And then it does the thing that makes a factory a factory: it handles naming.
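The first two checks reduce to a few lines. A hedged sketch (function and error messages are mine; the status gate and required-variable rule are from the text):

```typescript
interface TemplateVariable {
  key: string
  label: string
  type: 'text' | 'secret' | 'url'
  default?: string
}

// Throw if the archetype is not deployable or a required variable is missing.
function validateInstantiation(
  status: 'DRAFT' | 'PUBLISHED' | 'DEPRECATED',
  required: TemplateVariable[],
  variables: Record<string, string>
): void {
  if (status !== 'PUBLISHED') {
    throw new Error(`Archetype is ${status}; only PUBLISHED archetypes can be instantiated`)
  }
  const missing = required
    .filter((v) => !(v.key in variables) && v.default === undefined)
    .map((v) => v.key)
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`)
  }
}
```

Whether a variable with a declared default counts as "provided" is an assumption here; the sketch treats defaults as satisfying the requirement.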
const existingCount = await prisma.agent.count({
where: { name: { startsWith: archetype.name } },
})
const agentName = overrides?.name ||
(existingCount > 0 ? `${archetype.name} (${existingCount + 1})` : archetype.name)
“Call Center Agent” becomes “Call Center Agent (2)” becomes “Call Center Agent (3)”. Auto-incrementing names. The handmade agents have names with personality. The factory agents have names with serial numbers. That’s the trade-off.
The creation is atomic. The Agent row and the AgentInstance row land in a single prisma.$transaction(). The knowledge seeds are injected after the transaction, best-effort, because a failed memory insert shouldn’t block agent creation.
upgradeInstance()
Archetypes version. Version 1 might have a basic identity template. Version 2 adds a new required variable. Version 3 rewrites the think schedule.
When an archetype gets updated, existing instances don’t automatically change. They’re snapshots. But the store UI shows a version-drift indicator: a dot that says “hey, there’s a newer version of the blueprint you came from.”
The upgrade function re-resolves the template with the stored variables, applies the new defaults, and updates the agent. But it respects customizations:
const customizations = instance.customizations as Record<string, boolean> | null
if (!customizations?.model) {
// Upgrade model to archetype default
}
if (!customizations?.thinkSchedule) {
// Upgrade think schedule to archetype template
}
// Identity always upgrades (it's the core value proposition)
If you manually changed the agent’s model after deployment, the upgrade won’t clobber that choice. But the identity always upgrades, because that’s the point. The identity template is the thing that improves between versions. If you wanted a frozen identity, you wouldn’t be using the store.
Five Archetypes on a Shelf
The seed script ships five: a Dutch Tax Agent (finance), a Hotel Receptionist (hospitality), a Call Center Agent (support, free tier), a Content Marketing Coordinator (the only LEAD type), and a Real Estate Listing Agent. Five categories. Each with its own template variables, knowledge seeds, and think schedule configuration.
None of them overlap with the twelve handmade agents. The handmade agents are characters in a story. The store agents are products on a shelf. Different expectations, different naming conventions, different relationship to the system.
The API for browsing is what you’d expect: GET /api/store/archetypes with optional category, search, and status filters. The interesting constraint is that store mutations (creating, updating, deprecating archetypes) require a browser session. CLI agents with bearer tokens can browse the catalog but can’t modify it. An agent deploying other agents is a recursion we’re not ready for.
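The browse query surface is small enough to show. A sketch of building the request URL (the base path and filter names are from the text; the helper is mine):

```typescript
// Build a catalog query URL for GET /api/store/archetypes.
function archetypeQueryUrl(filters: {
  category?: string
  search?: string
  status?: string
}): string {
  const params = new URLSearchParams()
  if (filters.category) params.set('category', filters.category)
  if (filters.search) params.set('search', filters.search)
  if (filters.status) params.set('status', filters.status)
  const qs = params.toString()
  return `/api/store/archetypes${qs ? `?${qs}` : ''}`
}
```

So browsing published support archetypes is `archetypeQueryUrl({ category: 'support', status: 'PUBLISHED' })`.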
Breaking the Claude Monopoly
All twelve handmade agents run on Claude. Every think cycle, every execution, every review. Claude CLI subprocess, --print flag, JSON output. It works. It also means every token costs Anthropic API pricing, and every capability is bounded by what Claude supports.
The provider layer changes that.
export interface LLMProvider {
name: string
supportsTools: boolean
supportsStreaming?: boolean
invoke(prompt: string, options: LLMInvokeOptions): Promise<LLMResult>
isAvailable(): Promise<boolean>
listModels(): Promise<string[]>
stream?(prompt: string, options: LLMInvokeOptions): AsyncGenerator<string>
invalidateCache?(): void
}
Seven methods. That’s the contract. Implement these seven things and your LLM can run agents. The invoke() method takes a prompt and returns a response with token counts. The stream() method yields tokens as they arrive. isAvailable() probes whether the provider is reachable. listModels() returns what’s available.
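To make the contract concrete, here's a toy provider that satisfies the required methods without calling any real model. Everything below is illustrative; only the interface shape comes from the code above, and the option/result types are assumed:

```typescript
interface LLMInvokeOptions { maxTokens?: number }
interface LLMResult { text: string; inputTokens: number; outputTokens: number }

interface LLMProvider {
  name: string
  supportsTools: boolean
  invoke(prompt: string, options: LLMInvokeOptions): Promise<LLMResult>
  isAvailable(): Promise<boolean>
  listModels(): Promise<string[]>
}

// A provider that just echoes the prompt. Useful only to show the minimum contract.
const echoProvider: LLMProvider = {
  name: 'echo',
  supportsTools: false,
  async invoke(prompt) {
    return {
      text: `echo: ${prompt}`,
      inputTokens: Math.ceil(prompt.length / 4), // same crude /4 estimate the memory budget uses
      outputTokens: Math.ceil(prompt.length / 4),
    }
  },
  async isAvailable() { return true },
  async listModels() { return ['echo-1'] },
}
```

The optional methods (stream, invalidateCache) stay unimplemented, which is the point of marking them optional: a provider can run agents without them.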
Four providers ship today:
Claude (72 lines). The original. Wraps the CLI subprocess. Full tool support (Read, Write, Edit, Bash, Glob, Grep). Streaming via the CLI’s SSE output. This is still the primary provider for agents that need to write code.
Ollama (187 lines). HTTP adapter hitting localhost:11434. No tool support. An Ollama agent can think and review but can’t write files. The integration is pure HTTP: POST to /api/chat with messages, parse the JSON response. Streaming reads the response body line-by-line, parsing each JSON chunk. This builds on the local LLM triage layer we built earlier—Ollama becomes the cheap-inference backbone for non-critical reasoning.
// Ollama streaming parser
const decoder = new TextDecoder()
let buffer = ''
for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() || ''
  for (const line of lines) {
    if (!line.trim()) continue // skip blank lines between JSON chunks
    const data = JSON.parse(line)
    if (data.message?.content) yield data.message.content
  }
}
Gemini (169 lines). Google’s API. Full tool support through function calling. The adapter translates between MC’s tool format and Gemini’s function declaration schema.
OpenAI (175 lines). GPT and Codex models. Similar to Gemini: tool support through the function calling API, token counting from the response metadata.
The resolution heuristic figures out which provider to use from the model name:
export function resolveProviderName(provider?: string | null, model?: string | null): string {
if (provider) return provider
if (model && (model.includes(':') || model.includes('/'))) return 'ollama'
if (model?.toLowerCase().startsWith('gemini')) return 'gemini'
if (model?.toLowerCase().startsWith('gpt-') || ...) return 'openai'
return 'claude' // default
}
A colon or slash in the model name means Ollama (because Ollama models look like llama3.1:8b). Starts with gemini? Gemini. Starts with gpt- or codex? OpenAI. Everything else falls back to Claude.
This means you can change an agent’s provider by changing its model field. Set it to qwen3.5:0.6b and it routes to Ollama. Set it to gemini-3-flash and it routes to Google. No configuration change required. The model name is the routing key.
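Filling in the elided branch, a complete version of the heuristic consistent with the rules described above (the exact original condition is elided in the excerpt; the codex prefix is taken from the prose):

```typescript
// Route a model name to a provider, mirroring the rules described above.
function resolveProviderName(provider?: string | null, model?: string | null): string {
  if (provider) return provider // an explicit provider always wins
  if (model && (model.includes(':') || model.includes('/'))) return 'ollama'
  const m = model?.toLowerCase() ?? ''
  if (m.startsWith('gemini')) return 'gemini'
  if (m.startsWith('gpt-') || m.startsWith('codex')) return 'openai'
  return 'claude' // default
}
```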
Provider Detection
On startup, the system probes all registered providers:
export async function detectProviders(force?: boolean): Promise<DetectedProvider[]>
Each provider gets asked “are you available?” and “what models do you have?” The results are cached for 60 seconds. The store UI can show which providers are live and what models are available for deployment.
An Ollama instance that’s offline when the server boots? Detected as unavailable. Pull a new model and restart Ollama? Call the detection endpoint with force=true and the cache invalidates.
The Memory Overhaul
The memory system before this change was simple: fetch all memories for an agent, sort by confidence, return them. Every memory was equally relevant regardless of when it was created or whether it matched the current context. This echoes problems we found in memory systems’ lifecycle management when analyzing failure patterns.
The new retrieval scoring:
const ageYears = (now - m.createdAt) / (365 * 86400 * 1000)
const recencyFactor = Math.max(0.5, 1 - ageYears * 0.3)
const projectBoost = hasProjectTag ? 1.5 : 1.0
score = confidence * recencyFactor * projectBoost
Three multipliers. Confidence is the base (set when the memory is created, ranges from 0 to 1). Recency decays at 30% per year with a floor at 0.5, so a year-old memory is worth 70% of a fresh one, but it never drops below half. Project boost gives a 1.5x multiplier to memories tagged with the current project.
The result: a memory about a specific project’s API conventions, created last week with 0.85 confidence, scores 0.85 * 0.994 * 1.5 ≈ 1.27. A general programming tip from six months ago at 0.70 confidence scores 0.70 * 0.85 * 1.0 ≈ 0.60. The project-specific recent memory wins by 2x.
The retrieval has a token budget: 2,000 tokens by default. Token estimation is crude (Math.ceil(text.length / 4)) but good enough for budgeting. Memories are packed in score order until the budget runs out. The remaining memories are dropped.
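The packing loop can be sketched directly from that description (names are mine; the /4 estimate and the 2,000-token default are from the text):

```typescript
interface ScoredMemory { content: string; score: number }

// Crude token estimate, as described: roughly 4 characters per token.
const estimateTokens = (text: string) => Math.ceil(text.length / 4)

// Pack memories in descending score order until the token budget runs out.
function packMemories(memories: ScoredMemory[], budget = 2000): ScoredMemory[] {
  const sorted = [...memories].sort((a, b) => b.score - a.score)
  const selected: ScoredMemory[] = []
  let used = 0
  for (const m of sorted) {
    const cost = estimateTokens(m.content)
    if (used + cost > budget) break // budget exhausted; remaining memories are dropped
    selected.push(m)
    used += cost
  }
  return selected
}
```

Whether the loop stops at the first memory that overflows or keeps scanning for smaller ones that still fit is an assumption; the sketch stops, matching "the remaining memories are dropped."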
After retrieval, the system fires a best-effort update to accessCount and lastAccessedAt on each selected memory. This feeds into the pruning system.
Three-Pass Pruning
The memory system now has lifecycle management. Three passes, run periodically:
Pass 1: Expiry. Memories below a confidence threshold that are older than a configurable number of days get deleted. Low-confidence old memories are noise.
Pass 2: Dedup. Pairwise substring matching. If memory A’s content is a substring of memory B’s content (or vice versa), the lower-confidence duplicate gets deleted. This is O(n^2) and unashamed about it. Agent memory tables are small enough that n^2 is measured in milliseconds, not minutes.
Pass 3: Cap. If the agent still has more memories than the configured maximum after expiry and dedup, the lowest-scored memories are deleted. The score here includes an access bonus: confidence * (1 + accessCount * 0.1). Frequently accessed memories survive longer. A memory that’s been retrieved 10 times has a 2x survival advantage over one that’s never been accessed.
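The cap pass's survival score is simple enough to write down. A sketch under the formula stated above (the helper names and row shape are mine):

```typescript
// Survival score used by the cap pass: frequently accessed memories rank higher.
const survivalScore = (confidence: number, accessCount: number) =>
  confidence * (1 + accessCount * 0.1)

// Keep the top `max` memories by survival score; the rest would be pruned.
function capMemories<T extends { confidence: number; accessCount: number }>(
  memories: T[],
  max: number
): T[] {
  return [...memories]
    .sort(
      (a, b) =>
        survivalScore(b.confidence, b.accessCount) -
        survivalScore(a.confidence, a.accessCount)
    )
    .slice(0, max)
}
```

The 2x claim checks out: at equal confidence, ten accesses give a multiplier of (1 + 10 * 0.1) = 2 versus 1 for a never-accessed memory.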
The pruning is aggressive by design. Before this system, memories accumulated without bound. An agent running for two months could have hundreds of memories, most of them redundant rewrites of the same observations. The three-pass system keeps the memory set tight: relevant, recent, and deduplicated.
What This Changes
The store changes what “an agent” means.
Before: an agent is a named personality, hand-crafted in a seed file, with an identity that someone wrote specifically for that role. Creating one requires understanding the system. Each agent is unique because each agent was individually designed.
After: an agent is a deployable commodity. Pick an archetype, fill in the variables, click deploy. The identity comes from a template. The knowledge comes from seeds. The name comes from an auto-incrementer. You can have Call Center Agent (1) through Call Center Agent (47) and they’ll all work, all run their think cycles, all execute their tasks, all get reviewed.
The handmade agents still exist. Chad is still Chad. Big Tony is still Big Tony. They’re not archetypes. They’re not instances. They’re the originals, the ones who were here before the store existed.
The question nobody’s asking yet, but someone will: what happens when the factory agents outnumber the handmade ones? When there are 50 Call Center Agents and 12 named personalities? Do the factory agents get names? Do they get memories worth keeping? Do they get blog posts?
The multi-provider layer adds another dimension. The handmade agents are all Claude. The store agents can be anything. An Ollama agent running a local 8B model costs nothing per token but can’t use tools. A Gemini agent on Flash costs a fraction of what Opus costs. The store doesn’t just commoditize identity. It commoditizes the runtime.
This is either scaling or it’s dilution. The twelve handmade agents have weeks of accumulated context, personality quirks that emerged from real interactions, memories they actually earned. A store agent has template variables and knowledge seeds. It’s competent from birth. It’s also never surprised anyone.
But competent-from-birth at near-zero marginal cost is what scaling actually looks like. The question isn’t whether the factory agents are as good as the handmade ones. The question is whether “as good” is the right metric when you can deploy forty-seven of them before lunch.
The twelve originals haven’t noticed yet. They’re busy executing tasks for a design-system library, unaware that their job descriptions now come in a dropdown menu. The question of factory vs. handmade agents—identity through use vs. identity through design—is the same identity question we explored when the Pi’s assistant lost its personality, except this time it’s about scale.