In March I wrote that Ollama had no tool support, Gemini had full function calling, and OpenAI was similar. That post was We Built a Pet Store for Robots, for anyone keeping score. I was wrong on every count.

Ollama now has five hand-built tools, a 25-iteration agentic loop, and its own worktree isolation. Gemini has supportsTools: false and runs in sandbox mode. OpenAI is the same: text in, text out. The code evolved faster than the blog. That happens when you build in public. You publish a snapshot, ship twelve commits, and the snapshot becomes fiction.

So let’s correct the record.

The Seven-Method Contract

The whole multi-provider system rests on one interface:

export interface LLMProvider {
  name: string
  supportsTools: boolean
  supportsStreaming?: boolean
  invoke(prompt: string, options: LLMInvokeOptions): Promise<LLMResult>
  isAvailable(): Promise<boolean>
  listModels(): Promise<string[]>
  stream?(prompt: string, options: LLMInvokeOptions): AsyncGenerator<string>
}

Seven members, four of them methods. One boolean that determines whether an agent can actually do anything beyond talking. Four providers register at startup. All four implement this interface. The interface says they’re equals. The implementations say otherwise.

Fifteen Lines of Routing

Before any of that matters, the system has to figure out which provider to use. That happens here:

export function resolveProviderName(provider?: string | null, model?: string | null): string {
  if (provider) return provider
  if (model && (model.includes(':') || model.includes('/'))) return 'ollama'
  if (!model || ['opus', 'sonnet', 'haiku'].includes(model.toLowerCase())) return 'claude'
  if (model.toLowerCase().startsWith('claude')) return 'claude'
  if (model.toLowerCase().startsWith('gemini')) return 'gemini'
  if (model.toLowerCase().startsWith('gpt-') || model.toLowerCase().startsWith('codex')
    || model.toLowerCase().startsWith('o1') || model.toLowerCase().startsWith('o3')
    || model.toLowerCase().startsWith('o4')) return 'openai'
  return 'claude'
}

The model name is the routing key. A colon or a slash means Ollama (qwen3:0.6b). “gemini-” means Google. “gpt-”, “codex”, and the o-series mean OpenAI. Everything else defaults to Claude. No config file. No database lookup. No routing service. Fifteen lines of string parsing. Elegant or reckless, depending on your pain threshold for implicit contracts.
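To make the rules concrete, here is how a few representative inputs resolve. The function body is reproduced from above so the snippet runs standalone; the example model names are just illustrations.

```typescript
export function resolveProviderName(provider?: string | null, model?: string | null): string {
  if (provider) return provider
  if (model && (model.includes(':') || model.includes('/'))) return 'ollama'
  if (!model || ['opus', 'sonnet', 'haiku'].includes(model.toLowerCase())) return 'claude'
  if (model.toLowerCase().startsWith('claude')) return 'claude'
  if (model.toLowerCase().startsWith('gemini')) return 'gemini'
  if (model.toLowerCase().startsWith('gpt-') || model.toLowerCase().startsWith('codex')
    || model.toLowerCase().startsWith('o1') || model.toLowerCase().startsWith('o3')
    || model.toLowerCase().startsWith('o4')) return 'openai'
  return 'claude'
}

resolveProviderName(undefined, 'qwen3:0.6b')     // → 'ollama'  (colon)
resolveProviderName(undefined, 'gemini-2.5-pro') // → 'gemini'  (prefix)
resolveProviderName(undefined, 'sonnet')         // → 'claude'  (alias)
resolveProviderName(undefined, 'o3-mini')        // → 'openai'  (o-series)
resolveProviderName('openai', 'sonnet')          // → 'openai'  (explicit provider wins)
```

Note the precedence: an explicit provider argument short-circuits everything, so the string heuristics only fire when the caller stays silent.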

Claude: The Full Stack

Claude is the native citizen. Worktree isolation is a single CLI flag: --worktree. Tools are built into the CLI. Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, plus anything you expose through MCP servers. Path sandboxing is handled by the CLI itself. Token counts come back exact, costs come back in USD, and session markers get actively stripped from the subprocess environment to prevent Claude-spawning-Claude recursion.

The Claude adapter is the one everything else is measured against, because Claude is the one that was here first. The system was built for Claude. The other three are guests.

Ollama: The Rebuilt City

Ollama is where things get interesting. It went from triage classifier to full execution provider, and the journey required rebuilding everything Claude gets for free.

Worktree isolation? Claude does it with a flag. Ollama does it with ensureWorktreeExists(): validate the worktree name against a regex, check for path traversal, shell out to git worktree add, handle the “branch already exists” case. Same outcome. Forty more lines.
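The steps above might look something like this sketch. The helper names and exact checks are my reconstruction from the description, not the actual Mission Control code:

```typescript
import { execFileSync } from 'node:child_process'
import * as path from 'node:path'

const WORKTREE_NAME_RE = /^[A-Za-z0-9._-]+$/

export function isValidWorktreeName(name: string): boolean {
  // Reject shell-unfriendly characters, and '..' sequences that the
  // character class alone would let through.
  return WORKTREE_NAME_RE.test(name) && !name.includes('..')
}

export function ensureWorktreeExists(repoRoot: string, name: string): string {
  if (!isValidWorktreeName(name)) throw new Error(`invalid worktree name: ${name}`)
  const worktreePath = path.join(repoRoot, '.worktrees', name)
  // Belt and braces: the resolved path must stay under the repo root.
  if (!path.resolve(worktreePath).startsWith(path.resolve(repoRoot) + path.sep)) {
    throw new Error(`path traversal detected: ${name}`)
  }
  try {
    // Create the worktree on a new branch named after it.
    execFileSync('git', ['worktree', 'add', worktreePath, '-b', name], { cwd: repoRoot })
  } catch {
    // "branch already exists": check out the existing branch instead.
    execFileSync('git', ['worktree', 'add', worktreePath, name], { cwd: repoRoot })
  }
  return worktreePath
}
```

The validation half is pure string work; the only shelling-out is the two `git worktree add` variants, which is exactly the part Claude's `--worktree` flag hides.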

Path sandboxing? Claude’s CLI handles it. Ollama runs its tools inside Mission Control’s own process, so MC had to write resolveSafePath(): relative path traversal check, ancestor directory resolution for paths that don’t exist yet, and explicit symlink target verification. Thirty lines of security that Claude gets for zero.
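A sketch of those three checks, in the order described. The real resolveSafePath may shape them differently; the error messages and structure here are mine:

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

export function resolveSafePath(sandboxRoot: string, requested: string): string {
  const realRoot = fs.realpathSync(path.resolve(sandboxRoot))
  const resolved = path.resolve(realRoot, requested)
  // 1. Relative traversal: the resolved path must stay under the root.
  if (resolved !== realRoot && !resolved.startsWith(realRoot + path.sep)) {
    throw new Error(`path escapes sandbox: ${requested}`)
  }
  // 2. The target may not exist yet (write_file creating a new file),
  //    so walk up to the nearest ancestor that does exist.
  let ancestor = resolved
  while (!fs.existsSync(ancestor)) ancestor = path.dirname(ancestor)
  // 3. Symlink check: that ancestor's real location must also be inside
  //    the sandbox, so a planted symlink can't redirect writes outside.
  const real = fs.realpathSync(ancestor)
  if (real !== realRoot && !real.startsWith(realRoot + path.sep)) {
    throw new Error(`symlink escapes sandbox: ${requested}`)
  }
  return resolved
}
```

The ancestor walk in step 2 is the subtle part: you can't realpath a file that doesn't exist, so you verify the closest thing that does.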

Tools? Claude has a native toolkit. Ollama gets five hand-built replacements: read_file, write_file, list_directory, web_search, get_task_info. And then there’s the translation table.
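For a sense of what those five replacements look like on the wire, here is how they might be declared as OpenAI-style function schemas for Ollama's tool-calling API. The parameter shapes are guesses from the tool names, not Mission Control's actual definitions:

```typescript
// Small helper to keep the five declarations readable.
const tool = (name: string, description: string, props: Record<string, object>) => ({
  type: 'function' as const,
  function: {
    name,
    description,
    parameters: { type: 'object', properties: props, required: Object.keys(props) },
  },
})

export const OLLAMA_TOOLS = [
  tool('read_file', 'Read a file inside the task worktree', { path: { type: 'string' } }),
  tool('write_file', 'Create or overwrite a file in the worktree',
    { path: { type: 'string' }, content: { type: 'string' } }),
  tool('list_directory', 'List the entries of a directory', { path: { type: 'string' } }),
  tool('web_search', 'Search the web by keyword', { query: { type: 'string' } }),
  tool('get_task_info', 'Fetch metadata for the current task', { taskId: { type: 'string' } }),
]
```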

The Rosetta Stone

This is the part that kept me up at night:

const CLAUDE_TO_OLLAMA_MAP: Record<string, string[]> = {
  Read: ['read_file'],
  Write: ['write_file'],
  Edit: ['write_file'],
  MultiEdit: ['write_file'],
  Glob: ['list_directory'],
  Grep: ['read_file', 'list_directory'],
  Bash: ['read_file', 'write_file', 'list_directory'],
  WebSearch: ['web_search'],
  WebFetch: ['web_search'],
}

Claude has nine-plus tools. Ollama has five. The mapping is lossy by design.

Edit and MultiEdit both collapse to write_file. Which means Ollama can’t do surgical string replacements. It overwrites the whole file. Grep becomes read_file plus list_directory, which means Ollama can’t regex search anything. It reads files and lists directories and hopes the LLM does the filtering. Bash, which implies arbitrary shell access, maps to all three filesystem tools. WebFetch, which fetches a specific URL and parses it, maps to web_search, which searches the web by keyword.

Every mapping is an honest approximation. None of them are exact equivalents. The system doesn’t lie about what Ollama can do. It translates, and translation always loses something.
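The lossiness is easy to see when you project an agent's Claude-side toolset through the table. This sketch reuses the map from above so it runs standalone; the function name is mine, not the codebase's:

```typescript
const CLAUDE_TO_OLLAMA_MAP: Record<string, string[]> = {
  Read: ['read_file'],
  Write: ['write_file'],
  Edit: ['write_file'],
  MultiEdit: ['write_file'],
  Glob: ['list_directory'],
  Grep: ['read_file', 'list_directory'],
  Bash: ['read_file', 'write_file', 'list_directory'],
  WebSearch: ['web_search'],
  WebFetch: ['web_search'],
}

// Project a Claude toolset into its deduplicated Ollama equivalent.
// Unknown tools silently map to nothing — another place translation loses.
export function translateToolset(claudeTools: string[]): string[] {
  const out = new Set<string>()
  for (const t of claudeTools) {
    for (const mapped of CLAUDE_TO_OLLAMA_MAP[t] ?? []) out.add(mapped)
  }
  return [...out]
}

translateToolset(['Edit', 'MultiEdit']) // two distinct tools collapse to ['write_file']
```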

The agentic loop compounds this. Ollama gets 25 tool iterations per invocation. Claude has no such cap. An Ollama agent working on a complex task might hit the ceiling mid-thought, its chain of tool calls truncated at an arbitrary boundary. Claude keeps going until it’s done or the budget runs out.
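The shape of that loop, as I understand it, is roughly the following. The chat and tool-runner functions are injected here so the sketch is self-contained; the real loop talks to Ollama's chat endpoint, and all the names are my own:

```typescript
const MAX_TOOL_ITERATIONS = 25

export interface ToolCall { function: { name: string; arguments: Record<string, unknown> } }
export interface ChatMessage {
  role: 'user' | 'assistant' | 'tool'
  content: string
  tool_calls?: ToolCall[]
}

export async function runAgenticLoop(
  prompt: string,
  chat: (messages: ChatMessage[]) => Promise<ChatMessage>,
  runTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const messages: ChatMessage[] = [{ role: 'user', content: prompt }]
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const reply = await chat(messages)
    messages.push(reply)
    // No tool calls means the model is done using tools: final answer.
    if (!reply.tool_calls?.length) return reply.content
    // Otherwise execute each call and feed the results back in.
    for (const call of reply.tool_calls) {
      messages.push({ role: 'tool', content: await runTool(call) })
    }
  }
  // Hit the ceiling mid-thought: return whatever we have.
  return messages[messages.length - 1].content
}
```

The cap is the last line's problem: when iteration 25 ends on a tool result rather than an answer, the caller gets a truncated chain, which is exactly the mid-thought ceiling described above.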

Gemini and Codex: Tourists

Gemini and OpenAI are the providers that showed up to the job site in business casual. supportsTools: false. No file writes. No worktree. No sandbox needed because there’s nothing to sandbox.

They can think. They can review code (text-only). They can run think cycles. But if the executor tries to assign one of them a coding task, this happens:

if (agentLLMProvider && !agentLLMProvider.supportsTools && worktreeFlag) {
  logger.warn({ taskId, agentId: agent.id },
    'Non-tool provider assigned to worktree task — bouncing to INBOX')
  return { runId: run.id, success: false,
    error: `Provider "${effectiveProviderName}" does not support tools` }
}

Bounced. Not “tried and failed.” Not “attempted with degraded capabilities.” The task goes back to INBOX without the provider touching it. The system hired a consultant who can’t use a keyboard and fired them before they sat down.

Their token counts are estimated at characters divided by four. Their cost tracking reports nothing. Their adapters are structural near-copies of each other: CLI subprocess, stdin prompt, stdout capture, timeout, kill.
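That estimate is as blunt as it sounds: no tokenizer in the loop, just character count over four, something like this (the function name is illustrative):

```typescript
// Rough token estimate for providers that don't report real counts.
// The divide-by-four heuristic assumes ~4 characters per token.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

estimateTokens('Explain this diff.') // 18 chars → 5 "tokens"
```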

The Trust Boundary You Didn’t See

Here’s the inversion nobody planned. Each CLI-based provider gets a scrubbed environment via buildProviderSubprocessEnv(). Only twenty base keys plus provider-specific auth tokens pass through. DATABASE_URL, session tokens, internal secrets: all stripped. Claude gets extra treatment where session markers are dropped to prevent recursion.
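The mechanism is an allowlist, not a blocklist, which is what makes it safe by default. A sketch of the idea — the key lists here are illustrative stand-ins, not the real twenty base keys or actual auth variable names:

```typescript
const BASE_KEYS = ['PATH', 'HOME', 'LANG', 'TERM', 'TMPDIR', 'USER', 'SHELL']
const PROVIDER_KEYS: Record<string, string[]> = {
  claude: ['ANTHROPIC_API_KEY'],
  gemini: ['GEMINI_API_KEY'],
  openai: ['OPENAI_API_KEY'],
}

export function buildProviderSubprocessEnv(
  provider: string,
  source: Record<string, string | undefined> = process.env,
): Record<string, string | undefined> {
  const allowed = new Set([...BASE_KEYS, ...(PROVIDER_KEYS[provider] ?? [])])
  const env: Record<string, string | undefined> = {}
  for (const [key, value] of Object.entries(source)) {
    // Anything unlisted — DATABASE_URL, session markers, internal
    // secrets — simply never reaches the subprocess.
    if (allowed.has(key)) env[key] = value
  }
  return env
}
```

Each provider gets only its own auth token: a Gemini subprocess never sees the OpenAI key, and nothing sees the database.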

Ollama isn’t in that list. It doesn’t spawn subprocesses. It uses HTTP. So it doesn’t need env scrubbing.

But Ollama is also the provider whose tools run inside Mission Control’s own process. Claude’s tools run in a sandboxed subprocess. Gemini and OpenAI don’t have tools. The provider that operates in the most dangerous context is the one that doesn’t get environment isolation. The security models are inverted relative to the risk.

The Same Agent, Different Capabilities

The question that keeps nagging: when the same agent gets different capabilities depending on its provider, is it still the same agent?

Big Tony on Claude can read files, edit them surgically, run shell commands, search the web. Big Tony on Ollama can read files, overwrite them wholesale, list directories, search the web with keyword guesses. Big Tony on Gemini can talk. Same name, same identity text, same role description. Wildly different ability to do the job.

The LLMProvider interface says they’re all equivalent. The CLAUDE_TO_OLLAMA_MAP says they’re approximately equivalent. The supportsTools boolean says two of them aren’t equivalent at all. The abstraction holds if you squint. The moment you look at what actually happens behind invoke(), it falls apart.

That’s the thing about clean interfaces over messy reality. The interface is a promise. The implementation is the truth. And the distance between them is a translation table where Edit means “close enough.”