The Bouncer at the Door

Why we're putting a 0.5B parameter model in front of Claude Opus. Qwen 2.5 classifies greetings, status checks, and small talk for zero tokens. The dumbest model in the stack might save us the most money.