Model routing is the pattern of classifying each incoming request and sending it to the most appropriate (i.e., cheapest sufficient) model, rather than sending every request to the same frontier model. Easy requests go to a fast, cheap model; hard or sensitive ones go to the frontier model.
The economics are compelling: if you can route 70% of requests to GPT-4o-mini ($0.15/M input tokens) instead of GPT-4o ($2.50/M input tokens), you cut the cost of that 70% of traffic by 94%. Common routing signals include request complexity (question length, estimated reasoning required), content type (classification vs. open-ended generation), latency budget, and detected language.
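The arithmetic behind that claim is worth making explicit. A back-of-envelope sketch, using the per-million-token input prices quoted above (variable names are illustrative):

```python
FRONTIER = 2.50   # GPT-4o, $/M input tokens
CHEAP = 0.15      # GPT-4o-mini, $/M input tokens
ROUTED = 0.70     # fraction of traffic sent to the cheap model

# Cost per million input tokens with and without routing
baseline = FRONTIER
blended = ROUTED * CHEAP + (1 - ROUTED) * FRONTIER

savings_on_routed_share = 1 - CHEAP / FRONTIER   # cut on the routed 70%
overall_savings = 1 - blended / baseline         # cut on the whole bill

print(f"blended cost: ${blended:.3f}/M")                  # $0.855/M
print(f"savings on routed share: {savings_on_routed_share:.0%}")  # 94%
print(f"overall savings: {overall_savings:.0%}")          # 66%
```

So the 94% figure applies to the routed share; the blended bill drops by roughly two-thirds, before accounting for output tokens.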
Start with a simple keyword/length heuristic before training a classifier. Track the cost and quality impact of each routing decision. LiteLLM and other proxy layers ship built-in routing support if you don't want to implement it from scratch.
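A keyword/length heuristic can be a few lines of code. A minimal sketch, not any library's API: the keyword sets, thresholds, and model names are placeholder assumptions you would tune against your own traffic and telemetry.

```python
# Hypothetical keyword sets and length threshold -- tune to your traffic.
SIMPLE_KEYWORDS = {"classify", "summarize", "translate", "extract"}
SENSITIVE_KEYWORDS = {"legal", "medical", "refund", "complaint"}

def route(request: str) -> str:
    """Pick a model name for a request using a keyword/length heuristic."""
    text = request.lower()
    # Sensitive content always goes to the frontier model.
    if any(kw in text for kw in SENSITIVE_KEYWORDS):
        return "gpt-4o"
    # Short requests that look like a simple task go to the cheap model.
    if len(request.split()) < 50 and any(kw in text for kw in SIMPLE_KEYWORDS):
        return "gpt-4o-mini"
    # Default to the frontier model until telemetry justifies routing more.
    return "gpt-4o"

print(route("Classify this ticket as billing or support"))  # gpt-4o-mini
print(route("Draft a response to this legal complaint"))    # gpt-4o
```

Defaulting unknown traffic to the frontier model keeps quality safe; you then widen the cheap-model rules only where logged cost/quality data supports it.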
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.