Distillation means using a large, expensive "teacher" model to generate training data, then fine-tuning a smaller, cheaper "student" model on that data. The student doesn't match the teacher across the board, but on the narrow task it was distilled for it often lands within a few percentage points, at roughly 10× lower cost and 5× lower latency.
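In code, the pipeline is just two steps: the teacher labels data, the student trains on it. Here's a minimal Python sketch of step one; `teacher_label` is a placeholder for a real teacher-model API call, the file name is arbitrary, and the JSONL prompt/completion format is one common fine-tuning input format, not a requirement of any particular provider.

```python
import json

# Illustrative sketch of the distillation data-generation step.
def teacher_label(text: str) -> str:
    # Placeholder for a real teacher call, e.g. a frontier-model API
    # request with your task prompt. Replace with your provider's SDK.
    return "billing" if "invoice" in text.lower() else "other"

# Step 1: have the teacher label real, unlabeled production inputs.
unlabeled_inputs = [
    "Why was I charged twice on my last invoice?",
    "How do I reset my password?",
]
with open("distill_train.jsonl", "w") as f:
    for text in unlabeled_inputs:
        row = {"prompt": text, "completion": teacher_label(text)}
        f.write(json.dumps(row) + "\n")

# Step 2: fine-tune the small student on distill_train.jsonl with your
# provider's fine-tuning endpoint or an open-source trainer, then
# evaluate the student against a held-out teacher-labeled set before
# routing production traffic to it.
```

The key design point is that the labels come from your actual traffic, so the student learns the teacher's behavior on exactly the distribution it will serve.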
Where it pays off: high-volume, narrow tasks (classification, routing, extraction, simple summarization). Where it doesn't: open-ended reasoning, or anything you'd reach for a frontier model to do in the first place. Most teams should start with prompt engineering on a small model before considering distillation; it's a power tool, not a starter project.
Bring this to your business
Knowing the term is one thing. Shipping it is another.
We do two-week AI Sprints — one term, one workflow, into production by Day 10.