3.1.2. Choosing an Appropriate Model
💡 First Principle: Picking a model means matching the job's requirements to a model's capabilities along four axes: capability (can it do the task well?), modality (does it handle text only, or also images/audio?), cost, and latency. The best choice is the cheapest, fastest model that still meets the capability and modality the task demands.
A simple sentiment-tagging task doesn't need a frontier reasoning model: a small, cheap model handles it at lower cost and with faster responses. A task that must reason over a chart needs a multimodal model that accepts images. Foundry's model catalog offers thousands of models so you can match each job to the right one rather than forcing every task through one expensive model. This is the practical face of the "don't always pick the biggest" principle.
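The First Principle can be sketched as a two-stage filter: treat capability and modality as hard requirements, then rank the models that pass by cost and latency. This is a minimal Python sketch under stated assumptions. The catalog entries (`small-text`, `mid-multimodal`, `frontier-reasoning`), their 1-to-3 capability scores, prices, and latencies are all hypothetical, as are the `ModelProfile`, `TaskNeeds`, and `choose_model` names. None of it reflects a real Foundry API or real model data.

```python
from dataclasses import dataclass

# Hypothetical model profiles: names, scores, prices, and latencies are
# invented for illustration and do not describe real Foundry models.
@dataclass(frozen=True)
class ModelProfile:
    name: str
    capability: int            # 1 = simple classification ... 3 = complex reasoning
    modalities: frozenset      # input types accepted, e.g. {"text", "image"}
    cost_per_1k_tokens: float  # hypothetical USD price
    latency_ms: int            # hypothetical typical response time

@dataclass(frozen=True)
class TaskNeeds:
    min_capability: int
    modalities: frozenset

CATALOG = [
    ModelProfile("small-text", 1, frozenset({"text"}), 0.0005, 150),
    ModelProfile("mid-multimodal", 2, frozenset({"text", "image"}), 0.003, 600),
    ModelProfile("frontier-reasoning", 3, frozenset({"text", "image"}), 0.03, 2500),
]

def choose_model(task, catalog):
    """Return the cheapest (then fastest) model meeting capability and modality, or None."""
    # Stage 1: capability and modality are hard requirements.
    eligible = [
        m for m in catalog
        if m.capability >= task.min_capability and task.modalities <= m.modalities
    ]
    if not eligible:
        return None
    # Stage 2: among eligible models, pick the cheapest, breaking ties by latency.
    return min(eligible, key=lambda m: (m.cost_per_1k_tokens, m.latency_ms))

sentiment = TaskNeeds(min_capability=1, modalities=frozenset({"text"}))
chart = TaskNeeds(min_capability=2, modalities=frozenset({"text", "image"}))

print(choose_model(sentiment, CATALOG).name)  # prints: small-text
print(choose_model(chart, CATALOG).name)      # prints: mid-multimodal
```

Ranking by cost first and latency second is one reasonable reading of "cheapest, fastest"; a latency-critical app might reverse that order or add a latency ceiling as another hard filter.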
| Decision Factor | Ask Yourself | Example Impact |
|---|---|---|
| Capability | Is the task simple classification or complex reasoning? | Reasoning tasks need more capable (costlier) models |
| Modality | Text only, or also images/audio/video? | Visual input requires a multimodal model |
| Cost | How many tokens, how often? | High volume favors smaller, cheaper models |
| Latency | Does the user wait in real time? | Interactive apps favor faster models |
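The Cost row's "how many tokens, how often?" reduces to simple arithmetic: total tokens per period times the per-token price. The sketch below assumes a hypothetical workload (10,000 requests a day, 200 tokens each, a 30-day month) and hypothetical per-1,000-token prices. Real pricing often charges input and output tokens at different rates, which this simplification ignores.

```python
def monthly_cost(requests_per_day, tokens_per_request, price_per_1k_tokens, days=30):
    """Estimate monthly spend as total tokens times a per-1,000-token price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * price_per_1k_tokens

# The same hypothetical workload priced against a small model and a large one.
# Both prices are illustrative assumptions, not published rates.
small = monthly_cost(requests_per_day=10_000, tokens_per_request=200, price_per_1k_tokens=0.0005)
large = monthly_cost(requests_per_day=10_000, tokens_per_request=200, price_per_1k_tokens=0.03)
print(f"small: ${small:,.2f}/month, large: ${large:,.2f}/month")
```

It prints `small: $30.00/month, large: $1,800.00/month`: the same 60 million tokens cost 60 times more on the larger model, which is why high volume favors smaller models whenever they clear the capability bar.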
⚠️ Exam Trap: When a scenario describes a simple, high-volume, latency-sensitive task, an answer that picks a large frontier model as "best" is usually wrong. Match the model to the task's actual needs, not to raw capability.
Reflection Question: You need to classify thousands of product reviews as positive or negative, cheaply and fast. Which way do the four decision factors push your model choice?