4.3.1. Custom Models in Microsoft Foundry
💡 First Principle: Custom models exist on a spectrum from "use as-is" to "build from scratch." The architect's job is to find the minimum customization that meets the requirement — because every step toward custom training adds cost, complexity, and maintenance burden.
The Model Customization Spectrum:
| Approach | When to Use | Cost | Maintenance |
|---|---|---|---|
| Use as-is | General tasks, standard capabilities | Lowest | Minimal |
| Prompt engineering | Standard model with domain-specific instructions | Low | Prompt versioning |
| RAG | Standard model + organization's proprietary data | Medium | Data pipeline maintenance |
| Fine-tuning | Need model to consistently produce domain-specific outputs | High | Retraining as data changes |
| Custom training | No existing model handles the task | Highest | Full ML pipeline |
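The "minimum customization that meets the requirement" principle can be sketched as a decision helper that walks the spectrum from cheapest to most expensive. This is an illustrative sketch only, not a Foundry API; the function name and boolean inputs are assumptions made for this example.

```python
def recommend_customization(task_supported: bool,
                            needs_consistent_domain_outputs: bool,
                            needs_proprietary_data: bool,
                            needs_domain_instructions: bool) -> str:
    """Return the minimum customization approach that satisfies the requirement.

    Checks are ordered from the hardest constraint (no existing model works)
    down to the cheapest option (use as-is), mirroring the spectrum table.
    """
    if not task_supported:
        return "custom training"      # no existing model handles the task
    if needs_consistent_domain_outputs:
        return "fine-tuning"          # model must reliably emit domain-specific outputs
    if needs_proprietary_data:
        return "RAG"                  # ground a standard model in your own data
    if needs_domain_instructions:
        return "prompt engineering"   # standard model + domain-specific instructions
    return "use as-is"                # general task, standard capabilities suffice


# Example: proprietary data, but no need for a custom output style → RAG.
print(recommend_customization(True, False, True, False))  # → RAG
```

The point of the ordering is architectural: each branch is only taken when every cheaper option below it has been ruled out.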
Foundry Model Deployment Options:
| Deployment Type | Characteristics | Best For |
|---|---|---|
| Serverless API (Models-as-a-Service, MaaS) | Pay-per-token, no GPU management, instant availability | Prototyping, variable workloads, cost optimization |
| Managed compute | Dedicated infrastructure, predictable performance | Production workloads with consistent demand, compliance requirements |
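The economic difference between the two deployment types comes down to a cost-crossover calculation: serverless cost scales linearly with token volume, while managed compute is a flat infrastructure cost. The sketch below makes that concrete; the prices ($0.002 per 1K tokens, $3 per GPU-hour) are hypothetical placeholders, not real Foundry pricing.

```python
def monthly_cost_serverless(tokens_per_month: int, price_per_1k_tokens: float) -> float:
    """Serverless (pay-per-token): cost scales linearly with usage."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def monthly_cost_managed(hours_per_month: float, price_per_gpu_hour: float) -> float:
    """Managed compute: flat cost for dedicated infrastructure, regardless of volume."""
    return hours_per_month * price_per_gpu_hour

# Hypothetical prices: $0.002 per 1K tokens vs. a $3/hour dedicated GPU
# running 24/7 (~730 hours/month).
serverless = monthly_cost_serverless(500_000_000, 0.002)  # 500M tokens → $1,000
managed = monthly_cost_managed(730, 3.0)                  # → $2,190
print(serverless, managed)
```

At 500M tokens/month serverless is cheaper; quadruple the volume and the flat cost of managed compute wins. This is why the table pairs serverless with variable workloads and managed compute with consistent, high-volume demand.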
Selecting Models from the Catalog:
The Foundry model catalog includes models from Microsoft (Phi, MAI), OpenAI (GPT series), Anthropic (Claude), Meta (Llama), Mistral, DeepSeek, and others. Selection criteria the architect evaluates:
- Task fit — Does the model excel at the specific task (reasoning, code generation, multimodal, summarization)?
- Context window — Can it handle the input size your use case requires?
- Latency — Does inference speed meet user experience requirements?
- Cost — Does per-token pricing fit the budget at expected volume?
- Compliance — Does the model provider's data handling meet regulatory requirements?
- Fine-tunability — If customization is needed, does the model support fine-tuning in Foundry?
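One way to reason about these criteria together is a weighted scoring matrix. The sketch below is illustrative only: the model names, per-criterion scores (0–10, higher is better, so a high "cost" score means cheap), and weights are all assumptions invented for this example, not catalog data.

```python
# Hypothetical candidate models scored 0-10 against a subset of the criteria above.
# For "cost", higher means more cost-effective.
candidates = {
    "frontier-model": {"task_fit": 9, "latency": 6, "cost": 3, "compliance": 8},
    "small-model":    {"task_fit": 7, "latency": 9, "cost": 9, "compliance": 8},
}

# Assumed weights reflecting a cost-sensitive, high-volume workload.
weights = {"task_fit": 0.4, "latency": 0.2, "cost": 0.3, "compliance": 0.1}

def weighted_score(scores: dict) -> float:
    """Weighted sum across all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

best = max(candidates, key=lambda name: weighted_score(candidates[name]))
print(best)  # → small-model
```

With these weights the smaller, cheaper model wins (8.1 vs. 6.5) even though the frontier model has better raw task fit, which is exactly the tradeoff the exam trap below warns about.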
⚠️ Exam Trap: The exam may present a scenario where "the best-performing model" is the obvious answer. But if that model costs 10x more per token and the task doesn't require frontier performance, the correct architectural recommendation is a smaller, more cost-effective model. Always evaluate model selection against both capability AND economics.
Reflection Question: A legal firm needs an AI solution that analyzes contracts, extracts key clauses, and flags deviations from standard templates. The contracts are proprietary and contain sensitive client information. Walk through the model customization spectrum — what's the minimum viable approach, and what data handling constraints affect the architecture?