2.2.5. Custom AI Models and Small Language Models
When prebuilt capabilities don't meet requirements, architects must decide whether to use a general-purpose large language model (LLM) with customization or train a domain-specific small language model (SLM). This decision impacts cost, performance, accuracy, and infrastructure requirements.
Large Language Models (LLMs) — General-purpose models like GPT-4o, GPT-5, Claude, or Llama that handle a wide range of tasks. Available through Microsoft Foundry's model catalog. Best for:
- Tasks requiring broad knowledge and flexible reasoning
- Scenarios where prompt engineering provides sufficient customization
- Rapid prototyping and proof-of-concept development
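The "prompt engineering provides sufficient customization" path can be sketched as plain request assembly: domain behavior is injected through the system prompt and grounding context rather than model weights. The prompt text, function name, and message shape below are illustrative assumptions, not a specific Microsoft Foundry API.

```python
# Sketch: customizing a general-purpose LLM via the system prompt alone.
# No weights change; domain knowledge rides in as grounding text.

def build_messages(user_question: str, grounding_docs: list[str]) -> list[dict]:
    """Assemble a chat-style request that grounds the model in retrieved context."""
    context = "\n---\n".join(grounding_docs)
    system = (
        "You are a contracts assistant. Answer only from the provided "
        "context; say 'not found' if the context lacks the answer.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_question},
    ]

msgs = build_messages(
    "What is the termination notice period?",
    ["Clause 12: Either party may terminate with 60 days written notice."],
)
print(msgs[0]["role"])  # system
```

If this kind of grounding reaches acceptable accuracy, no custom model is needed, which is exactly the decision point the rest of this section examines.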
Small Language Models (SLMs) — Domain-specific models with fewer parameters, optimized for narrow tasks. Microsoft's Phi family exemplifies this category. Best for:
- High-volume, narrow-domain tasks (classification, entity extraction, sentiment analysis)
- Latency-sensitive applications where response time is critical
- Cost-sensitive deployments where per-inference costs must stay low
- Edge or on-premises deployment where computational resources are limited
- Privacy-sensitive scenarios where data cannot leave the organization
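The cost-sensitivity point above is easiest to see with arithmetic. The sketch below compares monthly inference spend for a high-volume workload; the request volume, token counts, and per-token prices are hypothetical assumptions, not published rates for any model.

```python
# Illustrative cost comparison for a high-volume, narrow-domain workload.
# All prices below are hypothetical, chosen only to show the shape of the
# tradeoff (the SLM is assumed ~20x cheaper per token).

def monthly_inference_cost(requests_per_day: int,
                           tokens_per_request: int,
                           price_per_1k_tokens: float) -> float:
    """Estimate monthly inference spend for one model (30-day month)."""
    daily_tokens = requests_per_day * tokens_per_request
    return daily_tokens / 1000 * price_per_1k_tokens * 30

llm_cost = monthly_inference_cost(50_000, 800, 0.010)    # hypothetical LLM price
slm_cost = monthly_inference_cost(50_000, 800, 0.0005)   # hypothetical SLM price

print(f"LLM: ${llm_cost:,.0f}/month vs SLM: ${slm_cost:,.0f}/month")
```

At 50,000 requests per day the gap is roughly $12,000 versus $600 per month under these assumed prices, which is why per-inference cost dominates the decision for narrow, high-volume tasks.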
| Dimension | LLM | SLM |
|---|---|---|
| Parameter count | Tens to hundreds of billions (100B+) | Low billions or fewer (1B–14B) |
| Task breadth | Wide — handles diverse tasks | Narrow — optimized for specific tasks |
| Inference cost | Higher per request | Lower per request |
| Latency | Higher | Lower |
| Fine-tuning cost | Expensive, requires significant data | More affordable, less data needed |
| Deployment | Cloud-hosted (typically) | Cloud, edge, or on-premises |
| Accuracy (broad tasks) | Higher | Lower |
| Accuracy (narrow domain) | Good, but may default to generic patterns | Can exceed LLM on specific domain tasks |
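The "Parameter count" and "Deployment" rows are connected: weight memory scales roughly with parameter count times bytes per parameter, which is why SLMs fit on edge hardware and 100B+ models generally don't. A back-of-the-envelope sketch (model sizes here are rough illustrative figures, not official specifications):

```python
# Rough weight-memory estimate: params × bytes per parameter.
# fp16/bf16 weights use 2 bytes per parameter; quantization would
# shrink these numbers further.

def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate model weight memory in GB."""
    return params_billion * bytes_per_param  # 1e9 params/B and 1e9 bytes/GB cancel

print(weight_memory_gb(3.8))   # a ~3.8B-parameter SLM -> roughly 7.6 GB
print(weight_memory_gb(175))   # a ~175B-parameter LLM -> roughly 350 GB
```

A single-digit-GB footprint fits on a workstation GPU or capable edge device; a hundreds-of-GB footprint requires multi-GPU cloud infrastructure, which explains the deployment row above.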
When to Create Custom Models:
The exam tests when custom model creation is justified versus using existing models with prompt engineering or RAG:
- Prompt engineering + RAG — Try this first. If adding domain knowledge through grounding and crafting effective system prompts achieves acceptable accuracy, no custom model is needed. This is the fastest and cheapest path.
- Fine-tuning an existing model — Use when prompt engineering hits a ceiling. Fine-tuning adjusts model weights using domain-specific training data; choose it when the model must learn specialized vocabulary, formatting conventions, or domain-specific reasoning patterns.
- Training from scratch — Rarely justified. Only consider it when no existing model provides a suitable foundation, the domain is highly specialized (medical imaging, industrial quality inspection), and you have sufficient proprietary training data.
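The simplest-first escalation above can be sketched as a small decision helper. The inputs and thresholds are illustrative assumptions for study purposes, not an official rubric from the exam or from Microsoft.

```python
# Sketch of the escalation path: prompts + RAG first, then fine-tuning,
# then (rarely) training from scratch. Inputs are simplified booleans.

def recommend_approach(rag_accuracy_ok: bool,
                       suitable_base_model_exists: bool,
                       has_labeled_domain_data: bool) -> str:
    """Return the simplest approach that fits the stated constraints."""
    if rag_accuracy_ok:
        return "prompt engineering + RAG"
    if suitable_base_model_exists and has_labeled_domain_data:
        return "fine-tune an existing model"
    if has_labeled_domain_data:
        return "train from scratch (rarely justified)"
    return "collect labeled domain data before customizing"

print(recommend_approach(True, True, True))    # prompt engineering + RAG
print(recommend_approach(False, True, True))   # fine-tune an existing model
```

Note the ordering: the function never reaches fine-tuning or training unless the cheaper option has already been ruled out, which mirrors the pragmatic stance the exam rewards.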
Exam Trap: Don't recommend custom model training when prompt engineering or fine-tuning would suffice. The exam rewards pragmatic architects who choose the simplest effective approach. Custom models introduce training data management, versioning, drift monitoring, and retraining overhead that may not be warranted.
Reflection Question: A legal firm wants AI that can analyze contracts and extract key clauses with high accuracy. They have 10,000 annotated contracts. Should they use an LLM with RAG, fine-tune an existing model, or train a custom SLM?