Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.2.5. Custom AI Models and Small Language Models

When prebuilt capabilities don't meet requirements, architects must decide whether to use a general-purpose large language model (LLM) with customization or train a domain-specific small language model (SLM). This decision impacts cost, performance, accuracy, and infrastructure requirements.

Large Language Models (LLMs) — General-purpose models like GPT-4o, GPT-5, Claude, or Llama that handle a wide range of tasks. Available through Microsoft Foundry's model catalog. Best for:

  • Tasks requiring broad knowledge and flexible reasoning
  • Scenarios where prompt engineering provides sufficient customization
  • Rapid prototyping and proof-of-concept development

Small Language Models (SLMs) — Domain-specific models with fewer parameters, optimized for narrow tasks. Microsoft's Phi family exemplifies this category. Best for:

  • High-volume, narrow-domain tasks (classification, entity extraction, sentiment analysis)
  • Latency-sensitive applications where response time is critical
  • Cost-sensitive deployments where per-inference costs must stay low
  • Edge or on-premises deployment where computational resources are limited
  • Privacy-sensitive scenarios where data cannot leave the organization

| Dimension | LLM | SLM |
| --- | --- | --- |
| Parameter count | Billions (100B+) | Millions to low billions (1B–14B) |
| Task breadth | Wide, handles diverse tasks | Narrow, optimized for specific tasks |
| Inference cost | Higher per request | Lower per request |
| Latency | Higher | Lower |
| Fine-tuning cost | Expensive, requires significant data | More affordable, less data needed |
| Deployment | Cloud-hosted (typically) | Cloud, edge, or on-premises |
| Accuracy (broad tasks) | Higher | Lower |
| Accuracy (narrow domain) | Good, but may default to general patterns | Can exceed an LLM on specific domain tasks |

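The per-request cost gap above can be made concrete with rough per-token arithmetic. All prices and workload numbers below are illustrative assumptions, not published vendor rates:

```python
# Rough monthly inference-cost comparison for a high-volume, narrow task.
# All prices and volumes are ILLUSTRATIVE ASSUMPTIONS, not published rates.

def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_1k_tokens: float, days: int = 30) -> float:
    """Estimated monthly spend at a given per-1K-token price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * price_per_1k_tokens

# Hypothetical workload: 100k classification requests/day, ~500 tokens each.
llm_cost = monthly_cost(100_000, 500, price_per_1k_tokens=0.01)    # assumed LLM rate
slm_cost = monthly_cost(100_000, 500, price_per_1k_tokens=0.0005)  # assumed SLM rate

print(f"LLM: ${llm_cost:,.0f}/month, SLM: ${slm_cost:,.0f}/month")
```

At these assumed rates the SLM is 20x cheaper for the same workload, which is why high-volume, narrow-domain tasks favor SLMs even when an LLM is slightly more accurate.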
When to Create Custom Models:

The exam tests when custom model creation is justified versus using existing models with prompt engineering or RAG:

  1. Prompt engineering + RAG — Try this first. If adding domain knowledge through grounding and crafting effective system prompts achieves acceptable accuracy, no custom model is needed. This is the fastest and cheapest path.

  2. Fine-tuning an existing model — When prompt engineering hits a ceiling. Fine-tuning adjusts model weights using domain-specific training data. Use when the model needs to learn specialized vocabulary, formatting conventions, or domain-specific reasoning patterns.

  3. Training from scratch — Rarely justified. Only consider when no existing model provides a suitable foundation, the domain is highly specialized (medical imaging, industrial quality inspection), and you have sufficient proprietary training data.
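Step 1 above (grounding via prompt engineering + RAG) can be sketched with a minimal retriever. The keyword-overlap scoring and prompt template here are illustrative simplifications; a production system would use embedding-based vector search rather than word overlap:

```python
# Minimal RAG sketch: retrieve the most relevant snippet by keyword overlap,
# then ground the prompt with it. Purely illustrative; real systems use
# embedding-based retrieval against a vector index, not word overlap.

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_grounded_prompt(query: str, docs: list[str]) -> str:
    """Compose a prompt that grounds the model in retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

docs = [
    "Indemnification clauses allocate liability between contracting parties.",
    "Force majeure excuses performance during extraordinary events.",
]
print(build_grounded_prompt("What does an indemnification clause do?", docs))
```

If prompts grounded this way already hit the accuracy target, the decision tree stops here: no weights change, no training data pipeline is needed.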

Exam Trap: Don't recommend custom model training when prompt engineering or fine-tuning would suffice. The exam rewards pragmatic architects who choose the simplest effective approach. Custom models introduce training data management, versioning, drift monitoring, and retraining overhead that may not be warranted.

Reflection Question: A legal firm wants AI that can analyze contracts and extract key clauses with high accuracy. They have 10,000 annotated contracts. Should they use an LLM with RAG, fine-tune an existing model, or train a custom SLM?
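For context on what fine-tuning (option 2) would involve here: 10,000 annotated contracts is a substantial dataset, and preparing it typically means converting each annotation into a chat-format training example. The field names and JSONL layout below are a hedged sketch of the common chat fine-tuning format, not any specific vendor's required schema:

```python
import json

# Convert annotated contracts into chat-format JSONL fine-tuning examples.
# The annotation fields ("text", "clauses") are HYPOTHETICAL; adapt them to
# however your contracts were actually labeled.

def to_training_example(contract: dict) -> str:
    """One JSONL line: contract text in, extracted clauses out."""
    return json.dumps({
        "messages": [
            {"role": "system",
             "content": "Extract key clauses from the contract."},
            {"role": "user", "content": contract["text"]},
            {"role": "assistant", "content": json.dumps(contract["clauses"])},
        ]
    })

annotated = [
    {"text": "Seller shall indemnify Buyer against all third-party claims.",
     "clauses": {"indemnification": "Seller shall indemnify Buyer"}},
]

with open("train.jsonl", "w") as f:
    for contract in annotated:
        f.write(to_training_example(contract) + "\n")
```

Whether this beats RAG alone depends on whether the extraction task requires learned formatting and legal vocabulary (favoring fine-tuning) or mainly access to reference material (favoring RAG); training an SLM from scratch is hard to justify when strong base models already exist for this domain.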

Written by Alvin Varughese, Founder (15 professional certifications)