Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

4.3.1. Custom Models in Microsoft Foundry

💡 First Principle: Custom models exist on a spectrum from "use as-is" to "build from scratch." The architect's job is to find the minimum customization that meets the requirement — because every step toward custom training adds cost, complexity, and maintenance burden.

The Model Customization Spectrum:
ApproachWhen to UseCostMaintenance
Use as-isGeneral tasks, standard capabilitiesLowestMinimal
Prompt engineeringStandard model with domain-specific instructionsLowPrompt versioning
RAGStandard model + organization's proprietary dataMediumData pipeline maintenance
Fine-tuningNeed model to consistently produce domain-specific outputsHighRetraining on data changes
Custom trainingNo existing model handles the taskHighestFull ML pipeline
Foundry Model Deployment Options:
Deployment TypeCharacteristicsBest For
Serverless API (MaaS)Pay-per-token, no GPU management, instant availabilityPrototyping, variable workloads, cost optimization
Managed computeDedicated infrastructure, predictable performanceProduction workloads with consistent demand, compliance requirements
Selecting Models from the Catalog:

The Foundry model catalog includes models from Microsoft (Phi, MAI), OpenAI (GPT series), Anthropic (Claude), Meta (Llama), Mistral, DeepSeek, and others. Selection criteria the architect evaluates:

  • Task fit — Does the model excel at the specific task (reasoning, code generation, multimodal, summarization)?
  • Context window — Can it handle the input size your use case requires?
  • Latency — Does inference speed meet user experience requirements?
  • Cost — Does per-token pricing fit the budget at expected volume?
  • Compliance — Does the model provider's data handling meet regulatory requirements?
  • Fine-tunability — If customization is needed, does the model support fine-tuning in Foundry?

⚠️ Exam Trap: The exam may present a scenario where "the best-performing model" is the obvious answer. But if that model costs 10x more per token and the task doesn't require frontier performance, the correct architectural recommendation is a smaller, more cost-effective model. Always evaluate model selection against both capability AND economics.

Reflection Question: A legal firm needs an AI solution that analyzes contracts, extracts key clauses, and flags deviations from standard templates. The contracts are proprietary and contain sensitive client information. Walk through the model customization spectrum — what's the minimum viable approach, and what data handling constraints affect the architecture?

Alvin Varughese
Written byAlvin Varughese
Founder15 professional certifications