Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.4. Reflection Checkpoint

Key Takeaways

  • Generative models predict tokens, one at a time, within a fixed context window. Pricing and context limits are measured in tokens, not words, and because the model predicts text rather than looking it up, it can hallucinate.
  • Choose models by capability, modality, cost, and latency — the smallest model that does the job, not the biggest available.
  • Deployment provisions capacity (pay-as-you-go vs. provisioned throughput units, or PTUs); parameters shape output. Temperature controls randomness, not intelligence; use a low temperature for factual tasks.
  • Prompts are configuration. System prompt = persistent rules; user prompt = the request. Grounding supplies trusted facts and is the main defense against hallucination.
  • Classify workloads by input/output: text analysis (keyword/entity/sentiment/summarization), speech (recognition vs. synthesis), vision (classification/detection/OCR vs. generation), information extraction, and agentic AI.
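
The temperature takeaway above can be made concrete with a small sketch. It assumes the common softmax-with-temperature formulation, in which raw token scores are divided by the temperature before being turned into probabilities. The function name and the three scores are hypothetical, and this is a toy illustration, not how any particular service implements sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a small temperature widens the gaps between scores;
    # dividing by a large one narrows them.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # factual-task setting
high = softmax_with_temperature(logits, 1.5)  # more varied output

print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 1.5:", [round(p, 3) for p in high])
```

With these made-up scores, the top candidate takes roughly 99% of the probability at temperature 0.2 but only about 53% at 1.5. The ranking of the candidates never changes; only how often a lower-ranked token gets picked does, which is why temperature affects randomness rather than capability.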

Connecting Forward

You now know what the tools are and how they behave. Phase 4 puts them to work: you'll deploy a model in the Microsoft Foundry portal, write effective system and user prompts, build a lightweight chat client with the Foundry SDK, and create your first single agent. Every configuration concept from this phase — tokens, temperature, system prompts, grounding, the agent workload — becomes something you actually do.

Self-Check Questions

  • Match each scenario to its workload: (a) transcribe a podcast; (b) pull the due date off a PDF invoice; (c) decide whether a tweet is positive; (d) generate a product image from a description; (e) an assistant that books a meeting by calling your calendar.
  • Explain why grounding plus a low temperature would make a factual Q&A feature more reliable than a high-temperature, ungrounded one.

Written by Alvin Varughese, Founder (18 professional certifications)