Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.1.1. Computer Vision Workloads

Computer Vision enables machines to interpret visual information from images and video. Recall from Section 1.2.1 that the INPUT is always an image—even when the output is text (OCR).

How Computer Vision works: Computer vision models analyze pixel patterns to understand image content. Deep learning models trained on millions of labeled images learn to recognize edges, shapes, textures, and ultimately objects and scenes.
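To make "analyzing pixel patterns" concrete, here is a minimal sketch of how a single hand-written filter can respond to an edge. It is an illustration only: the tiny `IMAGE` grid, the `apply_filter` helper, and the use of a Sobel kernel are assumptions made for this example, not part of any Azure service. Deep learning models learn many such filters from data rather than having them written by hand.

```python
# A tiny 5x6 grayscale "image": dark pixels (0) on the left, bright (255) on the right.
IMAGE = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

# Sobel kernel that responds to left-to-right brightness changes (vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def apply_filter(image, kernel):
    """Slide a 3x3 kernel over the image (no padding) and return the responses."""
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - 2):
        output_row = []
        for c in range(cols - 2):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += kernel[i][j] * image[r + i][c + j]
            output_row.append(total)
        output.append(output_row)
    return output


edges = apply_filter(IMAGE, SOBEL_X)
for row in edges:
    print(row)
```

Each output row is `[0, 1020, 1020, 0]`: the response is zero where the brightness is uniform and large where the 3x3 window straddles the dark-to-bright boundary. Stacking many learned filters like this is, roughly, how a model builds up from edges to shapes, textures, and objects.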

The following table summarizes computer vision capabilities:

| Capability            | What It Does                            | Output Type                 |
|-----------------------|-----------------------------------------|-----------------------------|
| Image Classification  | Assigns a label to an entire image      | Single category label       |
| Object Detection      | Locates and identifies multiple objects | Bounding boxes with labels  |
| OCR                   | Extracts text from images               | Text strings with positions |
| Facial Detection      | Finds faces and analyzes attributes     | Face locations + attributes |
| Semantic Segmentation | Classifies every pixel                  | Pixel-level category map    |
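The output types differ mainly in shape, which a short sketch can show. The class names (`BoundingBox`, `Detection`, `TextLine`), the sample values, and the `pixels_per_class` helper below are hypothetical and chosen only for illustration; real services return their own response formats. Facial detection output would resemble a detection with extra attribute fields, so it is omitted here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Pixel coordinates of the top-left corner plus the box size.
    x: int
    y: int
    width: int
    height: int


@dataclass
class Detection:
    # One located object: what it is and where it is.
    label: str
    box: BoundingBox


@dataclass
class TextLine:
    # One OCR result: the recognized string and its position.
    text: str
    box: BoundingBox


# Image classification: a single label for the whole image.
classification = "bicycle"

# Object detection: several labeled boxes for one image.
detections = [
    Detection("bicycle", BoundingBox(40, 60, 200, 120)),
    Detection("person", BoundingBox(90, 10, 80, 170)),
]

# OCR: text strings with positions.
ocr_lines = [TextLine("STOP", BoundingBox(12, 8, 64, 24))]

# Semantic segmentation: one class ID per pixel (0 = background, 1 = road).
segmentation_map = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]


def pixels_per_class(seg_map):
    """Count how many pixels were assigned to each class ID."""
    return Counter(pixel for row in seg_map for pixel in row)


pixel_counts = pixels_per_class(segmentation_map)
```

Note that classification yields one value per image, detection and OCR yield a list of positioned items, and segmentation yields a grid the same size as the image.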

Common scenarios: Product quality inspection, medical image analysis, security surveillance, document digitization, accessibility features, retail inventory tracking, autonomous vehicles.

Azure services for Computer Vision:
  • Azure AI Vision: General-purpose image analysis, OCR, spatial analysis
  • Azure AI Custom Vision: Train custom image classification and object detection models
  • Azure AI Face: Specialized face detection and analysis

⚠️ Exam Tip: Always classify AI workloads by INPUT type. If the input is an image, it's Computer Vision—even if the output is text (OCR) or structured data.

Written by Alvin Varughese, Founder, 15 professional certifications