2.1.1. Computer Vision Workloads
Computer Vision enables machines to interpret visual information from images and video. Recall from Section 1.2.1 that the INPUT is always visual (an image or a video frame), even when the output is text, as in optical character recognition (OCR).
How Computer Vision works: Computer vision models analyze pixel patterns to understand image content. Deep learning models trained on millions of labeled images learn to recognize edges, shapes, textures, and ultimately objects and scenes.
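To make "analyzing pixel patterns" concrete, here is a minimal sketch of edge detection on a toy grayscale image. It is only an illustration: the 5x5 image and the helper name `filter2d` are made up for this example, and the kernel is the classic hand-written Sobel filter. A deep learning model learns many filters of this kind from labeled data rather than having them written by hand.

```python
# Toy 5x5 grayscale image (hypothetical data): dark on the left, bright on the right.
IMAGE = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]

# Sobel kernel that responds to left-to-right changes in brightness,
# i.e. to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter2d(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map.

    This is the sliding-window operation that deep learning frameworks call
    "convolution" (strictly, cross-correlation: the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


edges = filter2d(IMAGE, SOBEL_X)
for row in edges:
    print(row)
# Each row prints [0, 1020, 1020]: zero where the window covers only dark
# pixels, large where the window straddles the dark-to-bright boundary.
```

The response is zero over the uniform region and large at the boundary, which is the kind of low-level "edge" signal that later layers of a model combine into shapes, textures, and objects.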
The following table summarizes computer vision capabilities:
| Capability | What It Does | Output Type |
|---|---|---|
| Image Classification | Assigns a label to an entire image | Single category label |
| Object Detection | Locates and identifies multiple objects | Bounding boxes with labels |
| OCR | Extracts text from images | Text strings with positions |
| Facial Detection | Finds faces and analyzes attributes | Face locations + attributes |
| Semantic Segmentation | Classifies every pixel | Pixel-level category map |
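The difference between these output types is easier to see as data. The sketch below uses made-up values and hypothetical type names (`BoundingBox`, `Detection`, `mask_to_box`); it is not the response format of any Azure service, only an illustration of the shape of each output.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    left: int
    top: int
    width: int
    height: int


@dataclass
class Detection:
    label: str
    confidence: float
    box: BoundingBox


# Image classification: one label for the whole image.
classification = {"label": "street scene", "confidence": 0.94}

# Object detection: several labeled boxes per image.
detections = [
    Detection("car", 0.91, BoundingBox(left=40, top=120, width=200, height=90)),
    Detection("person", 0.87, BoundingBox(left=300, top=100, width=50, height=140)),
]

# OCR: text strings, each with a position.
ocr_lines = [{"text": "STOP", "box": BoundingBox(left=10, top=10, width=60, height=30)}]

# Semantic segmentation: one class id per pixel (0 = background, 1 = car).
segmentation_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def mask_to_box(mask, class_id):
    """Return the tightest box around all pixels of class_id, or None if absent."""
    rows = [r for r, row in enumerate(mask) if class_id in row]
    cols = [c for row in mask for c, value in enumerate(row) if value == class_id]
    if not rows:
        return None
    return BoundingBox(
        left=min(cols),
        top=min(rows),
        width=max(cols) - min(cols) + 1,
        height=max(rows) - min(rows) + 1,
    )


car_box = mask_to_box(segmentation_mask, 1)
print(car_box)
```

A pixel-level map can always be reduced to a bounding box, as `mask_to_box` does, but a box cannot be turned back into a mask. That is one way to remember that segmentation is the finest-grained output in the table.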
Common scenarios: Product quality inspection, medical image analysis, security surveillance, document digitization, accessibility features, retail inventory tracking, autonomous vehicles.
Azure services for Computer Vision:
- Azure AI Vision: General-purpose image analysis, OCR, spatial analysis
- Azure AI Custom Vision: Train custom image classification and object detection models on your own labeled images
- Azure AI Face: Specialized face detection and analysis
⚠️ Exam Tip: Always classify AI workloads by INPUT type. If the input is an image, it's Computer Vision—even if the output is text (OCR) or structured data.