Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.1.3. Object and People Detection

The objects and people visual features detect and locate objects and people within an image, returning bounding box coordinates for each detection.

💡 First Principle: Humans glance at an image and instantly know "that's a dog in a park." Computer vision replicates this by converting pixels into semantic labels. But different questions require different analyses: "What's in this image?" → tags/objects. "Describe this image" → captions. "What text is visible?" → OCR. "Where is the dog?" → object detection with bounding boxes. The exam tests whether you can match the question type to the correct visual feature parameter.

đź”§ Implementation Reference: Azure AI Vision
Package: azure-ai-vision-imageanalysis
Class: ImageAnalysisClient
Methods: analyze(), analyze_from_url()
Auth header: Ocp-Apim-Subscription-Key
Endpoint: POST /computervision/imageanalysis:analyze?features={features}

Visual features determine what the API extracts from an image. Select the features relevant to your scenario.

Visual Features:
caption: Single description
denseCaptions: Multiple region descriptions
tags: Content labels
objects: Detected objects with bounding boxes
read: Extracted text (OCR)
people: Person detection
smartCrops: Suggested crop regions

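The First Principle above frames feature selection as matching a question type to a feature name. As a minimal self-check drill (the mapping below simply restates the feature table; the helper name is illustrative, not part of any SDK):

```python
# Illustrative study aid: map exam-style question phrasings to the
# visual feature that answers them (names match the feature table above).
SCENARIO_TO_FEATURE = {
    "describe this image": "caption",
    "describe each region": "denseCaptions",
    "what's in this image": "tags",
    "where is the dog": "objects",
    "what text is visible": "read",
    "how many people are present": "people",
    "suggest a thumbnail crop": "smartCrops",
}

def pick_feature(question: str) -> str:
    """Return the feature matching a question, falling back to 'tags'."""
    return SCENARIO_TO_FEATURE.get(question.strip().lower().rstrip("?"), "tags")
```

For example, pick_feature("Where is the dog?") resolves to "objects", the feature that returns bounding boxes.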
Match your scenario's question type to the feature table above to select the right visual features.

Testable Pattern:
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
result = client.analyze(
    image_data=image_bytes,
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS, VisualFeatures.OBJECTS, VisualFeatures.READ]
)
# Result sections are None when the feature was not requested or returned nothing
caption = result.caption.text if result.caption else None
tags = [tag.name for tag in result.tags.list] if result.tags else []
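The SDK's result models mirror the REST response shape. As a local sketch of how bounding boxes come back for objects and people, here is a parser over an illustrative sample payload (the JSON values below are made up for demonstration, not real service output):

```python
# Illustrative sample shaped like an Image Analysis 4.0 response;
# in practice this structure comes back from the analyze call.
sample = {
    "objectsResult": {"values": [
        {"boundingBox": {"x": 120, "y": 60, "w": 200, "h": 150},
         "tags": [{"name": "dog", "confidence": 0.92}]},
    ]},
    "peopleResult": {"values": [
        {"boundingBox": {"x": 10, "y": 20, "w": 80, "h": 220},
         "confidence": 0.96},
    ]},
}

def extract_boxes(response: dict) -> list[tuple[str, dict]]:
    """Collect (label, boundingBox) pairs from object and people detections."""
    boxes = []
    for obj in response.get("objectsResult", {}).get("values", []):
        label = obj["tags"][0]["name"] if obj.get("tags") else "object"
        boxes.append((label, obj["boundingBox"]))
    for person in response.get("peopleResult", {}).get("values", []):
        boxes.append(("person", person["boundingBox"]))
    return boxes
```

Each box gives pixel coordinates of the top-left corner plus width and height, which is what the exam means by "returns bounding box coordinates."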
Error Handling Pattern:
import logging

from azure.core.exceptions import HttpResponseError

try:
    result = client.analyze(image_data=image_bytes, visual_features=[VisualFeatures.READ])
except HttpResponseError as e:
    if e.status_code == 400:
        # Invalid image format or size
        logging.error("Invalid image")
    elif e.status_code == 415:
        # Unsupported media type
        logging.error("Unsupported format")
    else:
        raise
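Rate limiting (HTTP 429) is also worth handling on this endpoint. A minimal exponential-backoff sketch, assuming you retry the analyze call between delays (the helper name and the cap value are illustrative choices, not SDK API):

```python
def backoff_delays(max_retries: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base * 2**attempt, capped at `cap` seconds."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]

# Hypothetical usage around the analyze call:
# for delay in backoff_delays():
#     try:
#         result = client.analyze(image_data=image_bytes, visual_features=[VisualFeatures.READ])
#         break
#     except HttpResponseError as e:
#         if e.status_code != 429:
#             raise
#         time.sleep(delay)
```

With the defaults this yields delays of 1, 2, 4, and 8 seconds before giving up.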
CLI Equivalent (REST):
curl -X POST "https://{endpoint}/computervision/imageanalysis:analyze?features=caption,tags&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @image.jpg
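The request URL above concatenates features as a comma-separated query parameter. A small sketch that assembles it from parts (the helper name and the resource name in the example are illustrative; the api-version matches the curl example):

```python
def build_analyze_url(endpoint: str, features: list[str],
                      api_version: str = "2024-02-01") -> str:
    """Assemble the imageanalysis:analyze URL with comma-separated features."""
    return (f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze"
            f"?features={','.join(features)}&api-version={api_version}")
```

For instance, build_analyze_url("https://myres.cognitiveservices.azure.com", ["caption", "tags"]) reproduces the URL used in the curl example.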

⚠️ Exam Trap: OCR (read feature) is Computer Vision, not Language—input is image pixels.

Azure AI Vision Documentation

Written by Alvin Varughese
Founder • 15 professional certifications