5.1.3. Object and People Detection
Detect and locate objects and people within images. Returns bounding box coordinates for each detection.
💡 First Principle: Humans glance at an image and instantly know "that's a dog in a park." Computer vision replicates this by converting pixels into semantic labels. But different questions require different analyses: "What's in this image?" → tags/objects. "Describe this image" → captions. "What text is visible?" → OCR. "Where is the dog?" → object detection with bounding boxes. The exam tests whether you can match the question type to the correct visual feature parameter.
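The question-to-feature mapping above can be sketched as a simple lookup. The feature names match the REST `features` query parameter; the helper itself is only illustrative:

```python
# Illustrative mapping from exam-style questions to the visual feature
# that answers them (names match the REST `features` parameter).
QUESTION_TO_FEATURE = {
    "What's in this image?": "tags",
    "Describe this image": "caption",
    "What text is visible?": "read",
    "Where is the dog?": "objects",
    "Where are the people?": "people",
}

def feature_for(question: str) -> str:
    """Return the visual feature that answers a given question type."""
    return QUESTION_TO_FEATURE[question]
```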
🔧 Implementation Reference: Azure AI Vision
| Item | Value |
|---|---|
| Package | azure-ai-vision-imageanalysis |
| Class | ImageAnalysisClient |
| Methods | analyze(), analyze_from_url() |
| Header | Ocp-Apim-Subscription-Key |
| Endpoint | POST /computervision/imageanalysis:analyze?features={features} |
Visual features determine what the API extracts from an image. Select the features relevant to your scenario.
Visual Features:
| Feature | Output |
|---|---|
| caption | Single description |
| denseCaptions | Multiple region descriptions |
| tags | Content labels |
| objects | Detected objects with bounding boxes |
| read | Extracted text (OCR) |
| people | Person detection |
| smartCrops | Suggested crop regions |
Testable Pattern:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures

client = ImageAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
result = client.analyze(
    image_data=image_bytes,
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS, VisualFeatures.OBJECTS, VisualFeatures.READ],
)
caption = result.caption.text
tags = [tag.name for tag in result.tags.list]
```
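When `OBJECTS` or `PEOPLE` is requested, each detection carries a bounding box in pixel coordinates. A minimal sketch of pulling boxes out of a REST-style response dictionary (field names follow the documented `objectsResult` shape of API version 2024-02-01; the sample payload itself is invented):

```python
def extract_boxes(analysis: dict) -> list[tuple[str, dict]]:
    """Return (label, bounding_box) pairs from an image analysis response."""
    boxes = []
    for detection in analysis.get("objectsResult", {}).get("values", []):
        # Each detection has a boundingBox {x, y, w, h} in pixels and one
        # or more tags; take the highest-confidence tag as the label.
        best_tag = max(detection["tags"], key=lambda t: t["confidence"])
        boxes.append((best_tag["name"], detection["boundingBox"]))
    return boxes

# Invented sample payload shaped like the REST response
sample = {
    "objectsResult": {
        "values": [
            {
                "boundingBox": {"x": 120, "y": 50, "w": 200, "h": 180},
                "tags": [{"name": "dog", "confidence": 0.92}],
            }
        ]
    }
}
print(extract_boxes(sample))  # [('dog', {'x': 120, 'y': 50, 'w': 200, 'h': 180})]
```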
Error Handling Pattern:

```python
import logging

from azure.core.exceptions import HttpResponseError

try:
    result = client.analyze(image_data=image_bytes, visual_features=[VisualFeatures.READ])
except HttpResponseError as e:
    if e.status_code == 400:
        # Invalid image format or size
        logging.error("Invalid image")
    elif e.status_code == 415:
        # Unsupported media type
        logging.error("Unsupported format")
```
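Transient failures such as 429 (throttling) are normally retried with exponential backoff; the Azure SDK's built-in retry policy handles this for you, but the pattern is worth knowing. A generic sketch, independent of the SDK (the `RuntimeError` stands in for an `HttpResponseError` with status 429):

```python
import time

def call_with_retry(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry fn on a throttling-style error with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for HttpResponseError, status 429
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```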
CLI Equivalent (REST):

```shell
curl -X POST "https://{endpoint}/computervision/imageanalysis:analyze?features=caption,tags&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: {key}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @image.jpg
```
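The REST call returns JSON rather than SDK model objects. A sketch of parsing it (field names follow the documented 2024-02-01 response shape, e.g. `captionResult` and `tagsResult`; the payload below is invented):

```python
import json

# Invented response shaped like the 2024-02-01 REST output
raw = """{
  "captionResult": {"text": "a dog in a park", "confidence": 0.87},
  "tagsResult": {"values": [
      {"name": "dog", "confidence": 0.99},
      {"name": "grass", "confidence": 0.95}
  ]}
}"""

data = json.loads(raw)
caption = data["captionResult"]["text"]
tags = [t["name"] for t in data["tagsResult"]["values"]]
print(caption)  # a dog in a park
print(tags)     # ['dog', 'grass']
```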
⚠️ Exam Trap: OCR (read feature) is Computer Vision, not Language—input is image pixels.