Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.2.2. Object Detection Projects

Object detection identifies and locates multiple objects within an image, returning bounding boxes and confidence scores.
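
Custom Vision reports each predicted box as left, top, width, and height values normalized to the 0-1 range, so they usually need to be scaled to pixels before you draw or crop. A minimal sketch of that conversion follows; `NormalizedBox` and `to_pixel_box` are hypothetical names used only for illustration, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NormalizedBox:
    """Hypothetical stand-in for a box whose values are fractions (0 to 1) of the image size."""
    left: float
    top: float
    width: float
    height: float


def _clamp(value: int, upper: int) -> int:
    """Keep a pixel coordinate inside the range 0..upper."""
    return max(0, min(value, upper))


def to_pixel_box(box: NormalizedBox, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) in pixels, clamped to the image bounds."""
    x_min = round(box.left * image_width)
    y_min = round(box.top * image_height)
    x_max = round((box.left + box.width) * image_width)
    y_max = round((box.top + box.height) * image_height)
    return (
        _clamp(x_min, image_width),
        _clamp(y_min, image_height),
        _clamp(x_max, image_width),
        _clamp(y_max, image_height),
    )


box = NormalizedBox(left=0.25, top=0.5, width=0.5, height=0.25)
print(to_pixel_box(box, image_width=800, image_height=600))  # → (200, 300, 600, 450)
```

Clamping is a design choice here: a box that extends slightly past the image edge is trimmed rather than rejected.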

Building on the capability spectrum from Section 1.1, Custom Vision sits at the "customizable" level: you provide the training data, but you don't need ML expertise.

💡 First Principle: Pre-built vision recognizes generic objects (cars, dogs, buildings), but it doesn't know your specific products, defects, or custom categories. Custom Vision bridges this gap: you provide 5-50 labeled images per tag, and it trains a model that recognizes your domain. The key insight: you need Custom Vision when the pre-built service returns generic labels ("metal object") but you need specific ones ("Widget Model A with defect type 3").

The project type determines what kind of output you'll receive. Choose based on whether you need locations (bounding boxes) or just labels.

Project Types:

| Type | Output | Example |
| --- | --- | --- |
| Classification (Multiclass) | Single tag | "Cat or dog?" |
| Classification (Multilabel) | Multiple tags | "What items are present?" |
| Object Detection | Bounding boxes + tags | "Where are defects?" |

Training Requirements:

| Factor | Minimum | Recommended |
| --- | --- | --- |
| Images per tag | 5 | 50+ |
| Variety | Different angles, lighting | Include negative examples |

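
Before starting a training run, it can help to check your labeled set against the per-tag thresholds listed above. The sketch below assumes you already have one tag name per labeled image in a plain Python list; `audit_tag_counts` and the three status strings are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

MIN_IMAGES_PER_TAG = 5           # "Minimum" value from the training requirements
RECOMMENDED_IMAGES_PER_TAG = 50  # "Recommended" value from the training requirements


def audit_tag_counts(image_tags: Iterable[str]) -> Dict[str, str]:
    """Classify each tag by how many labeled images it has."""
    report = {}
    for tag, count in Counter(image_tags).items():
        if count < MIN_IMAGES_PER_TAG:
            report[tag] = "insufficient"   # below the minimum, add more images
        elif count < RECOMMENDED_IMAGES_PER_TAG:
            report[tag] = "trainable"      # meets the minimum, below the recommendation
        else:
            report[tag] = "recommended"    # meets the recommended count
    return report


# Illustrative data only: 60 "scratch" images, 12 "dent" images, 3 "crack" images
labels = ["scratch"] * 60 + ["dent"] * 12 + ["crack"] * 3
print(audit_tag_counts(labels))
```

A count check like this says nothing about variety; angles, lighting, and negative examples still need a manual review.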
🔧 Implementation Reference: Custom Vision

| Item | Value |
| --- | --- |
| Packages | azure-cognitiveservices-vision-customvision |
| Classes | CustomVisionTrainingClient, CustomVisionPredictionClient |
| Training Methods | create_project(), create_tag(), create_images_from_data(), train_project() |
| Prediction Methods | classify_image(), detect_image() |

Testable Pattern:
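from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# Assumed to be defined elsewhere: training_endpoint, training_key,
# prediction_endpoint, prediction_key, project_id, publish_name,
# and image_data (the image's binary contents)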

# Training
training_credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(training_endpoint, training_credentials)

# Prediction
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(prediction_endpoint, prediction_credentials)

# Classify an image
results = predictor.classify_image(project_id, publish_name, image_data)
for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")
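
For an object detection project, detect_image() is called with the same arguments as classify_image(), and each prediction additionally carries a bounding_box. The sketch below shows one way to keep only confident detections; it uses hypothetical stand-in objects rather than a live service call, and the 0.5 threshold is an illustrative assumption, not a recommended value.

```python
from types import SimpleNamespace


def confident_detections(predictions, threshold=0.5):
    """Keep predictions at or above the threshold, highest probability first."""
    kept = [p for p in predictions if p.probability >= threshold]
    return sorted(kept, key=lambda p: p.probability, reverse=True)


def make_prediction(tag_name, probability, left, top, width, height):
    """Build a hypothetical stand-in shaped like an item in results.predictions."""
    box = SimpleNamespace(left=left, top=top, width=width, height=height)
    return SimpleNamespace(tag_name=tag_name, probability=probability, bounding_box=box)


# Illustrative data only, not real service output
sample = [
    make_prediction("scratch", 0.67, 0.55, 0.10, 0.20, 0.15),
    make_prediction("dent", 0.32, 0.40, 0.40, 0.10, 0.10),
    make_prediction("scratch", 0.91, 0.10, 0.20, 0.30, 0.25),
]

for p in confident_detections(sample, threshold=0.5):
    b = p.bounding_box
    print(f"{p.tag_name}: {p.probability:.2%} "
          f"(left={b.left:.2f}, top={b.top:.2f}, width={b.width:.2f}, height={b.height:.2f})")
```

With a real client, you would pass results.predictions from predictor.detect_image(project_id, publish_name, image_data) in place of `sample`.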
Error Handling Pattern:
import logging
import time

# Prediction calls raise the exception class from prediction.models, not training.models
from azure.cognitiveservices.vision.customvision.prediction.models import CustomVisionErrorException
from msrest.exceptions import HttpOperationError

try:
    results = predictor.classify_image(project_id, publish_name, image_data)
except CustomVisionErrorException as e:
    if "NoPublishedModel" in str(e):
        # Model not published - publish iteration first
        logging.error("No published model. Publish an iteration before prediction.")
    elif "InvalidImageFormat" in str(e):
        logging.error("Unsupported image format. Use JPEG, PNG, GIF, or BMP.")
    else:
        # Unrecognized service error: do not swallow it
        raise
except HttpOperationError as e:
    if e.response.status_code == 429:
        # Rate limited: wait for the interval the service suggests, then retry the call
        time.sleep(int(e.response.headers.get("Retry-After", 60)))
    else:
        raise
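
Sleeping on a 429 only helps if the call is then retried. The sketch below shows one way to wrap that loop; `RateLimited` and `call_with_retry` are hypothetical names, and `RateLimited` stands in for whatever exception your client raises on HTTP 429, so the example runs without the SDK installed.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error carrying a Retry-After value in seconds."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(operation, max_attempts=3, sleep=time.sleep):
    """Run operation(), waiting and retrying when it is rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimited as e:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(e.retry_after)


# Demonstration with a fake operation that fails twice, then succeeds
attempts = {"count": 0}


def flaky_prediction():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited(retry_after=2)
    return "ok"


waits = []  # record the requested waits instead of actually sleeping
print(call_with_retry(flaky_prediction, sleep=waits.append), waits)
```

Injecting `sleep` as a parameter keeps the helper easy to test; in production code the default `time.sleep` would apply.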
CLI Equivalent (REST):
# Classify image
curl -X POST "https://{endpoint}/customvision/v3.0/Prediction/{project_id}/classify/iterations/{iteration_name}/image" \
  -H "Prediction-Key: {key}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @image.jpg

# Detect objects
curl -X POST "https://{endpoint}/customvision/v3.0/Prediction/{project_id}/detect/iterations/{iteration_name}/image" \
  -H "Prediction-Key: {key}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @image.jpg
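
The same requests can be prepared from Python with only the standard library. The sketch below builds the URL and request object following the curl commands above but does not send anything; the host name, project ID, iteration name, and key are placeholder assumptions, and `prediction_url` and `build_request` are hypothetical helper names.

```python
import urllib.request


def prediction_url(endpoint, project_id, iteration_name, task="detect"):
    """Build the v3.0 prediction URL for a published iteration."""
    if task not in ("classify", "detect"):
        raise ValueError("task must be 'classify' or 'detect'")
    host = endpoint.strip("/")
    return (f"https://{host}/customvision/v3.0/Prediction/"
            f"{project_id}/{task}/iterations/{iteration_name}/image")


def build_request(url, prediction_key, image_bytes):
    """Prepare (but do not send) a POST request carrying raw image bytes."""
    headers = {
        "Prediction-Key": prediction_key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


# Placeholder values for illustration only
url = prediction_url(
    "example.cognitiveservices.azure.com",
    "00000000-0000-0000-0000-000000000000",
    "defects-v1",
    task="detect",
)
request = build_request(url, "placeholder-key", b"fake image bytes")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen() and parse the JSON body of the response.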
Written by Alvin Varughese, Founder (15 professional certifications)