5.2.2. Object Detection Projects
Object detection identifies and locates multiple objects within an image, returning bounding boxes and confidence scores.
Building on the capability spectrum from Section 1.1, Custom Vision represents the "customizable" level—you provide training data but don't need ML expertise.
💡 First Principle: Pre-built vision recognizes generic objects (cars, dogs, buildings), but it doesn't know your specific products, defects, or custom categories. Custom Vision bridges this gap: you provide labeled examples of your categories (as few as 5 images per tag, with 50 or more recommended), and it trains a model that recognizes your domain. The key insight: you need Custom Vision when pre-built returns generic labels ("metal object") but you need specific ones ("Widget Model A with defect type 3").
The project type determines what kind of output you'll receive. Choose based on whether you need locations (bounding boxes) or just labels.
Project Types:
| Type | Output | Example |
|---|---|---|
| Classification (Multiclass) | Single tag | "Cat or dog?" |
| Classification (Multilabel) | Multiple tags | "What items are present?" |
| Object Detection | Bounding boxes + tags | "Where are defects?" |
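
The choice in the table can be written down as a small decision helper. This is only an illustrative sketch: `choose_project_type` and its two boolean parameters are hypothetical names, not part of the Custom Vision SDK.

```python
def choose_project_type(needs_locations: bool, multiple_tags_per_image: bool) -> str:
    """Map two requirements onto the project types in the table (hypothetical helper)."""
    if needs_locations:
        # Bounding boxes are only returned by object detection projects.
        return "Object Detection"
    if multiple_tags_per_image:
        return "Classification (Multilabel)"
    return "Classification (Multiclass)"


print(choose_project_type(needs_locations=True, multiple_tags_per_image=False))
```
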
Training Requirements:
| Factor | Minimum | Recommended |
|---|---|---|
| Images per tag | 5 (classification), 15 (object detection) | 50+ |
| Variety | Different angles, lighting | Include negative examples |
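
Before training, it can help to compare how many labeled images each tag has against the thresholds in the table. The following is a minimal sketch, assuming the per-tag counts are already available as a plain dictionary; `check_tag_counts` and its parameters are hypothetical names, and the defaults simply mirror the 5 and 50 figures above.

```python
def check_tag_counts(counts: dict, minimum: int = 5, recommended: int = 50) -> dict:
    """Label each tag as 'too_few', 'trainable', or 'recommended' (hypothetical helper)."""
    report = {}
    for tag, n in counts.items():
        if n < minimum:
            report[tag] = "too_few"        # below the training minimum
        elif n < recommended:
            report[tag] = "trainable"      # enough to train, but more images would help
        else:
            report[tag] = "recommended"    # meets the recommended count
    return report


print(check_tag_counts({"scratch": 3, "dent": 20, "ok": 120}))
```

A stricter minimum can be passed through the `minimum` argument when a project type requires more images per tag.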
🔧 Implementation Reference: Custom Vision
| Item | Value |
|---|---|
| Packages | azure-cognitiveservices-vision-customvision |
| Classes | CustomVisionTrainingClient, CustomVisionPredictionClient |
| Training Methods | create_project(), create_tag(), create_images_from_data(), train_project(), publish_iteration() |
| Prediction Methods | classify_image() (classification projects), detect_image() (object detection projects) |
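
The training methods are typically called in the order listed. The sketch below strings them together for a classification project. It assumes `trainer` is a `CustomVisionTrainingClient`; the function name `train_and_publish`, its parameters, the shape of `tagged_images`, and the polling interval are illustrative choices rather than anything the SDK prescribes. It also polls `get_iteration()` until training finishes and calls `publish_iteration()` so the model can be used for prediction. An object detection project would additionally need an object detection domain and per-image regions, which this sketch leaves out.

```python
import time


def train_and_publish(trainer, project_name, tagged_images, publish_name,
                      prediction_resource_id, poll_seconds=10, sleep=time.sleep):
    """Create a project, upload tagged images, train, and publish (illustrative sketch).

    tagged_images maps a tag name to a list of raw image bytes (assumed input shape).
    """
    project = trainer.create_project(project_name)
    for tag_name, images in tagged_images.items():
        tag = trainer.create_tag(project.id, tag_name)
        for image_bytes in images:
            trainer.create_images_from_data(project.id, image_bytes, tag_ids=[tag.id])

    iteration = trainer.train_project(project.id)
    # Training is asynchronous: poll until the iteration reaches a final state.
    # The status strings below are assumptions about the values the service reports.
    while iteration.status not in ("Completed", "Failed"):
        sleep(poll_seconds)
        iteration = trainer.get_iteration(project.id, iteration.id)
    if iteration.status != "Completed":
        raise RuntimeError(f"Training ended with status {iteration.status}")

    # Publishing makes the iteration reachable from the prediction endpoint.
    trainer.publish_iteration(project.id, iteration.id, publish_name, prediction_resource_id)
    return project.id, iteration.id
```

The `sleep` parameter exists only so the polling loop can be exercised without real delays.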
Testable Pattern:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
# Training
training_credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(training_endpoint, training_credentials)
# Prediction
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(prediction_endpoint, prediction_credentials)
# Classify an image (publish_name is the name the iteration was published under)
results = predictor.classify_image(project_id, publish_name, image_data)
for prediction in results.predictions:
    print(f"{prediction.tag_name}: {prediction.probability:.2%}")
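
For an object detection project, `detect_image()` takes the same arguments as `classify_image()`, and each prediction additionally carries a `bounding_box` whose `left`, `top`, `width`, and `height` are expressed as fractions of the image size. The sketch below filters predictions by probability and converts the boxes to pixel coordinates. The helper name `to_pixel_boxes` and the 0.5 threshold are illustrative assumptions, and `SimpleNamespace` objects stand in for a real response so the example runs offline.

```python
from types import SimpleNamespace


def to_pixel_boxes(predictions, image_width, image_height, min_probability=0.5):
    """Keep confident predictions and convert normalized boxes to pixel rectangles."""
    boxes = []
    for p in predictions:
        if p.probability < min_probability:
            continue  # drop low-confidence detections
        b = p.bounding_box
        boxes.append({
            "tag": p.tag_name,
            "probability": p.probability,
            "left": round(b.left * image_width),
            "top": round(b.top * image_height),
            "width": round(b.width * image_width),
            "height": round(b.height * image_height),
        })
    return boxes


# Stand-in for results.predictions from predictor.detect_image(...)
sample = [
    SimpleNamespace(tag_name="defect", probability=0.92,
                    bounding_box=SimpleNamespace(left=0.25, top=0.5, width=0.1, height=0.2)),
    SimpleNamespace(tag_name="defect", probability=0.12,
                    bounding_box=SimpleNamespace(left=0.0, top=0.0, width=0.5, height=0.5)),
]
print(to_pixel_boxes(sample, image_width=800, image_height=600))
```

The threshold is a tuning decision: a lower value surfaces more candidate boxes at the cost of more false positives.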
Error Handling Pattern:
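import logging
import time

# Prediction calls raise the exception class from the prediction package, not the training one
from azure.cognitiveservices.vision.customvision.prediction.models import CustomVisionErrorException

try:
    results = predictor.classify_image(project_id, publish_name, image_data)
except CustomVisionErrorException as e:
    # CustomVisionErrorException derives from msrest's HttpOperationError, so HTTP status
    # codes such as 429 arrive here too and are checked before the error text.
    if e.response.status_code == 429:
        # Rate limited: wait for the interval the service suggests, then retry the call
        time.sleep(int(e.response.headers.get("Retry-After", 60)))
    elif "NoPublishedModel" in str(e):
        # Model not published: publish an iteration first
        logging.error("No published model. Publish an iteration before prediction.")
    elif "InvalidImageFormat" in str(e):
        logging.error("Unsupported image format. Use JPEG, PNG, GIF, or BMP.")
    else:
        # Unrecognized error: re-raise instead of silently ignoring it
        raise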
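
Sleeping on a 429 does not by itself repeat the request. One way to close that gap is a small retry wrapper. The following is a generic sketch, not an SDK feature: `call_with_retry`, `max_attempts`, and `retry_after_from_http_error` are hypothetical names, and the wrapper assumes the `Retry-After` header holds a number of seconds.

```python
import time


def call_with_retry(func, get_retry_after, max_attempts=3, sleep=time.sleep):
    """Call func(); when get_retry_after maps the exception to a delay, wait and retry.

    get_retry_after(exc) returns seconds to wait, or None if the error is not retryable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            delay = get_retry_after(exc)
            if delay is None or attempt == max_attempts:
                raise  # not retryable, or out of attempts
            sleep(delay)


def retry_after_from_http_error(exc):
    """Return the Retry-After delay for a 429 response, else None (assumes exc.response)."""
    response = getattr(exc, "response", None)
    if response is not None and response.status_code == 429:
        return int(response.headers.get("Retry-After", 60))
    return None
```

With the clients from the pattern above, a call might look like `call_with_retry(lambda: predictor.classify_image(project_id, publish_name, image_data), retry_after_from_http_error)`.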
CLI Equivalent (REST):
# Classify image ({iteration_name} is the name the iteration was published under)
curl -X POST "https://{endpoint}/customvision/v3.0/Prediction/{project_id}/classify/iterations/{iteration_name}/image" \
-H "Prediction-Key: {key}" \
-H "Content-Type: application/octet-stream" \
--data-binary @image.jpg
# Detect objects
curl -X POST "https://{endpoint}/customvision/v3.0/Prediction/{project_id}/detect/iterations/{iteration_name}/image" \
-H "Prediction-Key: {key}" \
-H "Content-Type: application/octet-stream" \
--data-binary @image.jpg
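
The two requests differ only in the `classify` or `detect` path segment. Here is a minimal sketch that builds the same URL and headers in Python without sending anything; `build_prediction_request` is a hypothetical helper, and the host, project ID, model name, and key in the usage lines are placeholders.

```python
def build_prediction_request(endpoint, project_id, published_name, key, task="classify"):
    """Return (url, headers) for a Custom Vision v3.0 image prediction call."""
    if task not in ("classify", "detect"):
        raise ValueError("task must be 'classify' or 'detect'")
    url = (
        f"https://{endpoint}/customvision/v3.0/Prediction/{project_id}"
        f"/{task}/iterations/{published_name}/image"
    )
    headers = {
        "Prediction-Key": key,
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return url, headers


url, headers = build_prediction_request(
    "example.cognitiveservices.azure.com", "my-project-id", "my-model", "my-key", task="detect"
)
print(url)
```
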