Challenge 26: Custom Vision - Object Detection

Estimated Time

60 min | Cost: $2-5 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)

Exam skills covered

Train custom image model for object detection
Label images with bounding box regions
Evaluate detection metrics (mAP)
Publish and consume object detection model

Overview

Object detection locates and classifies multiple objects within an image using bounding boxes. Unlike classification (which answers "what is this image?"), detection answers "what objects are here and where?"

Key concepts:

Bounding box: Rectangle defined by (left, top, width, height) as normalized coordinates (0.0–1.0)
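IoU (Intersection over Union): Area of overlap between the predicted and labeled bounding boxes divided by the area of their union (0.0 = no overlap, 1.0 = identical boxes)
mAP (mean Average Precision): Primary metric; the mean of the per-class Average Precision (AP) values across all object classes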
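To make the IoU definition concrete, here is a minimal sketch that computes it for two boxes in the (left, top, width, height) normalized format described above. The function name `iou` and the sample boxes are illustrative assumptions, not part of the Custom Vision SDK.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, width, height) boxes."""
    a_left, a_top, a_w, a_h = box_a
    b_left, b_top, b_w, b_h = box_b

    # Width and height of the overlapping rectangle (0 if the boxes are disjoint)
    inter_w = max(0.0, min(a_left + a_w, b_left + b_w) - max(a_left, b_left))
    inter_h = max(0.0, min(a_top + a_h, b_top + b_h) - max(a_top, b_top))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice
    union = a_w * a_h + b_w * b_h - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted box vs. hand-labeled box for the same car
predicted = (0.102, 0.298, 0.245, 0.198)
labeled = (0.1, 0.3, 0.25, 0.2)
print(f"IoU: {iou(predicted, labeled):.3f}")
```

A detection is typically counted as correct only when its IoU with a labeled box of the same class exceeds a chosen threshold, which is why loose or inconsistent labels tend to lower mAP.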

Prerequisites

Azure subscription
Custom Vision Training + Prediction resources
Python 3.9+
Package: azure-cognitiveservices-vision-customvision

Implementation

Task 1: Create Object Detection Project

Python SDK

import os
import time
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import (
    ImageUrlCreateEntry, Region
)
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

training_key = os.environ["CUSTOM_VISION_TRAINING_KEY"]
training_endpoint = os.environ["CUSTOM_VISION_TRAINING_ENDPOINT"]

credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(training_endpoint, credentials)

# Pick the first standard (non-exportable) Object Detection domain the service returns
domains = trainer.get_domains()
obj_detection_domain = next(d for d in domains if d.type == "ObjectDetection" and not d.exportable)
print(f"Domain: {obj_detection_domain.name} ({obj_detection_domain.id})")

# Create object detection project
project = trainer.create_project(
    name="Vehicle-Detector",
    domain_id=obj_detection_domain.id
)
print(f"Created project: {project.name} ({project.id})")

# Create tags for objects to detect
car_tag = trainer.create_tag(project.id, "car")
truck_tag = trainer.create_tag(project.id, "truck")
bicycle_tag = trainer.create_tag(project.id, "bicycle")
print(f"Tags: car={car_tag.id}, truck={truck_tag.id}, bicycle={bicycle_tag.id}")

Task 2: Upload Images with Bounding Box Regions

Python SDK

# Regions use normalized coordinates (0.0 to 1.0 relative to image dimensions)
# Format: Region(tag_id, left, top, width, height)

training_images = [
    {
        "url": "https://example.com/traffic1.jpg",
        "regions": [
            Region(tag_id=car_tag.id, left=0.1, top=0.3, width=0.25, height=0.2),
            Region(tag_id=car_tag.id, left=0.5, top=0.35, width=0.2, height=0.18),
            Region(tag_id=truck_tag.id, left=0.7, top=0.2, width=0.28, height=0.3),
        ]
    },
    {
        "url": "https://example.com/traffic2.jpg",
        "regions": [
            Region(tag_id=bicycle_tag.id, left=0.05, top=0.4, width=0.15, height=0.25),
            Region(tag_id=car_tag.id, left=0.4, top=0.3, width=0.3, height=0.22),
        ]
    }
]

# Upload images with regions
image_entries = []
for img in training_images:
    entry = ImageUrlCreateEntry(
        url=img["url"],
        regions=img["regions"]
    )
    image_entries.append(entry)

upload_result = trainer.create_images_from_urls(
    project.id,
    images=image_entries
)
print(f"Upload success: {upload_result.is_batch_successful}")
for image in upload_result.images:
    print(f"  {image.source_url}: {image.status}")
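Labeling tools often report boxes in pixels, while the regions above must be normalized. The following is a small sketch of that conversion with a bounds check; `to_normalized` and its parameter names are hypothetical helpers, not SDK functions.

```python
def to_normalized(x, y, box_width, box_height, image_width, image_height):
    """Convert a pixel box (top-left x, y plus size) to normalized
    (left, top, width, height) values in the 0.0 to 1.0 range."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("Image dimensions must be positive")

    left = x / image_width
    top = y / image_height
    width = box_width / image_width
    height = box_height / image_height

    # Reject boxes that start outside the image, are empty, or spill past an edge.
    # The small epsilon tolerates floating-point rounding.
    eps = 1e-9
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("Box must start inside the image and have positive size")
    if left + width > 1.0 + eps or top + height > 1.0 + eps:
        raise ValueError("Box extends beyond the image (left+width or top+height > 1.0)")
    return left, top, width, height


# A 480x216 pixel box at (192, 324) in a 1920x1080 image
left, top, width, height = to_normalized(192, 324, 480, 216, 1920, 1080)
print(left, top, width, height)
```

The four returned values can be passed straight to `Region(tag_id=..., left=left, top=top, width=width, height=height)`.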

Task 3: Train and Evaluate Object Detection Model

Python SDK

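# Train the model.
# Note: object detection training needs at least 15 tagged images per tag;
# the two example URLs in Task 2 are placeholders for a larger labeled set.
print("Training object detection model...")
iteration = trainer.train_project(project.id)

# Poll until training finishes; stop on failure instead of looping forever
while iteration.status not in ("Completed", "Failed"):
    time.sleep(10)
    iteration = trainer.get_iteration(project.id, iteration.id)
    print(f"  Status: {iteration.status}")

if iteration.status != "Completed":
    raise RuntimeError(f"Training ended with status: {iteration.status}")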

print(f"Training complete: {iteration.id}")

# Evaluate performance
performance = trainer.get_iteration_performance(project.id, iteration.id)
print(f"\nDetection Metrics:")
print(f"  Precision: {performance.precision:.4f}")
print(f"  Recall: {performance.recall:.4f}")
print(f"  mAP: {performance.average_precision:.4f}")

for tag_perf in performance.per_tag_performance:
    print(f"  '{tag_perf.name}': precision={tag_perf.precision:.3f}, recall={tag_perf.recall:.3f}, AP={tag_perf.average_precision:.3f}")

# Publish
prediction_resource_id = "/subscriptions/<sub-id>/resourceGroups/rg-ai102-customvision/providers/Microsoft.CognitiveServices/accounts/cv-prediction-ai102"
publish_name = "vehicle-detector-v1"

trainer.publish_iteration(project.id, iteration.id, publish_name, prediction_resource_id)
print(f"\nPublished as: {publish_name}")
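The relationship between the per-tag AP values and the overall mAP printed above can be sketched as a plain average. This example uses the sample per-tag numbers from the Expected Output section; `mean_average_precision` is a hypothetical helper, and how the service computes each per-tag AP internally is not covered here.

```python
from statistics import mean


def mean_average_precision(ap_by_tag):
    """Average the per-tag AP values into a single mAP score."""
    if not ap_by_tag:
        raise ValueError("At least one tag is required")
    return mean(ap_by_tag.values())


# Sample per-tag AP values (rounded to three decimals)
per_tag_ap = {"car": 0.910, "truck": 0.850, "bicycle": 0.860}
print(f"mAP: {mean_average_precision(per_tag_ap):.4f}")
```

Because every tag has equal weight in the average, one poorly labeled class can pull mAP down noticeably even when the other classes perform well.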

Task 4: Run Object Detection Predictions

Python SDK, followed by the equivalent REST API call (curl)

prediction_key = os.environ["CUSTOM_VISION_PREDICTION_KEY"]
prediction_endpoint = os.environ["CUSTOM_VISION_PREDICTION_ENDPOINT"]

pred_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
predictor = CustomVisionPredictionClient(prediction_endpoint, pred_credentials)

# Detect objects in a new image
test_url = "https://example.com/street-scene.jpg"
results = predictor.detect_image_url(project.id, publish_name, url=test_url)

print(f"\nDetection Results:")
print(f"Objects found: {len(results.predictions)}")

for detection in results.predictions:
    if detection.probability > 0.5:  # Confidence threshold
        bbox = detection.bounding_box
        print(f"  {detection.tag_name} ({detection.probability:.1%})")
        print(f"    Box: left={bbox.left:.3f}, top={bbox.top:.3f}, "
              f"width={bbox.width:.3f}, height={bbox.height:.3f}")

# Convert normalized to pixel coordinates (for a 1920x1080 image)
image_width, image_height = 1920, 1080
for detection in results.predictions:
    if detection.probability > 0.5:
        bbox = detection.bounding_box
        pixel_left = int(bbox.left * image_width)
        pixel_top = int(bbox.top * image_height)
        pixel_width = int(bbox.width * image_width)
        pixel_height = int(bbox.height * image_height)
        print(f"  {detection.tag_name}: ({pixel_left}, {pixel_top}) -> ({pixel_left+pixel_width}, {pixel_top+pixel_height})")

PREDICTION_ENDPOINT="https://<resource>.cognitiveservices.azure.com"
PREDICTION_KEY="<key>"
PROJECT_ID="<project-id>"

curl -s "${PREDICTION_ENDPOINT}/customvision/v3.0/prediction/${PROJECT_ID}/detect/iterations/vehicle-detector-v1/url" \
  -H "Prediction-Key: ${PREDICTION_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/street-scene.jpg"}' \
  | jq '.predictions[] | select(.probability > 0.5) | {tag: .tagName, probability: .probability, boundingBox}'
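The only difference between a detection call and a classification call in the URL above is the path segment after the project ID. The sketch below builds the URL both ways; `prediction_url` is a hypothetical helper, and the endpoint and project ID are placeholders.

```python
def prediction_url(endpoint, project_id, published_name, task="detect"):
    """Build the v3.0 prediction URL for an image passed by URL.

    task is "detect" for object detection projects and
    "classify" for classification projects.
    """
    if task not in ("detect", "classify"):
        raise ValueError("task must be 'detect' or 'classify'")
    base = endpoint.rstrip("/")  # avoid a double slash if the endpoint ends with '/'
    return (
        f"{base}/customvision/v3.0/prediction/{project_id}"
        f"/{task}/iterations/{published_name}/url"
    )


# Placeholder values; substitute your own resource and project
url = prediction_url(
    "https://example-resource.cognitiveservices.azure.com/",
    "00000000-0000-0000-0000-000000000000",
    "vehicle-detector-v1",
)
print(url)
```

Keeping the task as an explicit parameter makes it harder to call the classify route against an object detection project by accident.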

Expected Output

Domain: General (<domain-id>)
Created project: Vehicle-Detector (<project-id>)
Tags: car=..., truck=..., bicycle=...
Upload success: True

Training object detection model...
  Status: Training
  Status: Completed
Training complete: iter-67890

Detection Metrics:
  Precision: 0.8850
  Recall: 0.8200
  mAP: 0.8734
  'car': precision=0.920, recall=0.880, AP=0.910
  'truck': precision=0.870, recall=0.790, AP=0.850
  'bicycle': precision=0.865, recall=0.770, AP=0.860

Published as: vehicle-detector-v1

Detection Results:
Objects found: 4
  car (95.2%)
    Box: left=0.102, top=0.298, width=0.245, height=0.198
  car (87.3%)
    Box: left=0.510, top=0.320, width=0.190, height=0.175
  truck (82.1%)
    Box: left=0.720, top=0.180, width=0.260, height=0.310

Break & fix

Scenario	Symptom	Root Cause	Fix
Regions rejected	Invalid region coordinates	Coordinates outside 0.0–1.0 range	Normalize: left+width ≤ 1.0, top+height ≤ 1.0
Low mAP	Poor detection accuracy	Inconsistent bounding box labeling	Re-label with tight, consistent boxes; more training data
Overlapping detections	Several boxes on the same object	Predictions used without client-side filtering	Apply a confidence threshold, then Non-Maximum Suppression (NMS) to the remaining boxes
Training fails	`BadRequestImageRegions`	Regions too small or missing	Minimum region size ~5% of image area
Wrong endpoint	404 on detection	Using classify endpoint for detection	Use `/detect/` not `/classify/` in prediction URL
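For the overlapping-detections scenario, here is a minimal sketch of client-side filtering: a confidence threshold followed by greedy per-tag Non-Maximum Suppression. The dictionary layout, function names, and default thresholds are assumptions for illustration; adapt them to the prediction objects you actually receive.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, width, height) boxes."""
    a_left, a_top, a_w, a_h = box_a
    b_left, b_top, b_w, b_h = box_b
    inter_w = max(0.0, min(a_left + a_w, b_left + b_w) - max(a_left, b_left))
    inter_h = max(0.0, min(a_top + a_h, b_top + b_h) - max(a_top, b_top))
    intersection = inter_w * inter_h
    union = a_w * a_h + b_w * b_h - intersection
    return intersection / union if union > 0 else 0.0


def filter_detections(detections, min_probability=0.5, iou_threshold=0.5):
    """Drop low-confidence detections, then suppress same-tag boxes that
    overlap a higher-confidence box by at least iou_threshold."""
    confident = [d for d in detections if d["probability"] > min_probability]
    kept = []
    # Visit boxes from most to least confident so the best box always wins
    for det in sorted(confident, key=lambda d: d["probability"], reverse=True):
        overlaps_kept = any(
            det["tag"] == k["tag"] and iou(det["box"], k["box"]) >= iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


# Hypothetical predictions: two near-identical car boxes, one truck, one weak hit
detections = [
    {"tag": "car", "probability": 0.95, "box": (0.10, 0.30, 0.25, 0.20)},
    {"tag": "car", "probability": 0.60, "box": (0.11, 0.31, 0.25, 0.20)},
    {"tag": "truck", "probability": 0.82, "box": (0.70, 0.20, 0.28, 0.30)},
    {"tag": "bicycle", "probability": 0.20, "box": (0.05, 0.40, 0.15, 0.25)},
]
for det in filter_detections(detections):
    print(det["tag"], det["probability"])
```

Suppression is applied per tag so that a car and a truck that genuinely overlap in the image are both kept.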

Knowledge Check

1. How are bounding box coordinates represented in Custom Vision object detection?

2. What does mAP (mean Average Precision) measure in object detection?

3. What is the key difference between the classify and detect prediction endpoints?

4. What is IoU (Intersection over Union) used for?

5. When labeling training images for object detection, what coordinates do you need for each object?

Cleanup

az group delete --name rg-ai102-customvision --yes --no-wait
