Challenge 24: Azure AI Vision - Image Analysis

Estimated Time

45 min | Cost: $1-2 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)

Exam skills covered

  • Select visual features to meet requirements
  • Detect objects and generate tags in images
  • Include or exclude visual features in analysis request
  • Interpret image analysis responses including confidence scores

Overview

Azure AI Vision Image Analysis 4.0 provides a unified API for extracting visual information. Available features:

Feature | Description
caption | Natural language description of the image
denseCaptions | Captions for multiple regions
tags | Content tags with confidence scores
objects | Object detection with bounding boxes
people | People detection with bounding boxes
read | OCR text extraction
smartCrops | Optimal crop regions for thumbnails

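The API returns structured JSON. Captions, dense captions, tags, objects, people, and OCR words each carry a confidence score between 0.0 and 1.0; smart-crop regions are returned without one.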
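
The Python SDK used later in this challenge wraps a single REST operation, and the feature names in the table above are the values passed in its features query parameter. As a rough sketch, the request URL could be assembled as shown below. The 2023-10-01 api-version, the build_analyze_url helper, and the example endpoint are assumptions for illustration, so check the current REST reference before relying on them.

```python
from urllib.parse import urlencode

# Feature names as listed in the overview table.
KNOWN_FEATURES = {
    "caption", "denseCaptions", "tags", "objects",
    "people", "read", "smartCrops",
}


def build_analyze_url(endpoint, features, api_version="2023-10-01", language="en"):
    """Build an Image Analysis 4.0 request URL (hypothetical helper).

    The api_version default is an assumption; newer versions may exist.
    """
    if not features:
        raise ValueError("at least one visual feature is required")
    unknown = [f for f in features if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"unknown visual features: {unknown}")
    query = urlencode(
        {
            "api-version": api_version,
            "features": ",".join(features),
            "language": language,
        },
        safe=",",  # keep the comma-separated feature list readable
    )
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


# Example endpoint is a placeholder, not a real resource.
print(build_analyze_url("https://example.cognitiveservices.azure.com/", ["tags", "read"]))
```

Requesting only the features a scenario needs keeps the response small and avoids paying for analysis that is never used.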

Prerequisites

  • Azure subscription
  • Azure AI Services multi-service resource or Computer Vision resource
  • Python 3.9+ or .NET 8
  • Package: azure-ai-vision-imageanalysis (v1.0+)

Implementation

Task 1: Create Azure AI Vision Resource

# Caption and dense captions are only offered in some regions; East US is one of them.
az group create --name rg-ai102-vision --location eastus

az cognitiveservices account create \
  --name ai-vision-ai102 \
  --resource-group rg-ai102-vision \
  --kind AIServices \
  --sku S0 \
  --location eastus

# Get endpoint and key
ENDPOINT=$(az cognitiveservices account show --name ai-vision-ai102 --resource-group rg-ai102-vision --query properties.endpoint -o tsv)
KEY=$(az cognitiveservices account keys list --name ai-vision-ai102 --resource-group rg-ai102-vision --query key1 -o tsv)

# Export for the Python code in Task 2, which reads these from the environment
export AZURE_AI_ENDPOINT="$ENDPOINT"
export AZURE_AI_KEY="$KEY"

echo "AZURE_AI_ENDPOINT=$ENDPOINT"
echo "AZURE_AI_KEY=$KEY"

Task 2: Analyze Image with Multiple Features

import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"])
)

# Analyze an image URL with multiple features
image_url = "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.PEOPLE,
        VisualFeatures.READ
    ],
    language="en",
    gender_neutral_caption=True
)

# Process caption
if result.caption:
    print(f"Caption: '{result.caption.text}' (confidence: {result.caption.confidence:.4f})")

# Process tags
if result.tags:
    print(f"\nTags ({len(result.tags.list)} found):")
    for tag in result.tags.list:
        print(f"  - {tag.name}: {tag.confidence:.4f}")

# Process objects (each object carries its own tag list; the first tag is used as its label)
if result.objects:
    print(f"\nObjects ({len(result.objects.list)} detected):")
    for obj in result.objects.list:
        bbox = obj.bounding_box
        print(f"  - {obj.tags[0].name} ({obj.tags[0].confidence:.4f})")
        print(f"    Bounding box: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Process people
if result.people:
    print(f"\nPeople ({len(result.people.list)} detected):")
    for person in result.people.list:
        bbox = person.bounding_box
        print(f"  - Confidence: {person.confidence:.4f}")
        print(f"    Bounding box: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Process OCR text (blocks contain lines)
if result.read:
    print("\nText (OCR):")
    for block in result.read.blocks:
        for line in block.lines:
            print(f"  '{line.text}'")

Task 3: Analyze Local Image with Smart Crops

# Analyze a local image file (reuses client and VisualFeatures from Task 2)
with open("sample-image.jpg", "rb") as image_file:
    image_data = image_file.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.SMART_CROPS, VisualFeatures.DENSE_CAPTIONS],
    smart_crops_aspect_ratios=[0.9, 1.33, 1.78]  # Square-ish, 4:3, 16:9
)

# Smart crops for thumbnails
if result.smart_crops:
    print("Smart crop regions:")
    for crop in result.smart_crops.list:
        bbox = crop.bounding_box
        print(f"  Aspect ratio {crop.aspect_ratio}: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Dense captions - multiple region descriptions
if result.dense_captions:
    print(f"\nDense captions ({len(result.dense_captions.list)}):")
    for cap in result.dense_captions.list:
        bbox = cap.bounding_box
        print(f"  '{cap.text}' (conf: {cap.confidence:.3f}) at ({bbox.x},{bbox.y})")
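
Smart-crop regions come back as x, y, width, and height, whereas most imaging libraries crop with a (left, top, right, bottom) box. A minimal sketch of that conversion, clamped to the image bounds, might look like the following. The to_crop_box and aspect_ratio helpers and the sample numbers are hypothetical; the resulting tuple has the shape that Pillow's Image.crop accepts.

```python
def to_crop_box(x, y, width, height, image_width, image_height):
    """Convert an (x, y, width, height) region to (left, top, right, bottom).

    Hypothetical helper: the box is clamped so it never extends past the image.
    """
    left = max(0, x)
    top = max(0, y)
    right = min(image_width, x + width)
    bottom = min(image_height, y + height)
    if right <= left or bottom <= top:
        raise ValueError("bounding box lies outside the image")
    return (left, top, right, bottom)


def aspect_ratio(box):
    """Width divided by height of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return (right - left) / (bottom - top)


# Sample numbers are made up for illustration: a 640x360 region in a 1024x768 image.
box = to_crop_box(x=100, y=50, width=640, height=360, image_width=1024, image_height=768)
print(box, round(aspect_ratio(box), 2))  # (100, 50, 740, 410) 1.78
```

With the SDK result from this task, the arguments would come from crop.bounding_box (x, y, width, height).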

Expected Output

Caption: 'a person standing in front of a whiteboard giving a presentation' (confidence: 0.8523)

Tags (8 found):
  - person: 0.9891
  - indoor: 0.9754
  - whiteboard: 0.9612
  - presentation: 0.8934
  - clothing: 0.8721
  - standing: 0.8456
  - wall: 0.7823
  - text: 0.7234

Objects (2 detected):
  - person (0.9234)
    Bounding box: x=120, y=45, w=280, h=510
  - whiteboard (0.8567)
    Bounding box: x=420, y=30, w=350, h=400

People (1 detected):
  - Confidence: 0.9456
    Bounding box: x=118, y=42, w=285, h=515

Text (OCR):
  'Azure AI Services'
  'Computer Vision'
  'Image Analysis'
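
Confidence scores like those above are usually compared against a threshold before the results are acted on. The sketch below does this on plain (name, confidence) pairs copied from the sample tag output, so it runs without an Azure resource. The filter_by_confidence helper and the 0.85 threshold are illustrative choices, not part of the SDK.

```python
def filter_by_confidence(items, threshold=0.7):
    """Return (name, confidence) pairs at or above threshold, highest first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    kept = [(name, conf) for name, conf in items if conf >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Tag values copied from the sample output above.
sample_tags = [
    ("person", 0.9891), ("indoor", 0.9754), ("whiteboard", 0.9612),
    ("presentation", 0.8934), ("clothing", 0.8721), ("standing", 0.8456),
    ("wall", 0.7823), ("text", 0.7234),
]

for name, conf in filter_by_confidence(sample_tags, threshold=0.85):
    print(f"{name}: {conf:.4f}")
```

With a live result, the same helper could be fed from the SDK objects, for example [(t.name, t.confidence) for t in result.tags.list].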

Break & fix

Scenario | Symptom | Root Cause | Fix
415 Unsupported Media Type | Error on local file | Wrong Content-Type header | Use application/octet-stream for binary, application/json for URL
Empty tags/objects | No results returned | Image too small or blurry | Minimum 50x50 pixels; max 20 MB
InvalidImageUrl error | 400 Bad Request | URL not publicly accessible | Ensure image URL is publicly reachable, or use local file upload instead
Low confidence scores | Results unreliable | Image quality or ambiguity | Filter results by confidence threshold (e.g., > 0.7)
Feature not available | FeatureNotSupported | Region doesn't support feature | Use supported regions (East US, West Europe, etc.)
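
Two of the failures in the table can be caught before a request is sent. The following sketch shows a pre-flight check using the size limits and Content-Type values quoted in the table. The content_type_for and preflight helpers are hypothetical names, and the limits are taken from the table rather than verified against the service, so confirm them in the current documentation.

```python
MIN_SIDE_PX = 50                 # minimum width and height, per the table
MAX_BYTES = 20 * 1024 * 1024     # 20 MB upper bound, per the table


def content_type_for(source):
    """Pick the Content-Type for a raw REST call (hypothetical helper).

    Binary image data is sent as application/octet-stream; a URL is sent
    inside a JSON body, so it uses application/json.
    """
    if isinstance(source, (bytes, bytearray)):
        return "application/octet-stream"
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return "application/json"
    raise TypeError("source must be image bytes or an http(s) URL")


def preflight(size_bytes, width, height):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 20 MB")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("image is smaller than 50x50 pixels")
    return problems


print(content_type_for(b"\xff\xd8\xff"))                # application/octet-stream
print(preflight(size_bytes=1024, width=40, height=300))  # ['image is smaller than 50x50 pixels']
```

The Python SDK sets the Content-Type itself, so content_type_for only matters when calling the REST endpoint directly.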

Knowledge Check

1. Which visual feature provides natural language descriptions for multiple regions within an image?

2. What does the smartCrops feature return?

3. How are confidence scores expressed in Image Analysis 4.0 responses?

4. What is the correct API endpoint format for Image Analysis 4.0?

5. Which parameter specifies what information to extract from the image?

Cleanup

az group delete --name rg-ai102-vision --yes --no-wait

Learn More