Challenge 24: Azure AI Vision - Image Analysis
Estimated Time
45 min | Cost: $1-2 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)
Exam skills covered
- Select visual features to meet requirements
- Detect objects and generate tags in images
- Include or exclude visual features in analysis request
- Interpret image analysis responses including confidence scores
Overview
Azure AI Vision Image Analysis 4.0 provides a unified API for extracting visual information. Available features:
| Feature | Description |
|---|---|
| caption | Natural language description of the image |
| denseCaptions | Captions for multiple regions |
| tags | Content tags with confidence scores |
| objects | Object detection with bounding boxes |
| people | People detection with bounding boxes |
| read | OCR text extraction |
| smartCrops | Optimal crop regions for thumbnails |
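The API returns structured JSON. Captions, tags, objects, people, and recognized words each carry a confidence score between 0.0 and 1.0; smart crop regions carry an aspect ratio and a bounding box instead.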
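To make the response shape concrete, here is a minimal sketch that walks a hand-written sample in the shape of a 4.0 JSON response. The sample values are illustrative rather than real service output, and `summarize` is a hypothetical helper name, not part of any SDK.

```python
import json

# Hand-written sample in the shape of an Image Analysis 4.0 response.
# Values are illustrative only, not real service output.
SAMPLE_RESPONSE = json.loads("""
{
  "modelVersion": "2023-10-01",
  "metadata": {"width": 800, "height": 600},
  "captionResult": {"text": "a person at a whiteboard", "confidence": 0.85},
  "tagsResult": {"values": [
    {"name": "person", "confidence": 0.99},
    {"name": "wall", "confidence": 0.78}
  ]},
  "objectsResult": {"values": [
    {"boundingBox": {"x": 120, "y": 45, "w": 280, "h": 510},
     "tags": [{"name": "person", "confidence": 0.92}]}
  ]}
}
""")


def summarize(response: dict) -> dict:
    """Collect (name, confidence) pairs from the result sections that are present."""
    summary = {}
    caption = response.get("captionResult")
    if caption:
        summary["caption"] = (caption["text"], caption["confidence"])
    tags = response.get("tagsResult", {}).get("values", [])
    summary["tags"] = [(t["name"], t["confidence"]) for t in tags]
    objects = response.get("objectsResult", {}).get("values", [])
    # Each detected object carries its own tag list; use the first tag as its label.
    summary["objects"] = [
        (o["tags"][0]["name"], o["tags"][0]["confidence"])
        for o in objects
        if o.get("tags")
    ]
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```

Only the sections for the requested features appear in a response, which is why the helper treats every section as optional.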
Prerequisites
- Azure subscription
- Azure AI Services multi-service resource or Computer Vision resource
- Python 3.9+ or .NET 8
- Package: azure-ai-vision-imageanalysis (v1.0+)
Implementation
Task 1: Create Azure AI Vision Resource
# Caption and dense captions are available only in certain regions; East US is one of them
az group create --name rg-ai102-vision --location eastus
az cognitiveservices account create \
  --name ai-vision-ai102 \
  --resource-group rg-ai102-vision \
  --kind AIServices \
  --sku S0 \
  --location eastus
# Get endpoint and key
ENDPOINT=$(az cognitiveservices account show --name ai-vision-ai102 --resource-group rg-ai102-vision --query properties.endpoint -o tsv)
KEY=$(az cognitiveservices account keys list --name ai-vision-ai102 --resource-group rg-ai102-vision --query key1 -o tsv)
echo "AZURE_AI_ENDPOINT=$ENDPOINT"
echo "AZURE_AI_KEY=$KEY"
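The SDK samples in Task 2 read `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` from the environment, so both values need to be exported into the environment the script runs in. A minimal sketch of a fail-fast check follows; `load_vision_settings` is a hypothetical helper name, not an SDK function.

```python
import os


def load_vision_settings(env=os.environ) -> tuple[str, str]:
    """Return (endpoint, key), or raise an error naming whichever variable is missing."""
    required = ("AZURE_AI_ENDPOINT", "AZURE_AI_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so later URL building is predictable.
    return env["AZURE_AI_ENDPOINT"].rstrip("/"), env["AZURE_AI_KEY"]


if __name__ == "__main__":
    try:
        endpoint, _key = load_vision_settings()
        print(f"Using endpoint {endpoint}")
    except RuntimeError as exc:
        print(exc)
```

Failing early with the variable name tends to be easier to diagnose than the `KeyError` that `os.environ[...]` raises deep inside client construction.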
Task 2: Analyze Image with Multiple Features
The same request is shown with the Python SDK, the C# SDK, and the REST API, in that order.
# Python SDK
import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

# Analyze an image URL with multiple features
image_url = "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.PEOPLE,
        VisualFeatures.READ,
    ],
    language="en",
    gender_neutral_caption=True,
)

# Process caption
if result.caption:
    print(f"Caption: '{result.caption.text}' (confidence: {result.caption.confidence:.4f})")

# Process tags
if result.tags:
    print(f"\nTags ({len(result.tags.list)} found):")
    for tag in result.tags.list:
        print(f" - {tag.name}: {tag.confidence:.4f}")

# Process objects (each object carries its own tag list; the first tag is its label)
if result.objects:
    print(f"\nObjects ({len(result.objects.list)} detected):")
    for obj in result.objects.list:
        bbox = obj.bounding_box
        print(f" - {obj.tags[0].name} ({obj.tags[0].confidence:.4f})")
        print(f" Bounding box: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Process people
if result.people:
    print(f"\nPeople ({len(result.people.list)} detected):")
    for person in result.people.list:
        bbox = person.bounding_box
        print(f" - Confidence: {person.confidence:.4f}")
        print(f" Bounding box: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Process OCR text
if result.read:
    print("\nText (OCR):")
    for block in result.read.blocks:
        for line in block.lines:
            print(f" '{line.text}'")

// C# SDK
using Azure;
using Azure.AI.Vision.ImageAnalysis;

var endpoint = Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT");
var key = Environment.GetEnvironmentVariable("AZURE_AI_KEY");

var client = new ImageAnalysisClient(
    new Uri(endpoint),
    new AzureKeyCredential(key));

var imageUrl = new Uri("https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png");

// VisualFeatures is a flags enum, so features are combined with |
var result = client.Analyze(
    imageUrl,
    VisualFeatures.Caption | VisualFeatures.Tags | VisualFeatures.Objects | VisualFeatures.People | VisualFeatures.Read,
    new ImageAnalysisOptions { Language = "en", GenderNeutralCaption = true });

// Caption
Console.WriteLine($"Caption: '{result.Value.Caption.Text}' ({result.Value.Caption.Confidence:F4})");

// Tags
Console.WriteLine($"\nTags ({result.Value.Tags.Values.Count}):");
foreach (var tag in result.Value.Tags.Values)
    Console.WriteLine($" - {tag.Name}: {tag.Confidence:F4}");

// Objects
Console.WriteLine($"\nObjects ({result.Value.Objects.Values.Count}):");
foreach (var obj in result.Value.Objects.Values)
{
    var box = obj.BoundingBox;
    Console.WriteLine($" - {obj.Tags[0].Name} ({obj.Tags[0].Confidence:F4})");
    Console.WriteLine($" Box: x={box.X}, y={box.Y}, w={box.Width}, h={box.Height}");
}

// People
Console.WriteLine($"\nPeople ({result.Value.People.Values.Count}):");
foreach (var person in result.Value.People.Values)
    Console.WriteLine($" - Confidence: {person.Confidence:F4}, Box: ({person.BoundingBox.X},{person.BoundingBox.Y})");

// OCR
Console.WriteLine("\nText:");
foreach (var block in result.Value.Read.Blocks)
    foreach (var line in block.Lines)
        Console.WriteLine($" '{line.Text}'");

# REST API
ENDPOINT="https://<resource>.cognitiveservices.azure.com"
KEY="<your-key>"

curl -s "${ENDPOINT}/computervision/imageanalysis:analyze?features=caption,tags,objects,people,read&language=en&gender-neutral-caption=true&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"
  }' | jq .
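Including or excluding a visual feature comes down to what goes into the `features` query parameter. The sketch below assembles the same request URL as the curl call from a list of feature names; `build_analyze_url` is a hypothetical helper written for illustration, not part of the SDK.

```python
from urllib.parse import urlencode

API_VERSION = "2024-02-01"  # same version as the curl example


def build_analyze_url(endpoint, features, language=None, gender_neutral_caption=None):
    """Build the analyze URL for the given feature names (hypothetical helper)."""
    if not features:
        raise ValueError("at least one visual feature is required")
    params = {"features": ",".join(features), "api-version": API_VERSION}
    if language:
        params["language"] = language
    if gender_neutral_caption is not None:
        # The query string expects lowercase true/false.
        params["gender-neutral-caption"] = str(gender_neutral_caption).lower()
    # safe="," keeps the comma-separated feature list unescaped, as in the curl example.
    query = urlencode(params, safe=",")
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


if __name__ == "__main__":
    print(build_analyze_url(
        "https://<resource>.cognitiveservices.azure.com",
        ["caption", "tags"],
        language="en",
    ))
```

Dropping a name from the list is all it takes to exclude that feature, and the matching section is then absent from the response.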
Task 3: Analyze Local Image with Smart Crops
The Python SDK version comes first, followed by the REST API equivalent.
# Python SDK: analyze a local image file
# Reuses the client and the VisualFeatures import from Task 2
with open("sample-image.jpg", "rb") as image_file:
    image_data = image_file.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.SMART_CROPS, VisualFeatures.DENSE_CAPTIONS],
    smart_crops_aspect_ratios=[0.9, 1.33, 1.78],  # Square-ish, 4:3, 16:9
)

# Smart crops for thumbnails
if result.smart_crops:
    print("Smart crop regions:")
    for crop in result.smart_crops.list:
        bbox = crop.bounding_box
        print(f" Aspect ratio {crop.aspect_ratio}: x={bbox.x}, y={bbox.y}, w={bbox.width}, h={bbox.height}")

# Dense captions - multiple region descriptions
if result.dense_captions:
    print(f"\nDense captions ({len(result.dense_captions.list)}):")
    for cap in result.dense_captions.list:
        bbox = cap.bounding_box
        print(f" '{cap.text}' (conf: {cap.confidence:.3f}) at ({bbox.x},{bbox.y})")

# REST API: analyze a local file with smart crops
curl -s "${ENDPOINT}/computervision/imageanalysis:analyze?features=smartCrops,denseCaptions&smartcrops-aspect-ratios=0.9,1.33,1.78&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @sample-image.jpg | jq .
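Once the smart crop regions come back, a thumbnail pipeline still has to pick one and turn it into crop coordinates. A minimal sketch under stated assumptions: `CropRegion` is a hypothetical stand-in for one smart crop result, the sample regions are made up, and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CropRegion:
    """Hypothetical stand-in for one smart crop result."""
    aspect_ratio: float
    x: int
    y: int
    width: int
    height: int


def closest_crop(crops, target_ratio):
    """Return the crop whose aspect ratio is nearest to the target."""
    return min(crops, key=lambda c: abs(c.aspect_ratio - target_ratio))


def to_corner_box(crop):
    """Convert x/y/width/height into a (left, top, right, bottom) tuple."""
    return (crop.x, crop.y, crop.x + crop.width, crop.y + crop.height)


if __name__ == "__main__":
    # Made-up regions for an 800x600 image, one per requested aspect ratio.
    crops = [
        CropRegion(0.9, 100, 0, 540, 600),
        CropRegion(1.33, 0, 0, 798, 600),
        CropRegion(1.78, 0, 75, 800, 449),
    ]
    best = closest_crop(crops, 16 / 9)
    print(best.aspect_ratio, to_corner_box(best))
```

The corner form is the layout that imaging libraries such as Pillow expect for `Image.crop`, so the tuple can be passed along without further conversion.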
Expected Output
Caption: 'a person standing in front of a whiteboard giving a presentation' (confidence: 0.8523)
Tags (8 found):
- person: 0.9891
- indoor: 0.9754
- whiteboard: 0.9612
- presentation: 0.8934
- clothing: 0.8721
- standing: 0.8456
- wall: 0.7823
- text: 0.7234
Objects (2 detected):
- person (0.9234)
Bounding box: x=120, y=45, w=280, h=510
- whiteboard (0.8567)
Bounding box: x=420, y=30, w=350, h=400
People (1 detected):
- Confidence: 0.9456
Bounding box: x=118, y=42, w=285, h=515
Text (OCR):
'Azure AI Services'
'Computer Vision'
'Image Analysis'
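Interpreting these scores usually means choosing a cut-off that suits the application. The sketch below applies a threshold to the tag scores listed above; `filter_by_confidence` is a hypothetical helper, and the threshold values are arbitrary examples rather than recommendations.

```python
# Tag names and scores copied from the expected output above.
TAGS = [
    ("person", 0.9891), ("indoor", 0.9754), ("whiteboard", 0.9612),
    ("presentation", 0.8934), ("clothing", 0.8721), ("standing", 0.8456),
    ("wall", 0.7823), ("text", 0.7234),
]


def filter_by_confidence(items, threshold):
    """Keep (name, confidence) pairs at or above the threshold, highest score first."""
    kept = [item for item in items if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in filter_by_confidence(TAGS, 0.85):
        print(f"{name}: {score:.4f}")
```

A stricter threshold trades recall for precision: at 0.85 the `standing`, `wall`, and `text` tags drop out, while a threshold of 0.7 keeps all eight.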
Break & fix
| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| 415 Unsupported Media Type | Error on local file | Wrong Content-Type header | Use application/octet-stream for binary, application/json for URL |
| Empty tags/objects | No results returned | Image too small or blurry | Use a sharper image that is at least 50x50 pixels and no larger than 20 MB |
| InvalidImageUrl error | 400 Bad Request | URL not publicly accessible | Ensure the image URL is publicly reachable, or upload the local file instead |
| Low confidence scores | Results unreliable | Image quality or ambiguity | Filter results by confidence threshold (e.g., > 0.7) |
| Feature not available | FeatureNotSupported | Region doesn't support feature | Use supported regions (East US, West Europe, etc.) |
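Two of the rows above can be caught before a request is sent. The following is a minimal sketch of a pre-flight check, assuming the caller already knows the image dimensions (for example from an imaging library) and treating 20 MB as 20 × 1024 × 1024 bytes, which is an assumption. Both function names are hypothetical.

```python
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MB limit from the table; exact byte count is an assumption
MIN_SIDE_PIXELS = 50                # minimum width and height from the table


def content_type_for(source):
    """Return the Content-Type that matches how the image is supplied."""
    if isinstance(source, (bytes, bytearray)):
        return "application/octet-stream"  # raw image bytes in the request body
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        return "application/json"  # JSON body of the form {"url": "..."}
    raise TypeError("source must be image bytes or an http(s) URL string")


def check_image(data, width, height):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if len(data) > MAX_IMAGE_BYTES:
        problems.append("file is larger than 20 MB")
    if width < MIN_SIDE_PIXELS or height < MIN_SIDE_PIXELS:
        problems.append("image is smaller than 50x50 pixels")
    return problems


if __name__ == "__main__":
    print(content_type_for(b"\xff\xd8\xff"))
    print(check_image(b"\xff\xd8\xff", width=40, height=300))
```

Blurriness and regional feature availability are not covered here; those still surface only in the service response.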
Knowledge Check
1. Which visual feature provides natural language descriptions for multiple regions within an image?
2. What does the smartCrops feature return?
3. How are confidence scores expressed in Image Analysis 4.0 responses?
4. What is the correct API endpoint format for Image Analysis 4.0?
5. Which parameter specifies what information to extract from the image?
Cleanup
az group delete --name rg-ai102-vision --yes --no-wait