Challenge 27: OCR - Extract Text from Images

Estimated Time

45 min | Cost: $1-3 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)

Exam skills covered

Extract text from images using the Azure AI Vision Read feature
Convert handwritten text to digital text
Process multi-page documents

Overview

Azure AI Vision's Read feature (part of Image Analysis 4.0) extracts printed and handwritten text from images and documents. Results are returned in this hierarchy:

Image → Blocks → Lines → Words (with bounding polygons and confidence)
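
The hierarchy above can be walked with three nested loops. The following is a minimal sketch, assuming the result is available as a plain dictionary shaped like the REST `readResult` object (`blocks`, `lines`, `words`, with `text` and `confidence` keys). The helper `iter_words` and the sample data are hypothetical and only illustrate the traversal.

```python
from typing import Iterator, Tuple


def iter_words(read_result: dict) -> Iterator[Tuple[str, str, float]]:
    """Yield (line_text, word_text, confidence) for every word.

    Assumes a dict shaped like the REST `readResult` object:
    blocks -> lines -> words.
    """
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                yield line["text"], word["text"], word["confidence"]


# Hypothetical sample data, not real service output.
sample = {
    "blocks": [
        {
            "lines": [
                {
                    "text": "Hello world",
                    "words": [
                        {"text": "Hello", "confidence": 0.99},
                        {"text": "world", "confidence": 0.95},
                    ],
                }
            ]
        }
    ]
}

for line_text, word_text, confidence in iter_words(sample):
    print(f"{line_text!r} -> {word_text!r} ({confidence:.2f})")
```

Flattening to tuples like this keeps later filtering or export code independent of the nesting depth.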

Key characteristics:

Supports 164+ languages for print, 9 languages for handwriting
Handles rotated, skewed, and noisy text
Returns bounding polygons for each text element
Synchronous API for single images
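
Because rotated or skewed text comes back as a polygon rather than a rectangle, code that needs a plain rectangle (for cropping or drawing) has to derive one. A minimal sketch, assuming each polygon is a list of points with `x` and `y` keys; `polygon_to_bbox` is a hypothetical helper and the coordinates are made up. With SDK objects, attribute access would likely replace the key lookups.

```python
from typing import Dict, List, Tuple


def polygon_to_bbox(polygon: List[Dict[str, float]]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the smallest axis-aligned box
    that contains every point of the polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up, slightly skewed quadrilateral.
polygon = [
    {"x": 54, "y": 28},
    {"x": 403, "y": 26},
    {"x": 405, "y": 60},
    {"x": 55, "y": 62},
]
print(polygon_to_bbox(polygon))  # (54, 26, 405, 62)
```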

For multi-page PDFs, use the Azure AI Document Intelligence Read model instead.

Prerequisites

Azure subscription
Azure AI Services resource
Python 3.9+ or .NET 8
Package: azure-ai-vision-imageanalysis (v1.0+)
Package for Task 3: azure-ai-documentintelligence

Implementation

Task 1: Extract Printed Text from Images

The same task with the Python SDK, the C# SDK, and the REST API, in that order.

Python SDK

import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"])
)

# Extract text from an image URL
image_url = "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.READ]
)

if result.read:
    print("Extracted Text:")
    print("-" * 40)
    for block in result.read.blocks:
        for line in block.lines:
            print(f"  Line: '{line.text}'")
            print(f"    Bounding polygon: {line.bounding_polygon}")
            
            # Access individual words with confidence
            for word in line.words:
                print(f"      Word: '{word.text}' (confidence: {word.confidence:.4f})")

C# SDK

using Azure;
using Azure.AI.Vision.ImageAnalysis;

var client = new ImageAnalysisClient(
    new Uri(Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_AI_KEY")));

var imageUrl = new Uri("https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png");

var result = client.Analyze(imageUrl, VisualFeatures.Read);

Console.WriteLine("Extracted Text:");
foreach (var block in result.Value.Read.Blocks)
{
    foreach (var line in block.Lines)
    {
        Console.WriteLine($"  Line: '{line.Text}'");
        foreach (var word in line.Words)
        {
            Console.WriteLine($"    Word: '{word.Text}' (confidence: {word.Confidence:F4})");
        }
    }
}

REST API

ENDPOINT="https://<resource>.cognitiveservices.azure.com"
KEY="<your-key>"

curl -s "${ENDPOINT}/computervision/imageanalysis:analyze?features=read&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"}' \
  | jq '.readResult.blocks[].lines[] | {text: .text, confidence: .words[0].confidence}'

Task 2: Extract Handwritten Text

The same task with the Python SDK, then the REST API.

Python SDK

# Read handwritten text from a local image
# (reuses `client` and `VisualFeatures` from Task 1)
with open("handwritten-note.jpg", "rb") as f:
    image_data = f.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.READ]
)

if result.read:
    print("Handwritten Text Extracted:")
    for block in result.read.blocks:
        for line in block.lines:
            # Skip lines without words to avoid dividing by zero
            if not line.words:
                continue
            # Check confidence - handwriting often has lower confidence
            avg_confidence = sum(w.confidence for w in line.words) / len(line.words)
            confidence_indicator = "✓" if avg_confidence > 0.8 else "?"
            print(f"  [{confidence_indicator}] '{line.text}' (avg conf: {avg_confidence:.3f})")

REST API

# Send local image for OCR
curl -s "${ENDPOINT}/computervision/imageanalysis:analyze?features=read&api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @handwritten-note.jpg | jq '.readResult'
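
Line-level averages can hide a single badly recognized word. One possible refinement, sketched below under the assumption that words are available as (text, confidence) pairs, is to flag individual words below a threshold for manual review. The helper name `words_needing_review` and the 0.8 default (borrowed from the threshold used above) are illustrative, not service recommendations.

```python
from typing import Iterable, List, Tuple


def words_needing_review(
    words: Iterable[Tuple[str, float]], threshold: float = 0.8
) -> List[str]:
    """Return the text of every word whose confidence is below threshold."""
    return [text for text, confidence in words if confidence < threshold]


# Made-up confidences for one handwritten line.
line_words = [("discuss", 0.91), ("quarterly", 0.62), ("goals", 0.88)]
print(words_needing_review(line_words))  # ['quarterly']
```

In this made-up line the average is roughly 0.80, which would just pass the line-level check, while 'quarterly' is still flagged.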

Task 3: Process Multi-Page Document with Document Intelligence

The same task with the Python SDK, then the REST API.

Python SDK

import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# For multi-page documents, use Document Intelligence Read model
doc_client = DocumentIntelligenceClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"])
)

# Analyze a multi-page PDF
with open("multi-page-document.pdf", "rb") as f:
    poller = doc_client.begin_analyze_document(
        "prebuilt-read",
        body=f,
        content_type="application/pdf"
    )

result = poller.result()

print(f"Document contains {len(result.pages)} pages")
for page in result.pages:
    print(f"\n--- Page {page.page_number} ({page.width}x{page.height} {page.unit}) ---")
    for line in page.lines:
        print(f"  '{line.content}'")

# Access full content as continuous text
print(f"\nFull content:\n{result.content[:500]}")

REST API

# Document Intelligence Read API (async operation)
DOC_ENDPOINT="https://<resource>.cognitiveservices.azure.com"
KEY="<your-key>"

# Submit document for analysis
OPERATION_URL=$(curl -si "${DOC_ENDPOINT}/documentintelligence/documentModels/prebuilt-read:analyze?api-version=2024-11-30" \
  -H "Ocp-Apim-Subscription-Key: ${KEY}" \
  -H "Content-Type: application/pdf" \
  --data-binary @document.pdf | grep -i "operation-location" | tr -d '\r' | awk '{print $2}')

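# Poll until the operation reaches a terminal state (up to 30 attempts)
for i in $(seq 1 30); do
  RESULT=$(curl -s "${OPERATION_URL}" -H "Ocp-Apim-Subscription-Key: ${KEY}")
  STATUS=$(echo "${RESULT}" | jq -r '.status')
  case "${STATUS}" in succeeded|failed) break ;; esac
  sleep 2
done
echo "${RESULT}" | jq '.analyzeResult.pages[].lines[].content'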
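
The submit-then-poll flow above generalizes to a small loop: fetch the operation status, stop on a terminal state, otherwise wait and retry. The sketch below assumes `succeeded` and `failed` are the terminal status values. `poll_until_done`, `fetch_status`, and the stubbed responses are hypothetical stand-ins for the HTTP call to the operation URL.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
) -> Dict:
    """Call fetch_status until it reports a terminal state or attempts run out."""
    for _ in range(max_attempts):
        response = fetch_status()
        # Assumption: these two values mark a finished operation.
        if response.get("status") in ("succeeded", "failed"):
            return response
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not finish within the allowed attempts")


# Stub that simulates two 'running' responses before success.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"pages": []}},
])
final = poll_until_done(lambda: next(responses), interval_seconds=0.0)
print(final["status"])  # succeeded
```

Bounding the number of attempts keeps a stuck operation from blocking the caller indefinitely. The Python SDK's poller covers this for you through `poller.result()`.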

Expected Output

Extracted Text:
----------------------------------------
  Line: 'Azure AI Services'
    Bounding polygon: [{'x': 54, 'y': 28}, {'x': 403, 'y': 26}, ...]
      Word: 'Azure' (confidence: 0.9980)
      Word: 'AI' (confidence: 0.9950)
      Word: 'Services' (confidence: 0.9970)
  Line: 'Computer Vision'
    Bounding polygon: [...]
      Word: 'Computer' (confidence: 0.9920)
      Word: 'Vision' (confidence: 0.9910)

Handwritten Text Extracted:
  [✓] 'Meeting notes - January 2024' (avg conf: 0.892)
  [?] 'discuss quarterly goals' (avg conf: 0.734)
  [✓] 'Action items below' (avg conf: 0.856)

Document contains 3 pages

--- Page 1 (8.5x11.0 inch) ---
  'Annual Report 2024'
  'Executive Summary'

Break & fix

Scenario	Symptom	Root Cause	Fix
No text detected	Empty results	Image too small or low quality	Min 50x50 px; ensure adequate resolution (300 DPI for print)
Wrong language detected	Garbled text	Auto-detection failed for rare scripts	Specify `language` parameter in request
Low word confidence	Uncertain results	Handwriting quality or unusual fonts	Accept lower thresholds for handwriting; preprocess image
413 Request Entity Too Large	File rejected	Image exceeds 20MB limit	Compress or resize image before submission
Bounding polygons incorrect	Misaligned boxes	Image rotation not detected	Use auto-rotation or preprocess to correct skew
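
Two of the failures in the table (image too small, file too large) can be caught before any request is sent. A minimal sketch, assuming the 50x50 px minimum and 20 MB maximum from the table; `preflight_problems` is a hypothetical helper, the exact byte interpretation of "20MB" is an assumption, and the caller is expected to obtain the pixel dimensions with an imaging library of its choice.

```python
from typing import List

MAX_BYTES = 20 * 1024 * 1024  # 20 MB from the table; exact byte count is an assumption
MIN_SIDE_PX = 50              # minimum width and height from the table


def preflight_problems(size_bytes: int, width: int, height: int) -> List[str]:
    """Return human-readable problems; an empty list means the image passes."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append(
            f"image is {width}x{height} px; minimum is {MIN_SIDE_PX}x{MIN_SIDE_PX}"
        )
    return problems


print(preflight_problems(size_bytes=1_000_000, width=1280, height=720))  # []
print(preflight_problems(size_bytes=25 * 1024 * 1024, width=40, height=40))
```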

Knowledge Check

1. What is the text hierarchy returned by the Azure AI Vision Read feature?

2. What is the maximum image size supported by the Image Analysis Read feature?

3. For multi-page PDF documents, which Azure service should you use for text extraction?

4. How does the API indicate uncertainty in handwritten text recognition?

5. What Content-Type header should you use when sending a local image file for OCR?

Cleanup

az group delete --name rg-ai102-vision --yes --no-wait
