
Challenge 27: OCR - Extract Text from Images

Estimated Time

45 min | Cost: $1-3 (estimated) | Domain: Implement Computer Vision Solutions (10-15%)

Exam skills covered

  • Extract text from images using Azure AI Vision Read feature
  • Convert handwritten text to digital text
  • Process multi-page documents

Overview

Azure AI Vision's Read feature (part of Image Analysis 4.0) extracts printed and handwritten text from images. Results are organized in a hierarchy:

Image → Blocks → Lines → Words (lines carry bounding polygons; words carry bounding polygons and confidence scores)

Key characteristics:

  • Supports 164+ languages for print, 9 languages for handwriting
  • Handles rotated, skewed, and noisy text
  • Returns bounding polygons for each text element
  • Synchronous API for single images
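Because each text element comes back as a polygon rather than a rectangle, a common follow-up step is reducing it to an axis-aligned box for cropping or drawing. The sketch below assumes each point exposes `x` and `y` attributes, as the `ImagePoint` objects returned by `azure-ai-vision-imageanalysis` do. The helper name `polygon_to_box` and the demo coordinates are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol, Tuple


class Point(Protocol):
    x: int
    y: int


def polygon_to_box(polygon: Iterable[Point]) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the smallest box enclosing every vertex."""
    points = list(polygon)
    if not points:
        raise ValueError("polygon has no points")
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


# Stand-in points; real code would pass line.bounding_polygon or word.bounding_polygon.
@dataclass
class DemoPoint:
    x: int
    y: int


demo = [DemoPoint(10, 20), DemoPoint(110, 18), DemoPoint(112, 48), DemoPoint(12, 50)]
print(polygon_to_box(demo))  # (10, 18, 102, 32)
```

A slightly rotated line produces a box a little larger than the text itself; keep the original polygon if you need a tight outline.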

For multi-page PDFs, use the Azure AI Document Intelligence Read model instead.
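One way to apply this rule in code is to route each file by its content. The sketch below is a hypothetical helper (`choose_ocr_service` is not part of either SDK) that sends anything starting with the PDF signature `%PDF-` to Document Intelligence and treats everything else as a single image for Image Analysis. It assumes PDFs are the only multi-page input; multi-page TIFFs, for example, are not detected.

```python
PDF_SIGNATURE = b"%PDF-"


def choose_ocr_service(data: bytes) -> str:
    """Pick a service for this file (hypothetical routing rule).

    PDFs may hold many pages, so they go to the Document Intelligence
    prebuilt-read model; anything else is treated as one image for the
    Image Analysis READ feature.
    """
    if data.startswith(PDF_SIGNATURE):
        return "document-intelligence"
    return "image-analysis"


print(choose_ocr_service(b"%PDF-1.7\n"))         # document-intelligence
print(choose_ocr_service(b"\x89PNG\r\n\x1a\n"))  # image-analysis
```

Checking magic bytes instead of the file extension avoids misrouting files that were renamed.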

Prerequisites

  • Azure subscription
  • Azure AI Services resource
  • Python 3.9+ or .NET 8
  • Package: azure-ai-vision-imageanalysis (v1.0+)
  • Package: azure-ai-documentintelligence (v1.0+), for Task 3
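Every task reads the resource endpoint and key from the `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` environment variables. A fail-fast check gives a clearer error than the bare `KeyError` that `os.environ[...]` raises. This is a minimal sketch; `require_env` is a hypothetical helper, not part of any Azure SDK.

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, failing fast if any are unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage before creating any client:
# settings = require_env("AZURE_AI_ENDPOINT", "AZURE_AI_KEY")
```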

Implementation

Task 1: Extract Printed Text from Images

```python
import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

# Extract text from an image URL
image_url = "https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png"

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.READ],
)

if result.read:
    print("Extracted Text:")
    print("-" * 40)
    for block in result.read.blocks:
        for line in block.lines:
            print(f" Line: '{line.text}'")
            print(f" Bounding polygon: {line.bounding_polygon}")

            # Access individual words with confidence
            for word in line.words:
                print(f" Word: '{word.text}' (confidence: {word.confidence:.4f})")
```
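When you only need the recognized text, you can flatten the hierarchy into one string. The sketch below assumes the `blocks → lines → .text` shape that `result.read` exposes; `read_result_to_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so it can run without calling the service.

```python
from types import SimpleNamespace
from typing import Any


def read_result_to_text(read: Any) -> str:
    """Join every line of a READ result into newline-separated text."""
    if read is None:
        return ""  # result.read is None when READ was not requested
    return "\n".join(line.text for block in read.blocks for line in block.lines)


# Stand-in objects mimicking result.read; real code would pass result.read directly.
demo_read = SimpleNamespace(blocks=[
    SimpleNamespace(lines=[
        SimpleNamespace(text="Azure AI Services"),
        SimpleNamespace(text="Computer Vision"),
    ]),
])
print(read_result_to_text(demo_read))
```

With a live call, `read_result_to_text(result.read)` returns the same lines that Task 1 prints one by one.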

Task 2: Extract Handwritten Text

```python
# Read handwritten text from a local image (reuses `client` and imports from Task 1)
with open("handwritten-note.jpg", "rb") as f:
    image_data = f.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.READ],
)

if result.read:
    print("Handwritten Text Extracted:")
    for block in result.read.blocks:
        for line in block.lines:
            if not line.words:
                continue  # avoid dividing by zero on a line with no words
            # Check confidence - handwriting often has lower confidence
            avg_confidence = sum(w.confidence for w in line.words) / len(line.words)
            confidence_indicator = "✓" if avg_confidence > 0.8 else "?"
            print(f" [{confidence_indicator}] '{line.text}' (avg conf: {avg_confidence:.3f})")
```
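A per-line average can hide a single badly read word. The sketch below flags individual words instead, so they can be queued for manual review. The `0.8` default mirrors the cutoff used above and is an assumption to tune against your own data; `low_confidence_words` is a hypothetical helper, and the demo words are stand-in values.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def low_confidence_words(read: Any, threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Collect (word, confidence) pairs scoring below `threshold`."""
    flagged: List[Tuple[str, float]] = []
    if read is None:
        return flagged
    for block in read.blocks:
        for line in block.lines:
            for word in line.words:
                if word.confidence < threshold:
                    flagged.append((word.text, word.confidence))
    return flagged


# Stand-in READ result with made-up confidences.
demo_read = SimpleNamespace(blocks=[SimpleNamespace(lines=[
    SimpleNamespace(words=[
        SimpleNamespace(text="discuss", confidence=0.71),
        SimpleNamespace(text="quarterly", confidence=0.65),
        SimpleNamespace(text="goals", confidence=0.93),
    ]),
])])
print(low_confidence_words(demo_read))  # [('discuss', 0.71), ('quarterly', 0.65)]
```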

Task 3: Process Multi-Page Document with Document Intelligence

```python
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# For multi-page documents, use the Document Intelligence Read model
doc_client = DocumentIntelligenceClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

# Analyze a multi-page PDF; this is a long-running operation that returns a poller
with open("multi-page-document.pdf", "rb") as f:
    poller = doc_client.begin_analyze_document(
        "prebuilt-read",
        body=f,
        content_type="application/pdf",
    )

result = poller.result()  # blocks until the analysis finishes

print(f"Document contains {len(result.pages)} pages")
for page in result.pages:
    print(f"\n--- Page {page.page_number} ({page.width}x{page.height} {page.unit}) ---")
    for line in page.lines or []:  # lines can be None on a page with no text
        print(f" '{line.content}'")

# Access full content as continuous text
print(f"\nFull content:\n{result.content[:500]}")
```
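To store or index each page separately, you can collect the lines per page. This sketch assumes the `pages → lines → .content` shape of the `prebuilt-read` result used above; `text_by_page` is a hypothetical helper, and the `SimpleNamespace` pages are stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Dict


def text_by_page(result: Any) -> Dict[int, str]:
    """Map page_number to that page's lines joined with newlines."""
    pages: Dict[int, str] = {}
    for page in result.pages or []:
        # A page with no recognized text may have lines set to None
        pages[page.page_number] = "\n".join(line.content for line in page.lines or [])
    return pages


# Stand-in result with one page of text and one blank page.
demo_result = SimpleNamespace(pages=[
    SimpleNamespace(page_number=1, lines=[
        SimpleNamespace(content="Annual Report 2024"),
        SimpleNamespace(content="Executive Summary"),
    ]),
    SimpleNamespace(page_number=2, lines=None),
])
print(text_by_page(demo_result))
```

Unlike `result.content`, which concatenates the whole document, this keeps page boundaries, which is useful for citing page numbers in search results.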

Expected Output

```
Extracted Text:
----------------------------------------
 Line: 'Azure AI Services'
 Bounding polygon: [{'x': 54, 'y': 28}, {'x': 403, 'y': 26}, ...]
 Word: 'Azure' (confidence: 0.9980)
 Word: 'AI' (confidence: 0.9950)
 Word: 'Services' (confidence: 0.9970)
 Line: 'Computer Vision'
 Word: 'Computer' (confidence: 0.9920)
 Word: 'Vision' (confidence: 0.9910)

Handwritten Text Extracted:
 [✓] 'Meeting notes - January 2024' (avg conf: 0.892)
 [?] 'discuss quarterly goals' (avg conf: 0.734)
 [✓] 'Action items below' (avg conf: 0.856)

Document contains 3 pages

--- Page 1 (8.5x11.0 inch) ---
 'Annual Report 2024'
 'Executive Summary'
```

Break & fix

| Scenario | Symptom | Root cause | Fix |
|---|---|---|---|
| No text detected | Empty results | Image too small or low quality | Use at least 50x50 px; ensure adequate resolution (300 DPI for print) |
| Wrong language detected | Garbled text | Auto-detection failed for rare scripts | Specify the language parameter in the request |
| Low word confidence | Uncertain results | Handwriting quality or unusual fonts | Accept lower thresholds for handwriting; preprocess the image |
| 413 Request Entity Too Large | File rejected | Image exceeds the 20 MB limit | Compress or resize the image before submission |
| Bounding polygons incorrect | Misaligned boxes | Image rotation not detected | Use auto-rotation or preprocess to correct skew |
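Two of these failures (too small, too large) can be caught locally before any request is sent. The sketch below uses the 50x50 px minimum and 20 MB limit from the Break & fix table; reading "20 MB" as 20,000,000 bytes is an assumption (the stricter decimal interpretation). It reads dimensions only for PNG files, where width and height sit at bytes 16-23 inside the IHDR chunk. `preflight_check` and `png_header` are hypothetical helpers.

```python
import struct

MAX_BYTES = 20_000_000  # assumed decimal reading of the 20 MB limit
MIN_SIDE = 50           # minimum width and height in pixels
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def preflight_check(data: bytes) -> list:
    """Return human-readable problems found before calling the Read feature.

    The size check applies to any file; the dimension check runs only for PNGs.
    """
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes; limit is {MAX_BYTES}")
    if data.startswith(PNG_SIGNATURE) and len(data) >= 24:
        # IHDR layout: 4-byte length, b"IHDR", then big-endian width and height
        width, height = struct.unpack(">II", data[16:24])
        if width < MIN_SIDE or height < MIN_SIDE:
            problems.append(f"image is {width}x{height} px; minimum is {MIN_SIDE}x{MIN_SIDE}")
    return problems


def png_header(width: int, height: int) -> bytes:
    """Build only the PNG signature and IHDR size fields, for local testing."""
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


print(preflight_check(png_header(40, 100)))   # ['image is 40x100 px; minimum is 50x50']
print(preflight_check(png_header(640, 480)))  # []
```

For JPEG and other formats, an imaging library such as Pillow is a more practical way to read dimensions than parsing headers by hand.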

Knowledge Check

1. What is the text hierarchy returned by the Azure AI Vision Read feature?

2. What is the maximum image size supported by the Image Analysis Read feature?

3. For multi-page PDF documents, which Azure service should you use for text extraction?

4. How does the API indicate uncertainty in handwritten text recognition?

5. What Content-Type header should you use when sending a local image file for OCR?

Cleanup

az group delete --name rg-ai102-vision --yes --no-wait

Learn More