Challenge 12: Optical Character Recognition (OCR)

Estimated Time

25-35 min | Cost: Free | Domain: Computer Vision on Azure (15-20%)

Exam skills covered

  • Identify features of optical character recognition (OCR) solutions
  • Understand the difference between OCR and document intelligence
  • Identify Azure services for reading text from images
  • Describe capabilities of the Read API

Overview

Optical Character Recognition (OCR) is the technology that extracts text from images and documents. Any time you photograph a document, scan a receipt, or point your phone at a sign and it "reads" the text — that's OCR in action.

Think of OCR like teaching a computer to read. When you look at a photo of a restaurant menu, you instantly recognize letters and words. OCR does the same thing — it identifies the shapes of characters in an image and converts them into machine-readable text that applications can process, search, and store.

Azure provides OCR through two main services: Azure AI Vision (Read API) for general text extraction from images, and Azure AI Document Intelligence for structured document processing. The Read API handles printed and handwritten text from any image. Document Intelligence goes further — it understands document structure (fields, tables, key-value pairs) from specific document types like invoices, receipts, and forms.

Explore

Task 1: OCR vs Document Intelligence

Feature | Azure AI Vision (Read API) | Azure AI Document Intelligence
What it extracts | Raw text from images | Structured fields, tables, and key-value pairs
Input | Any image with text | Documents (invoices, receipts, forms, IDs)
Output | Lines and words with positions | Named fields (e.g., "InvoiceTotal: $1,234.56")
Use case | Read a sign, extract text from a screenshot | Process 10,000 invoices and extract totals, dates, vendors
Analogy | Reading text out loud | Filling in a spreadsheet from a form

Key distinction: OCR reads text character by character. Document Intelligence UNDERSTANDS document structure — it knows which number is the "total" and which is the "date."
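
To make the distinction concrete, here is a minimal Python sketch. The sample data and the helper `find_total_in_raw_text` are hypothetical; they only mimic the general shape of each kind of output and are not real API responses.

```python
import re

# Hypothetical OCR-style output: plain lines of text in reading order.
ocr_lines = [
    "Contoso Ltd.",
    "Invoice date: 2024-03-15",
    "Subtotal $1,134.56",
    "Total $1,234.56",
]

# Hypothetical Document Intelligence-style output: named fields.
invoice_fields = {
    "VendorName": "Contoso Ltd.",
    "InvoiceDate": "2024-03-15",
    "InvoiceTotal": "$1,234.56",
}


def find_total_in_raw_text(lines):
    """With raw OCR text, the caller must decide which number is the total."""
    for line in lines:
        # Anchored at the start so "Subtotal ..." is not mistaken for the total.
        match = re.match(r"^Total\s+(\$[\d,]+\.\d{2})$", line)
        if match:
            return match.group(1)
    return None


total_from_ocr = find_total_in_raw_text(ocr_lines)
total_from_fields = invoice_fields["InvoiceTotal"]
print(total_from_ocr, total_from_fields)
```

Both paths reach the same value, but the OCR path depends on a hand-written rule that breaks as soon as the invoice layout changes. The structured path simply reads a named field.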

Task 2: Try the Azure AI Vision OCR demo

  1. Visit the Azure AI Vision demo
  2. Select the "Extract text from images" option
  3. Try with a sample image or upload your own (photo of a sign, document, or handwriting)
  4. Observe the results:
    • Text is extracted line by line
    • Each word has position coordinates (bounding polygon)
    • Both printed and handwritten text can be detected
    • The text is returned in reading order
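
A bounding polygon is a list of corner points rather than a simple rectangle, which lets it follow rotated or skewed text. The sketch below assumes a polygon given as (x, y) pixel pairs; the exact coordinate layout depends on the API version, and `polygon_to_box` is a hypothetical helper, not part of any SDK.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (left, top, right, bottom) enclosing a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical four-corner polygon for a slightly rotated word.
word_polygon = [(10.0, 20.0), (110.0, 24.0), (108.0, 54.0), (8.0, 50.0)]
box = polygon_to_box(word_polygon)
print(box)
```

The enclosing box is handy for drawing highlights or cropping, while the original polygon keeps the tilt of the text.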

Task 3: Understand the Read API response structure

The Read API returns a hierarchical structure:

[Figure: OCR Read result structure]
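
One way to picture the hierarchy is pages containing lines, and lines containing words. The Python sketch below models that nesting with dataclasses. The class and attribute names are assumptions chosen for illustration; the actual JSON field names vary between API versions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    text: str
    polygon: List[Tuple[float, float]]  # corner points of the word
    confidence: float                   # 0.0 to 1.0


@dataclass
class Line:
    text: str
    words: List[Word] = field(default_factory=list)


@dataclass
class Page:
    number: int
    lines: List[Line] = field(default_factory=list)


def page_text(page: Page) -> str:
    """Join the lines of a page in the order they were returned."""
    return "\n".join(line.text for line in page.lines)


# Hypothetical result for a photo of a shop sign.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
page = Page(number=1, lines=[
    Line("OPEN 24 HOURS", [
        Word("OPEN", square, 0.99),
        Word("24", square, 0.98),
        Word("HOURS", square, 0.97),
    ]),
    Line("EXIT", [Word("EXIT", square, 0.95)]),
])
print(page_text(page))
```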

Key features of the Read API:

  • Handles printed and handwritten text
  • Supports multiple languages (120+ languages)
  • Works with rotated and skewed text
  • Processes multi-page documents (PDF, TIFF)
  • Returns confidence scores for each word
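
Per-word confidence scores are typically used to decide which words a person should double-check. The sketch below assumes words arrive as (text, confidence) pairs. The 0.80 threshold and the `words_needing_review` helper are hypothetical choices, not values defined by the service.

```python
def words_needing_review(words, threshold=0.80):
    """Return the text of every word whose confidence is below the threshold."""
    return [text for text, confidence in words if confidence < threshold]


# Hypothetical sample: two clearly printed words and one messy handwritten note.
sample_words = [("Total", 0.99), ("$1,234.56", 0.97), ("Thx!", 0.52)]
flagged = words_needing_review(sample_words)
print(flagged)
```

A stricter threshold sends more words to manual review; a looser one trusts the OCR output more.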

Task 4: Document Intelligence prebuilt models

Azure AI Document Intelligence offers prebuilt models for common document types:

Prebuilt model | What it extracts
Invoice | Vendor name, invoice total, due date, line items
Receipt | Merchant, date, total, tax, items purchased
ID Document | Name, date of birth, document number, expiration
Business Card | Name, company, email, phone number
W-2 Tax form | Employee info, wages, taxes withheld
Health Insurance Card | Member info, plan details, group number

Custom models: If your documents don't match prebuilt models, you can train Document Intelligence with your own document samples.
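
The choice between a prebuilt and a custom model can be sketched as a simple lookup with a fallback. The model IDs below follow the naming used in the Document Intelligence documentation as I understand it, so verify them against the API version you target. The `choose_model` helper, the document-type keys, and the custom model ID are hypothetical.

```python
# Assumed prebuilt model IDs; availability can differ between API versions.
PREBUILT_MODEL_IDS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id_document": "prebuilt-idDocument",
    "w2": "prebuilt-tax.us.w2",
    "health_insurance_card": "prebuilt-healthInsuranceCard.us",
}


def choose_model(document_type, custom_model_id=None):
    """Prefer a prebuilt model; fall back to a custom model if one was trained."""
    model_id = PREBUILT_MODEL_IDS.get(document_type)
    if model_id is not None:
        return model_id
    if custom_model_id is not None:
        return custom_model_id
    raise ValueError(f"No prebuilt or custom model for {document_type!r}")


print(choose_model("invoice"))
print(choose_model("shipping_manifest", custom_model_id="my-custom-model"))
```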

Azure CLI Alternative
# Look up the endpoint of your Azure AI services resource
# (the Read API is called against this endpoint)
az cognitiveservices account show \
  --name my-ai-services \
  --resource-group my-rg \
  --query "properties.endpoint"

# Document Intelligence is accessed via the REST API:
# POST {endpoint}/documentintelligence/documentModels/prebuilt-invoice:analyze?api-version=2024-11-30
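
The REST call in the comment above could be assembled in Python roughly as follows. This sketch only builds the request and does not send it. The endpoint, key, document URL, and default `api_version` are placeholders or assumptions, so substitute the values and API version your own resource supports. `build_analyze_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_analyze_request(endpoint, key, document_url,
                          model_id="prebuilt-invoice",
                          api_version="2024-11-30"):
    """Build (but do not send) a POST request that starts a document analysis."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?api-version={api_version}"
    )
    # The document to analyze is referenced by URL in the JSON body.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = build_analyze_request(
    "https://my-ai-services.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.com/invoice.pdf",
)
print(request.get_method(), request.full_url)
```

Analysis runs asynchronously: the service accepts the request and returns an `Operation-Location` header, which the client then polls to retrieve the extracted fields.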

Key Concepts

Concept | Definition
OCR (Optical Character Recognition) | Technology that extracts text from images and scanned documents
Read API | Azure AI Vision capability that extracts printed and handwritten text
Azure AI Document Intelligence | Service that extracts structured data (fields, tables) from documents
Bounding box/polygon | Coordinates indicating where each word/line appears in the image
Printed text | Machine-generated text (fonts); typically higher accuracy
Handwritten text | Human-written text; more challenging, typically lower accuracy
Prebuilt model | Pre-trained Document Intelligence model for specific document types
Custom model | User-trained Document Intelligence model for unique document formats
Confidence score | Reliability measure (0-1) for each extracted word

Common Misconceptions

Misconception | Reality
"OCR and Document Intelligence are the same thing" | OCR extracts raw text (characters and words). Document Intelligence understands document STRUCTURE: it knows which text is a date, which is a total, and which is a vendor name.
"OCR only works with printed text" | Azure's Read API handles both printed and handwritten text. Printed text typically has higher accuracy, but handwriting recognition has improved dramatically.
"OCR requires perfectly clear, straight images" | Modern OCR handles rotated, skewed, and even partially obscured text. The Read API compensates for imperfect image quality.
"Document Intelligence requires custom training for every document type" | Prebuilt models work immediately for common documents (invoices, receipts, IDs). Custom training is only needed for unique/proprietary document formats.
"OCR gives you structured data directly" | OCR gives you raw text in reading order. For structured data (key-value pairs, tables), you need Document Intelligence, which builds on OCR but adds document understanding.

Knowledge Check

1. A company receives thousands of paper invoices and needs to automatically extract the vendor name, invoice date, and total amount into their accounting system. Which Azure service is most appropriate?

2. A developer needs to extract all text from photographs of street signs in multiple languages. Which Azure capability should they use?

3. What does the Read API return in addition to the extracted text?

4. Which of the following can the Azure AI Vision Read API handle?

5. What is the key difference between OCR (Read API) and Document Intelligence?
