Challenge 45: Azure Document Intelligence — Prebuilt Models

Estimated Time

45-60 min | Cost: ~$1.00 (Document Intelligence S0 tier + transactions) | Domain: Knowledge Mining & Extraction (15-20%)

Exam skills covered

Skill | Weight
Provision Azure AI Document Intelligence | High
Use prebuilt models to extract data from documents | High
Select the appropriate prebuilt model for a scenario | High
Handle confidence scores and extracted fields | Medium
Use the layout model for structure extraction | Medium

Overview

Azure AI Document Intelligence (formerly Form Recognizer) uses machine learning to extract structured data from documents. Prebuilt models are pre-trained for common document types:

Model | Use case | Key fields extracted
prebuilt-invoice | Invoices | VendorName, InvoiceTotal, DueDate, Items
prebuilt-receipt | Receipts | MerchantName, Total, TransactionDate, Items
prebuilt-idDocument | IDs/Passports | FirstName, LastName, DateOfBirth, DocumentNumber
prebuilt-businessCard | Business cards (v3.1 API and earlier; not offered in the 2024-11-30 API) | ContactNames, Emails, WorkPhones, MobilePhones
prebuilt-tax.us.w2 | US W-2 forms | Employee, Employer, WagesTipsAndOtherCompensation, FederalIncomeTaxWithheld
prebuilt-layout | Any document | Pages, Tables, Paragraphs, SelectionMarks
prebuilt-read | Any document | Text lines, words, languages
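
Choosing a model from the table above amounts to a lookup keyed on the document type. The following is a minimal sketch, assuming a hypothetical helper named choose_prebuilt_model and illustrative document-kind labels (neither is part of the SDK). Unknown kinds fall back to prebuilt-layout, which the table lists as accepting any document.

```python
# Hypothetical lookup: the keys are illustrative labels, not SDK values.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "w2": "prebuilt-tax.us.w2",
}


def choose_prebuilt_model(document_kind: str) -> str:
    """Return the prebuilt model ID for a document kind, or the layout model if unknown."""
    return PREBUILT_MODELS.get(document_kind.strip().lower(), "prebuilt-layout")


print(choose_prebuilt_model("Invoice"))   # prebuilt-invoice
print(choose_prebuilt_model("contract"))  # prebuilt-layout
```

The returned string can be passed as the first argument to begin_analyze_document.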

Prerequisites

  • Azure subscription with Contributor role
  • Azure CLI 2.60+
  • Python 3.9+ with azure-ai-documentintelligence>=1.0.0
  • .NET 8 with Azure.AI.DocumentIntelligence
  • Sample documents (invoice PDF, receipt image)

Implementation

Task 1: Provision Azure Document Intelligence

RG="rg-ai102-docintell"
LOCATION="eastus"
DOC_INTEL="docintell-ai102-$(openssl rand -hex 4)"

az group create --name "$RG" --location "$LOCATION"

# Create Document Intelligence resource (the resource kind is still named FormRecognizer)
az cognitiveservices account create \
  --name "$DOC_INTEL" \
  --resource-group "$RG" \
  --location "$LOCATION" \
  --kind FormRecognizer \
  --sku S0 \
  --yes

# Get endpoint and key
DOC_ENDPOINT=$(az cognitiveservices account show \
  --name "$DOC_INTEL" --resource-group "$RG" \
  --query "properties.endpoint" -o tsv)

DOC_KEY=$(az cognitiveservices account keys list \
  --name "$DOC_INTEL" --resource-group "$RG" \
  --query "key1" -o tsv)

# Export both so later tasks can read them from the environment
export DOC_ENDPOINT DOC_KEY

echo "Endpoint: $DOC_ENDPOINT"

Task 2: Analyze an invoice with prebuilt model

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

# Assumes DOC_ENDPOINT and DOC_KEY from Task 1 are set as environment variables
DOC_ENDPOINT = os.environ["DOC_ENDPOINT"]
DOC_KEY = os.environ["DOC_KEY"]

credential = AzureKeyCredential(DOC_KEY)
client = DocumentIntelligenceClient(endpoint=DOC_ENDPOINT, credential=credential)

# Analyze invoice from URL
invoice_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"

poller = client.begin_analyze_document(
    "prebuilt-invoice",
    AnalyzeDocumentRequest(url_source=invoice_url)
)
result = poller.result()

# Extract invoice fields
for document in result.documents:
    print(f"Document type: {document.doc_type}")
    print(f"Confidence: {document.confidence:.2%}")

    fields = document.fields
    if fields.get("VendorName"):
        print(f" Vendor: {fields['VendorName'].value_string} (confidence: {fields['VendorName'].confidence:.2%})")
    if fields.get("InvoiceTotal"):
        total = fields["InvoiceTotal"]
        print(f" Total: {total.value_currency.amount} {total.value_currency.currency_code} (confidence: {total.confidence:.2%})")
    if fields.get("InvoiceDate"):
        print(f" Date: {fields['InvoiceDate'].value_date} (confidence: {fields['InvoiceDate'].confidence:.2%})")
    if fields.get("DueDate"):
        print(f" Due: {fields['DueDate'].value_date}")

    # Line items: array fields expose their elements through value_array in the Python SDK
    if fields.get("Items"):
        items = fields["Items"].value_array
        print(f"\n Line Items ({len(items)} items):")
        for i, item in enumerate(items):
            item_fields = item.value_object
            desc = item_fields.get("Description")
            amount = item_fields.get("Amount")
            print(f" {i+1}. {desc.value_string if desc else 'N/A'} — ${amount.value_currency.amount if amount else 'N/A'}")
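
Each extracted field carries its own confidence score, so an application typically decides per field whether to accept the value or route it to a person. The following is a minimal sketch of that decision, assuming a hypothetical helper named split_by_confidence and an illustrative threshold of 0.8 (the right threshold depends on the workload). It uses stand-in objects rather than a live service response.

```python
from types import SimpleNamespace


def split_by_confidence(fields, threshold=0.8):
    """Partition field names into (accepted, needs_review) by confidence.

    Fields without a confidence value are sent to review.
    The 0.8 default is an assumption, not a service recommendation.
    """
    accepted, needs_review = [], []
    for name, field in fields.items():
        confidence = getattr(field, "confidence", None)
        if confidence is not None and confidence >= threshold:
            accepted.append(name)
        else:
            needs_review.append(name)
    return accepted, needs_review


# Stand-in objects instead of a live service response.
sample_fields = {
    "VendorName": SimpleNamespace(confidence=0.978),
    "InvoiceTotal": SimpleNamespace(confidence=0.45),
    "Items": SimpleNamespace(confidence=None),
}
accepted, needs_review = split_by_confidence(sample_fields)
print(accepted)      # ['VendorName']
print(needs_review)  # ['InvoiceTotal', 'Items']
```

With a real result, document.fields can be passed in place of sample_fields, since each field exposes a confidence attribute.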

Task 3: Extract ID document information

# Analyze ID document (driver's license, passport, etc.)
id_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/id_documents/license.jpg"

poller = client.begin_analyze_document(
    "prebuilt-idDocument",
    AnalyzeDocumentRequest(url_source=id_url)
)
result = poller.result()

for document in result.documents:
    fields = document.fields
    print(f"Document type: {document.doc_type}")  # e.g., "idDocument.driverLicense"

    if fields.get("FirstName"):
        print(f" First Name: {fields['FirstName'].value_string}")
    if fields.get("LastName"):
        print(f" Last Name: {fields['LastName'].value_string}")
    if fields.get("DateOfBirth"):
        print(f" DOB: {fields['DateOfBirth'].value_date}")
    if fields.get("DocumentNumber"):
        print(f" Document #: {fields['DocumentNumber'].value_string}")
    if fields.get("DateOfExpiration"):
        print(f" Expires: {fields['DateOfExpiration'].value_date}")
    if fields.get("Address"):
        print(f" Address: {fields['Address'].value_address}")
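
A common follow-up to extracting DateOfExpiration is checking whether the ID is still valid. The following is a minimal sketch, assuming the extracted value arrives as a Python datetime.date (as value_date is expected to be) and using a hypothetical helper named is_expired. A missing date is reported as None rather than guessed.

```python
from datetime import date
from typing import Optional


def is_expired(expiration: Optional[date], today: Optional[date] = None) -> Optional[bool]:
    """Return True or False when an expiration date is known, None when it is missing.

    A document that expires today is treated as still valid.
    """
    if expiration is None:
        return None
    return expiration < (today or date.today())


print(is_expired(date(2020, 1, 1), today=date(2024, 6, 1)))  # True
print(is_expired(date(2030, 1, 1), today=date(2024, 6, 1)))  # False
print(is_expired(None))                                      # None
```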

Task 4: Use the Layout model for tables and structure

# Layout model extracts structure: pages, tables, paragraphs, selection marks
layout_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"

poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=layout_url)
)
result = poller.result()

# Extract page information (lines and words may be absent on some pages)
for page in result.pages:
    print(f"Page {page.page_number}: {page.width}x{page.height} ({page.unit})")
    print(f" Lines: {len(page.lines or [])}")
    print(f" Words: {len(page.words or [])}")

# Extract tables
if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"\nTable {table_idx + 1}: {table.row_count} rows x {table.column_count} cols")
        for cell in table.cells:
            print(f" [{cell.row_index},{cell.column_index}] = {cell.content}")

# Extract paragraphs
if result.paragraphs:
    print(f"\nParagraphs: {len(result.paragraphs)}")
    for para in result.paragraphs[:5]:
        print(f" Role: {para.role or 'body'} | {para.content[:60]}...")
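
The layout model returns each table as a flat list of cells with row and column indexes, so rebuilding a grid is usually the first step before exporting it. The following is a minimal sketch, assuming a hypothetical helper named table_to_grid and stand-in objects that mimic the row_count, column_count, and cells attributes used above. Spanned cells are placed only at their top-left position.

```python
from types import SimpleNamespace


def table_to_grid(table):
    """Arrange a table's flat cell list into a row-major 2D list of strings."""
    grid = [[""] * table.column_count for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


# Stand-in table instead of a live service response.
sample_table = SimpleNamespace(
    row_count=2,
    column_count=2,
    cells=[
        SimpleNamespace(row_index=0, column_index=0, content="Item"),
        SimpleNamespace(row_index=0, column_index=1, content="Amount"),
        SimpleNamespace(row_index=1, column_index=0, content="Training"),
        SimpleNamespace(row_index=1, column_index=1, content="$300.00"),
    ],
)
for row in table_to_grid(sample_table):
    print(" | ".join(row))
```

Cells the service did not return stay as empty strings, which keeps every row the same length.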

Expected Output

Document type: invoice
Confidence: 95.20%
Vendor: CONTOSO LTD. (confidence: 97.80%)
Total: 3800.00 USD (confidence: 96.50%)
Date: 2024-01-15 (confidence: 98.10%)
Due: 2024-02-15

Line Items (4 items):
1. Consulting Services — $1500.00
2. Software License — $1200.00
3. Support Plan — $800.00
4. Training — $300.00

Break & fix

# | Scenario | Symptom | Root cause | Fix
1 | Model returns empty results | documents array is empty | Wrong model for document type (e.g., using prebuilt-receipt for an invoice) | Select the correct prebuilt model matching your document type
2 | Low confidence scores | Fields extracted with < 50% confidence | Document is poor quality (blurry scan, handwritten) | Use higher resolution scans; consider custom model for handwritten docs
3 | "Resource not found" error | HTTP 404 on analyze endpoint | Using old Form Recognizer endpoint format instead of Document Intelligence | Use endpoint format: {endpoint}/documentintelligence/documentModels/{model}:analyze?api-version=2024-11-30
4 | Timeout on large documents | Long-running operation never completes | Document exceeds page limit (2000 pages for layout) or is very large | Split large documents; use pages parameter to process specific pages
5 | Missing line items | Invoice total extracted but items array is empty | Document layout is non-standard; model can't identify table structure | Try prebuilt-layout to see raw table extraction; consider custom model
Knowledge Check

1. You need to extract the vendor name, invoice total, and line items from scanned invoices. Which model should you use?

2. The Document Intelligence analyze operation returns immediately with an Operation-Location header. What does this indicate?

3. A field is extracted with confidence 0.45 (45%). What should your application do?

4. Which prebuilt model extracts tables, paragraphs, and selection marks from ANY document type without needing to know the document format?

5. What is the correct API endpoint format for analyzing a document with Document Intelligence (2024-11-30 API)?

Cleanup

az group delete --name rg-ai102-docintell --yes --no-wait
