Challenge 45: Azure Document Intelligence — Prebuilt Models
45-60 min | Cost: ~$1.00 (Document Intelligence S0 tier + transactions) | Domain: Knowledge Mining & Extraction (15-20%)
Exam skills covered
| Skill | Weight |
|---|---|
| Provision Azure AI Document Intelligence | High |
| Use prebuilt models to extract data from documents | High |
| Select the appropriate prebuilt model for a scenario | High |
| Handle confidence scores and extracted fields | Medium |
| Use the layout model for structure extraction | Medium |
Overview
Azure AI Document Intelligence (formerly Form Recognizer) uses machine learning to extract structured data from documents. Prebuilt models are pre-trained for common document types:
| Model | Use case | Key fields extracted |
|---|---|---|
| prebuilt-invoice | Invoices | VendorName, InvoiceTotal, DueDate, Items |
| prebuilt-receipt | Receipts | MerchantName, Total, TransactionDate, Items |
| prebuilt-idDocument | IDs/Passports | FirstName, LastName, DateOfBirth, DocumentNumber |
| prebuilt-businessCard | Business cards (v3.1 and earlier; not offered in the v4.0 API) | ContactNames, Emails, MobilePhones, WorkPhones |
| prebuilt-tax.us.w2 | US W-2 forms | Employee, Employer, WagesTipsAndOtherCompensation, FederalIncomeTaxWithheld |
| prebuilt-layout | Any document | Pages, Tables, Paragraphs, SelectionMarks |
| prebuilt-read | Any document | Text lines, words, languages |
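Selecting a model often reduces to a lookup from document type to model ID. The sketch below is a hypothetical helper, not part of any SDK: the name `pick_model`, the document-kind keys, and the fallback rule are assumptions made for illustration. It uses the model IDs from the table above and falls back to a general-purpose model when no specialised one applies.

```python
# Hypothetical helper: maps a document kind to a prebuilt model ID from the table above.
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "w2": "prebuilt-tax.us.w2",
}


def pick_model(doc_kind: str, need_structure: bool = True) -> str:
    """Return a model ID for doc_kind, or a general-purpose model if none matches."""
    key = doc_kind.strip().lower()
    if key in PREBUILT_MODELS:
        return PREBUILT_MODELS[key]
    # No specialised model: layout gives tables and paragraphs, read gives text only.
    return "prebuilt-layout" if need_structure else "prebuilt-read"


print(pick_model("Invoice"))        # prebuilt-invoice
print(pick_model("contract"))       # prebuilt-layout
print(pick_model("letter", False))  # prebuilt-read
```

The fallback mirrors the last two table rows: prebuilt-layout when tables or selection marks matter, prebuilt-read when only the text is needed.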
Prerequisites
- Azure subscription with Contributor role
- Azure CLI 2.60+
- Python 3.9+ with azure-ai-documentintelligence>=1.0.0
- .NET 8 with Azure.AI.DocumentIntelligence
- Sample documents (invoice PDF, receipt image)
Implementation
Task 1: Provision Azure Document Intelligence
RG="rg-ai102-docintell"
LOCATION="eastus"
DOC_INTEL="docintell-ai102-$(openssl rand -hex 4)"
az group create --name $RG --location $LOCATION
# Create Document Intelligence resource
az cognitiveservices account create \
--name $DOC_INTEL \
--resource-group $RG \
--location $LOCATION \
--kind FormRecognizer \
--sku S0 \
--yes
# Get endpoint and key
DOC_ENDPOINT=$(az cognitiveservices account show \
--name $DOC_INTEL --resource-group $RG \
--query "properties.endpoint" -o tsv)
DOC_KEY=$(az cognitiveservices account keys list \
--name $DOC_INTEL --resource-group $RG \
--query "key1" -o tsv)
echo "Endpoint: $DOC_ENDPOINT"
Task 2: Analyze an invoice with prebuilt model
The same analysis is shown three ways, in this order: Python SDK, C# SDK, REST API.
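import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
# Assumes the Task 1 shell variables were exported: export DOC_ENDPOINT DOC_KEY
DOC_ENDPOINT = os.environ["DOC_ENDPOINT"]
DOC_KEY = os.environ["DOC_KEY"]
credential = AzureKeyCredential(DOC_KEY)
client = DocumentIntelligenceClient(endpoint=DOC_ENDPOINT, credential=credential)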
# Analyze invoice from URL
invoice_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"
poller = client.begin_analyze_document(
    "prebuilt-invoice",
    AnalyzeDocumentRequest(url_source=invoice_url)
)
result = poller.result()
# Extract invoice fields
for document in result.documents:
    print(f"Document type: {document.doc_type}")
    print(f"Confidence: {document.confidence:.2%}")
    fields = document.fields
    if fields.get("VendorName"):
        print(f" Vendor: {fields['VendorName'].value_string} (confidence: {fields['VendorName'].confidence:.2%})")
    if fields.get("InvoiceTotal"):
        total = fields["InvoiceTotal"]
        print(f" Total: {total.value_currency.amount} {total.value_currency.currency_code} (confidence: {total.confidence:.2%})")
    if fields.get("InvoiceDate"):
        print(f" Date: {fields['InvoiceDate'].value_date} (confidence: {fields['InvoiceDate'].confidence:.2%})")
    if fields.get("DueDate"):
        print(f" Due: {fields['DueDate'].value_date}")
    # Line items
    if fields.get("Items"):
        print(f"\n Line Items ({len(fields['Items'].value_list)} items):")
        for i, item in enumerate(fields["Items"].value_list):
            item_fields = item.value_object
            desc = item_fields.get("Description", {})
            amount = item_fields.get("Amount", {})
            print(f" {i+1}. {desc.value_string if desc else 'N/A'} — ${amount.value_currency.amount if amount else 'N/A'}")
using Azure;
using Azure.AI.DocumentIntelligence;
// Assumes DOC_ENDPOINT and DOC_KEY from Task 1 are exported as environment variables
var docEndpoint = Environment.GetEnvironmentVariable("DOC_ENDPOINT")!;
var docKey = Environment.GetEnvironmentVariable("DOC_KEY")!;
var client = new DocumentIntelligenceClient(
    new Uri(docEndpoint),
    new AzureKeyCredential(docKey));
var invoiceUrl = new Uri("https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf");
// Azure.AI.DocumentIntelligence 1.0.0+ accepts the model ID and source URI directly
var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-invoice",
    invoiceUrl);
var result = operation.Value;
foreach (var document in result.Documents)
{
Console.WriteLine($"Document type: {document.DocType}");
Console.WriteLine($"Confidence: {document.Confidence:P2}");
if (document.Fields.TryGetValue("VendorName", out var vendor))
Console.WriteLine($" Vendor: {vendor.ValueString} ({vendor.Confidence:P2})");
if (document.Fields.TryGetValue("InvoiceTotal", out var total))
Console.WriteLine($" Total: {total.ValueCurrency.Amount} {total.ValueCurrency.CurrencyCode}");
if (document.Fields.TryGetValue("Items", out var items))
{
Console.WriteLine($"\n Line Items ({items.ValueList.Count}):");
foreach (var item in items.ValueList)
{
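// Object-valued fields are exposed as ValueDictionary in the .NET SDK
var itemFields = item.ValueDictionary;
var desc = itemFields.TryGetValue("Description", out var d) && d.ValueString != null ? d.ValueString : "N/A";
var amount = itemFields.TryGetValue("Amount", out var a) ? a.ValueCurrency?.Amount : null;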
Console.WriteLine($" - {desc}: ${amount}");
}
}
}
# Submit invoice for analysis
OPERATION_URL=$(curl -s -i -X POST \
"${DOC_ENDPOINT}/documentintelligence/documentModels/prebuilt-invoice:analyze?api-version=2024-11-30" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" \
-d '{"urlSource": "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"}' \
| grep -i "operation-location" | cut -d' ' -f2 | tr -d '\r')
echo "Operation URL: $OPERATION_URL"
# Poll for results (wait a few seconds)
sleep 10
curl -s "$OPERATION_URL" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" | python -m json.tool
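A fixed `sleep 10` followed by a single GET can still return an operation that has not finished. A more robust pattern is to poll until the status leaves its pending states. The sketch below assumes the operation body carries a `status` field whose pending values are `notStarted` and `running`. The function name `poll_until_done` and the injected `fetch` callable are hypothetical, chosen so the example runs offline with canned responses instead of real HTTP calls.

```python
import time
from typing import Callable

# Assumed pending states; any other status is treated as terminal.
PENDING = {"notStarted", "running"}


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the operation status is no longer pending, then return the body."""
    for _ in range(max_attempts):
        body = fetch()
        if body.get("status") not in PENDING:
            return body
        sleep(interval)
    raise TimeoutError(f"operation still pending after {max_attempts} attempts")


# Stand-in for the GET on the Operation-Location URL, so the sketch runs offline.
responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {}},
])
final = poll_until_done(lambda: next(responses), sleep=lambda _: None)
print(final["status"])  # succeeded
```

In a real client, `fetch` would issue the GET with the `Ocp-Apim-Subscription-Key` header and return the parsed JSON body.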
Task 3: Extract ID document information
The same analysis is shown three ways, in this order: Python SDK, C# SDK, REST API.
# Analyze ID document (driver's license, passport, etc.)
id_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/id_documents/license.jpg"
poller = client.begin_analyze_document(
"prebuilt-idDocument",
AnalyzeDocumentRequest(url_source=id_url)
)
result = poller.result()
for document in result.documents:
    fields = document.fields
    print(f"Document type: {document.doc_type}")  # e.g., "idDocument.driverLicense"
    if fields.get("FirstName"):
        print(f" First Name: {fields['FirstName'].value_string}")
    if fields.get("LastName"):
        print(f" Last Name: {fields['LastName'].value_string}")
    if fields.get("DateOfBirth"):
        print(f" DOB: {fields['DateOfBirth'].value_date}")
    if fields.get("DocumentNumber"):
        print(f" Document #: {fields['DocumentNumber'].value_string}")
    if fields.get("DateOfExpiration"):
        print(f" Expires: {fields['DateOfExpiration'].value_date}")
    if fields.get("Address"):
        print(f" Address: {fields['Address'].value_address}")
var idUrl = new Uri("https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/id_documents/license.jpg");
// Azure.AI.DocumentIntelligence 1.0.0+ accepts the model ID and source URI directly
var idOp = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-idDocument",
    idUrl);
var idResult = idOp.Value;
foreach (var doc in idResult.Documents)
{
Console.WriteLine($"Type: {doc.DocType}");
if (doc.Fields.TryGetValue("FirstName", out var first))
Console.WriteLine($" First Name: {first.ValueString}");
if (doc.Fields.TryGetValue("LastName", out var last))
Console.WriteLine($" Last Name: {last.ValueString}");
if (doc.Fields.TryGetValue("DateOfBirth", out var dob))
Console.WriteLine($" DOB: {dob.ValueDate}");
if (doc.Fields.TryGetValue("DocumentNumber", out var docNum))
Console.WriteLine($" Document #: {docNum.ValueString}");
}
# Analyze ID document
OPERATION_URL=$(curl -s -i -X POST \
"${DOC_ENDPOINT}/documentintelligence/documentModels/prebuilt-idDocument:analyze?api-version=2024-11-30" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" \
-d '{"urlSource": "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/id_documents/license.jpg"}' \
| grep -i "operation-location" | cut -d' ' -f2 | tr -d '\r')
sleep 10
curl -s "$OPERATION_URL" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" | python -m json.tool
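The REST calls in these tasks differ only in the model ID embedded in the URL. The sketch below builds that URL from its parts. The function name `build_analyze_url` and the example endpoint are hypothetical. Stripping a trailing slash is a defensive assumption: the endpoint string returned for a resource may end in `/`, and concatenating it directly would then produce a double slash.

```python
def build_analyze_url(endpoint: str, model_id: str, api_version: str = "2024-11-30") -> str:
    """Compose the analyze URL in the format used by the curl commands above."""
    base = endpoint.rstrip("/")  # avoid '//' when the endpoint ends with a slash
    return f"{base}/documentintelligence/documentModels/{model_id}:analyze?api-version={api_version}"


# Hypothetical endpoint, for illustration only.
url = build_analyze_url("https://example.cognitiveservices.azure.com/", "prebuilt-idDocument")
print(url)
```

With the hypothetical endpoint above, this prints `https://example.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-idDocument:analyze?api-version=2024-11-30`.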
Task 4: Use the Layout model for tables and structure
The same analysis is shown three ways, in this order: Python SDK, C# SDK, REST API.
# Layout model extracts structure: pages, tables, paragraphs, selection marks
layout_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"
poller = client.begin_analyze_document(
"prebuilt-layout",
AnalyzeDocumentRequest(url_source=layout_url)
)
result = poller.result()
# Extract page information
for page in result.pages:
    print(f"Page {page.page_number}: {page.width}x{page.height} ({page.unit})")
    print(f" Lines: {len(page.lines)}")
    print(f" Words: {len(page.words)}")
# Extract tables
if result.tables:
    for table_idx, table in enumerate(result.tables):
        print(f"\nTable {table_idx + 1}: {table.row_count} rows x {table.column_count} cols")
        for cell in table.cells:
            print(f" [{cell.row_index},{cell.column_index}] = {cell.content}")
# Extract paragraphs
if result.paragraphs:
    print(f"\nParagraphs: {len(result.paragraphs)}")
    for para in result.paragraphs[:5]:
        print(f" Role: {para.role or 'body'} | {para.content[:60]}...")
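var layoutUrl = new Uri("https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf");
// Azure.AI.DocumentIntelligence 1.0.0+ accepts the model ID and source URI directly
var layoutOp = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-layout",
    layoutUrl);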
var layoutResult = layoutOp.Value;
// Pages
foreach (var page in layoutResult.Pages)
{
Console.WriteLine($"Page {page.PageNumber}: {page.Width}x{page.Height} ({page.Unit})");
Console.WriteLine($" Lines: {page.Lines.Count}, Words: {page.Words.Count}");
}
// Tables
foreach (var table in layoutResult.Tables)
{
Console.WriteLine($"\nTable: {table.RowCount} rows x {table.ColumnCount} cols");
foreach (var cell in table.Cells)
{
Console.WriteLine($" [{cell.RowIndex},{cell.ColumnIndex}] = {cell.Content}");
}
}
# Layout analysis
OPERATION_URL=$(curl -s -i -X POST \
"${DOC_ENDPOINT}/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" \
-d '{"urlSource": "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"}' \
| grep -i "operation-location" | cut -d' ' -f2 | tr -d '\r')
sleep 10
curl -s "$OPERATION_URL" \
-H "Ocp-Apim-Subscription-Key: ${DOC_KEY}" | python -m json.tool
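The layout result lists table cells as a flat sequence with row and column indices, so downstream code usually rebuilds a two-dimensional grid. The sketch below does that for the JSON shape returned by the REST call, assuming each table carries `rowCount`, `columnCount`, and `cells` with `rowIndex`, `columnIndex`, and `content`. The function name `table_to_grid` and the sample table are hypothetical, and spanning cells are not expanded.

```python
def table_to_grid(table: dict) -> list[list[str]]:
    """Place each cell's content at [rowIndex][columnIndex]; missing cells stay empty."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


# Hypothetical table in the REST response shape (not taken from a real result).
sample_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Amount"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Training"},
        {"rowIndex": 1, "columnIndex": 1, "content": "$300.00"},
    ],
}
for row in table_to_grid(sample_table):
    print(" | ".join(row))
```

The same approach works with the Python SDK objects by reading `row_count`, `column_count`, `row_index`, `column_index`, and `content` instead of dictionary keys.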
Expected Output
Document type: invoice
Confidence: 95.20%
Vendor: CONTOSO LTD. (confidence: 97.80%)
Total: 3800.00 USD (confidence: 96.50%)
Date: 2024-01-15 (confidence: 98.10%)
Due: 2024-02-15
Line Items (4 items):
1. Consulting Services — $1500.00
2. Software License — $1200.00
3. Support Plan — $800.00
4. Training — $300.00
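A quick sanity check on an extracted invoice is to compare the sum of the line-item amounts with the invoice total. The sketch below applies that check to the figures listed above (1500 + 1200 + 800 + 300 = 3800). The function name `items_match_total` and the one-cent tolerance are assumptions. Invoices with tax, discounts, or shipping will not reconcile this simply, so a mismatch is a prompt for review rather than proof of an extraction error.

```python
from decimal import Decimal


def items_match_total(item_amounts, total, tolerance=Decimal("0.01")):
    """Return True when the line items sum to the total within the tolerance."""
    return abs(sum(item_amounts, Decimal("0")) - total) <= tolerance


# Amounts copied from the line items listed above.
amounts = [Decimal("1500.00"), Decimal("1200.00"), Decimal("800.00"), Decimal("300.00")]
print(sum(amounts, Decimal("0")))                      # 3800.00
print(items_match_total(amounts, Decimal("3800.00")))  # True
print(items_match_total(amounts, Decimal("3900.00")))  # False
```

`Decimal` is used instead of `float` so that currency sums are compared exactly.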
Break & fix
| # | Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|---|
| 1 | Model returns empty results | documents array is empty | Wrong model for document type (e.g., using prebuilt-receipt for an invoice) | Select the correct prebuilt model matching your document type |
| 2 | Low confidence scores | Fields extracted with < 50% confidence | Document is poor quality (blurry scan, handwritten) | Use higher resolution scans; consider custom model for handwritten docs |
| 3 | "Resource not found" error | HTTP 404 on analyze endpoint | Using old Form Recognizer endpoint format instead of Document Intelligence | Use endpoint format: {endpoint}/documentintelligence/documentModels/{model}:analyze?api-version=2024-11-30 |
| 4 | Timeout on large documents | Long-running operation never completes | Document exceeds page limit (2000 pages for layout) or is very large | Split large documents; use pages parameter to process specific pages |
| 5 | Missing line items | Invoice total extracted but items array is empty | Document layout is non-standard; model can't identify table structure | Try prebuilt-layout to see raw table extraction; consider custom model |
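For the low-confidence scenario in row 2, a common pattern is to accept fields above a threshold and route the rest to manual review. The sketch below works on the `fields` object from the REST response, assuming each field is a dictionary with an optional `confidence` key. The function name `split_by_confidence`, the 0.8 threshold, and the sample values are assumptions for illustration; a suitable threshold depends on how costly an error is in your workflow.

```python
def split_by_confidence(fields: dict, threshold: float = 0.8):
    """Split fields into (accepted, needs_review) by their confidence score."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        # Fields without a confidence score are sent to review.
        confidence = field.get("confidence", 0.0)
        target = accepted if confidence >= threshold else needs_review
        target[name] = field
    return accepted, needs_review


# Hypothetical fields in the REST response shape.
sample_fields = {
    "VendorName": {"type": "string", "valueString": "CONTOSO LTD.", "confidence": 0.978},
    "DueDate": {"type": "date", "valueDate": "2024-02-15", "confidence": 0.45},
}
accepted, needs_review = split_by_confidence(sample_fields)
print(sorted(accepted))      # ['VendorName']
print(sorted(needs_review))  # ['DueDate']
```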
Knowledge Check
1. You need to extract the vendor name, invoice total, and line items from scanned invoices. Which model should you use?
2. The Document Intelligence analyze operation returns immediately with an Operation-Location header. What does this indicate?
3. A field is extracted with confidence 0.45 (45%). What should your application do?
4. Which prebuilt model extracts tables, paragraphs, and selection marks from ANY document type without needing to know the document format?
5. What is the correct API endpoint format for analyzing a document with Document Intelligence (2024-11-30 API)?
Cleanup
az group delete --name rg-ai102-docintell --yes --no-wait