Challenge 49: End-to-End Enterprise AI Solution

Estimated Time

3-4 hours | Cost: ~$5-10 (multiple AI services) | Capstone: All 6 AI-102 Domains

Cost Warning

This capstone deploys multiple Azure AI services. Monitor costs carefully and clean up resources when done. The challenge uses Basic/S0 tiers where needed.

Exam skills covered (All Domains)

| Domain | Skills |
| --- | --- |
| 1. Plan & Manage | Resource deployment, networking, RBAC, monitoring, responsible AI |
| 2. Content Moderation | Azure AI Content Safety, text/image moderation, custom categories |
| 3. Computer Vision | Image analysis, OCR, custom vision, spatial analysis |
| 4. NLP | Text analytics, language understanding, translation, speech services |
| 5. Generative AI | Azure OpenAI chat/completions, RAG, embeddings, prompt engineering |
| 6. Knowledge Mining | AI Search, Document Intelligence, skillsets, vector search |

Overview

You are building an Enterprise Document Intelligence Platform for a global financial services company. The platform:

  1. Ingests documents (contracts, reports, correspondence) in multiple languages
  2. Extracts text, tables, and entities using Document Intelligence & Computer Vision
  3. Translates content to English using Translator
  4. Enriches with NLP (sentiment, key phrases, PII detection, custom entities)
  5. Moderates content through AI Content Safety
  6. Indexes everything in Azure AI Search with vector embeddings
  7. Serves a conversational RAG interface using Azure OpenAI
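
The seven steps above form a linear pipeline in which each stage receives a document record and returns an enriched copy. A minimal sketch of that orchestration is shown here; `run_pipeline` and the `fake_*` stages are hypothetical stand-ins (not part of the challenge code), and real stages would call the Azure services used in the tasks below.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def run_pipeline(document: Dict, stages: List[Stage]) -> Dict:
    """Pass a document record through each stage in order."""
    for stage in stages:
        document = stage(document)
        # Stop early when a stage (for example moderation) rejects the document.
        if document.get("rejected"):
            break
    return document

# Hypothetical stand-in stages; real ones would call the Azure services.
def fake_extract(doc: Dict) -> Dict:
    return {**doc, "content": "Hola mundo", "language": "es"}

def fake_translate(doc: Dict) -> Dict:
    if doc["language"] != "en":
        return {**doc, "content": "Hello world",
                "original_language": doc["language"], "language": "en"}
    return doc

def fake_moderate(doc: Dict) -> Dict:
    return {**doc, "rejected": "forbidden" in doc["content"].lower()}

record = run_pipeline({"id": "doc-1"}, [fake_extract, fake_translate, fake_moderate])
print(record)
```

Keeping every stage as a plain function over a dictionary makes it easy to test stages in isolation and to reorder them, for example to moderate before translating.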

Figure: Challenge 49 capstone architecture

Prerequisites

  • Azure subscription with Contributor role
  • Access to Azure OpenAI (approved)
  • Python 3.9+ with packages:
    azure-search-documents>=11.4.0
    azure-ai-documentintelligence>=1.0.0
    azure-ai-textanalytics>=5.3.0
    azure-ai-vision-imageanalysis>=1.0.0
    azure-cognitiveservices-speech>=1.37.0
    azure-ai-contentsafety>=1.0.0
    openai>=1.0.0
    azure-storage-blob>=12.0.0
    azure-identity>=1.15.0
  • .NET 8 with packages:
    Azure.Search.Documents
    Azure.AI.DocumentIntelligence
    Azure.AI.TextAnalytics
    Azure.AI.Vision.ImageAnalysis
    Azure.AI.ContentSafety
    Azure.AI.OpenAI
    Microsoft.CognitiveServices.Speech

Implementation

Task 1: Deploy all Azure AI resources (Domain 1 — Plan & Manage)

RG="rg-ai102-capstone"
LOCATION="eastus"
UNIQUE_ID=$(openssl rand -hex 4)

az group create --name $RG --location $LOCATION

# 1. Azure AI Services (multi-service for Vision, Text Analytics, Content Safety)
az cognitiveservices account create \
--name "ai-services-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--kind AIServices \
--sku S0 --yes

# 2. Azure OpenAI
az cognitiveservices account create \
--name "aoai-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--kind OpenAI \
--sku S0 --yes

# Deploy GPT-4o and embeddings
az cognitiveservices account deployment create \
--name "aoai-${UNIQUE_ID}" \
--resource-group $RG \
--deployment-name "gpt-4o" \
--model-name "gpt-4o" \
--model-version "2024-08-06" \
--model-format OpenAI \
--sku-capacity 30 \
--sku-name "Standard"

az cognitiveservices account deployment create \
--name "aoai-${UNIQUE_ID}" \
--resource-group $RG \
--deployment-name "text-embedding-3-small" \
--model-name "text-embedding-3-small" \
--model-version "1" \
--model-format OpenAI \
--sku-capacity 30 \
--sku-name "Standard"

# 3. Document Intelligence
az cognitiveservices account create \
--name "docintell-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--kind FormRecognizer \
--sku S0 --yes

# 4. Azure AI Search (Basic for vector + semantic)
az search service create \
--name "search-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--sku basic

# 5. Translator
az cognitiveservices account create \
--name "translator-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--kind TextTranslation \
--sku S1 --yes

# 6. Speech Service
az cognitiveservices account create \
--name "speech-${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--kind SpeechServices \
--sku S0 --yes

# 7. Storage Account
az storage account create \
--name "stcapstone${UNIQUE_ID}" \
--resource-group $RG \
--location $LOCATION \
--sku Standard_LRS

az storage container create --name "documents" --account-name "stcapstone${UNIQUE_ID}" --auth-mode login
az storage container create --name "images" --account-name "stcapstone${UNIQUE_ID}" --auth-mode login

# Get all connection info and export it so the Python tasks can read it via os.environ
export LOCATION
export SEARCH_ENDPOINT="https://search-${UNIQUE_ID}.search.windows.net"
export SEARCH_KEY=$(az search admin-key show --resource-group $RG --service-name "search-${UNIQUE_ID}" --query "primaryKey" -o tsv)
export AOAI_ENDPOINT=$(az cognitiveservices account show --name "aoai-${UNIQUE_ID}" --resource-group $RG --query "properties.endpoint" -o tsv)
export AOAI_KEY=$(az cognitiveservices account keys list --name "aoai-${UNIQUE_ID}" --resource-group $RG --query "key1" -o tsv)
export AI_ENDPOINT=$(az cognitiveservices account show --name "ai-services-${UNIQUE_ID}" --resource-group $RG --query "properties.endpoint" -o tsv)
export AI_KEY=$(az cognitiveservices account keys list --name "ai-services-${UNIQUE_ID}" --resource-group $RG --query "key1" -o tsv)
export DOC_ENDPOINT=$(az cognitiveservices account show --name "docintell-${UNIQUE_ID}" --resource-group $RG --query "properties.endpoint" -o tsv)
export DOC_KEY=$(az cognitiveservices account keys list --name "docintell-${UNIQUE_ID}" --resource-group $RG --query "key1" -o tsv)
export TRANSLATOR_KEY=$(az cognitiveservices account keys list --name "translator-${UNIQUE_ID}" --resource-group $RG --query "key1" -o tsv)
export SPEECH_KEY=$(az cognitiveservices account keys list --name "speech-${UNIQUE_ID}" --resource-group $RG --query "key1" -o tsv)
export STORAGE_CONN=$(az storage account show-connection-string --name "stcapstone${UNIQUE_ID}" --resource-group $RG --query "connectionString" -o tsv)

echo "All resources deployed successfully"
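
The Python tasks that follow depend on the connection values gathered above. One way to fail fast when a value is missing is to validate them up front. The sketch below assumes the shell variables have been exported into the environment; `load_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced for illustration.

```python
import os
from typing import Dict, List, Mapping

# Names mirror the shell variables collected in Task 1.
REQUIRED_SETTINGS: List[str] = [
    "SEARCH_ENDPOINT", "SEARCH_KEY",
    "AOAI_ENDPOINT", "AOAI_KEY",
    "AI_ENDPOINT", "AI_KEY",
    "DOC_ENDPOINT", "DOC_KEY",
    "TRANSLATOR_KEY", "SPEECH_KEY",
]

def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return all required settings, or raise listing every missing name."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}

# Demo with an in-memory mapping instead of the real environment.
demo_env = {name: "value-for-" + name for name in REQUIRED_SETTINGS}
settings = load_settings(demo_env)
print(sorted(settings))
```

Reporting all missing names at once avoids the fix-one-rerun-fix-another loop when several exports were skipped.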

Task 2: Configure RBAC and monitoring (Domain 1 — Plan & Manage)

# Enable diagnostic logging on AI Search
LAW_ID=$(az monitor log-analytics workspace create --resource-group $RG --workspace-name "law-capstone-${UNIQUE_ID}" --query id -o tsv)

az monitor diagnostic-settings create \
--name "search-diagnostics" \
--resource "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG/providers/Microsoft.Search/searchServices/search-${UNIQUE_ID}" \
--logs '[{"category": "OperationLogs", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]' \
--workspace "$LAW_ID"

# Create RBAC role assignments for managed identity scenario
# (In production, use managed identity instead of API keys)
PRINCIPAL_ID=$(az ad signed-in-user show --query id -o tsv)

# Cognitive Services User (for AI services)
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Cognitive Services User" \
--scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG"

# Search Index Data Contributor
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Search Index Data Contributor" \
--scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG/providers/Microsoft.Search/searchServices/search-${UNIQUE_ID}"

Task 3: Extract content with Document Intelligence (Domain 6)

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, DocumentAnalysisFeature

# Connection values from Task 1; they must be exported as environment variables in the shell
DOC_ENDPOINT = os.environ["DOC_ENDPOINT"]
DOC_KEY = os.environ["DOC_KEY"]

doc_client = DocumentIntelligenceClient(
    endpoint=DOC_ENDPOINT,
    credential=AzureKeyCredential(DOC_KEY)
)

def extract_document(url: str) -> dict:
    """Extract text, tables, and structure from document."""
    poller = doc_client.begin_analyze_document(
        "prebuilt-layout",
        AnalyzeDocumentRequest(url_source=url),
        # Language detection is an add-on feature and must be requested explicitly
        features=[DocumentAnalysisFeature.LANGUAGES]
    )
    result = poller.result()

    # Extract content line by line
    content_parts = []
    for page in result.pages:
        for line in page.lines or []:
            content_parts.append(line.content)

    # Extract tables as structured data: one dict per row, keyed by column index
    tables = []
    if result.tables:
        for table in result.tables:
            table_data = {"rows": [], "row_count": table.row_count, "col_count": table.column_count}
            row_dict = {}
            for cell in table.cells:
                if cell.row_index not in row_dict:
                    row_dict[cell.row_index] = {}
                row_dict[cell.row_index][cell.column_index] = cell.content
            table_data["rows"] = [row_dict[r] for r in sorted(row_dict.keys())]
            tables.append(table_data)

    return {
        "content": " ".join(content_parts),
        "pages": len(result.pages),
        "tables": tables,
        # Each detected language is an object; its locale holds the code (for example "en")
        "language": result.languages[0].locale if result.languages else "unknown"
    }

# Process sample document
doc_data = extract_document(
    "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"
)
print(f"Extracted {doc_data['pages']} pages, {len(doc_data['content'])} chars, {len(doc_data['tables'])} tables")
print(f"Detected language: {doc_data['language']}")
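
Each extracted table stores its rows as dictionaries keyed by column index, so merged or empty cells leave gaps. A small helper can turn that shape into a rectangular grid for display or CSV export. This is a sketch under that assumption; `table_to_grid` is a hypothetical helper, and the sample data is invented to match the structure returned by `extract_document`.

```python
from typing import Dict, List

def table_to_grid(table: Dict) -> List[List[str]]:
    """Convert {"rows": [{col_index: text}], "col_count": n} into a dense grid."""
    grid = []
    for row in table["rows"]:
        # Fill missing column indexes with an empty string.
        grid.append([row.get(col, "") for col in range(table["col_count"])])
    return grid

# Invented sample in the same shape as one entry of doc_data["tables"].
sample_table = {
    "row_count": 2,
    "col_count": 3,
    "rows": [
        {0: "Item", 1: "Qty", 2: "Price"},
        {0: "Consulting", 2: "100.00"},  # column 1 is missing
    ],
}

grid = table_to_grid(sample_table)
for row in grid:
    print(" | ".join(row))
```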

Task 4: Translate non-English content (Domain 4 — NLP)

import os
import uuid

import requests

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"
# Values from Task 1; they must be exported as environment variables in the shell
TRANSLATOR_KEY = os.environ["TRANSLATOR_KEY"]
LOCATION = os.environ.get("LOCATION", "eastus")

def translate_text(text: str, target_language: str = "en") -> dict:
    """Translate text to target language."""
    url = f"{TRANSLATOR_ENDPOINT}/translate?api-version=3.0&to={target_language}"

    headers = {
        "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
        "Ocp-Apim-Subscription-Region": LOCATION,
        "Content-Type": "application/json",
        "X-ClientTraceId": str(uuid.uuid4())
    }

    body = [{"text": text[:50000]}]  # Max 50K chars per request
    response = requests.post(url, headers=headers, json=body, timeout=30)
    response.raise_for_status()  # Surface auth or quota errors instead of failing on the JSON shape
    result = response.json()

    return {
        "translated_text": result[0]["translations"][0]["text"],
        "detected_language": result[0].get("detectedLanguage", {}).get("language", "unknown"),
        "confidence": result[0].get("detectedLanguage", {}).get("score", 0)
    }

# Translate if document is not in English
if doc_data["language"] != "en":
    translation = translate_text(doc_data["content"])
    doc_data["content"] = translation["translated_text"]
    doc_data["original_language"] = translation["detected_language"]
    print(f"Translated from {translation['detected_language']} (confidence: {translation['confidence']:.2%})")
else:
    doc_data["original_language"] = "en"
    print("Content already in English — no translation needed")
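
The translation call above truncates input at 50,000 characters, so anything beyond that limit is silently dropped. One alternative is to split long text into several requests. The sketch below prefers line breaks as split points and hard-splits only when a single line is too long; `split_for_translation` is a hypothetical helper, not part of the Translator API.

```python
from typing import List

def split_for_translation(text: str, max_chars: int = 50000) -> List[str]:
    """Split text into pieces of at most max_chars, preferring line breaks."""
    pieces: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the limit.
        while len(line) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new piece when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            pieces.append(current)
            current = ""
        current += line
    if current:
        pieces.append(current)
    return pieces

# Small limit so the behavior is easy to see.
demo_text = "a" * 30 + "\n" + "b" * 30 + "\n" + "c" * 75
parts = split_for_translation(demo_text, max_chars=50)
print([len(p) for p in parts])
```

Each piece could then be passed to `translate_text` and the translated pieces joined in order.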

Task 5: Apply NLP enrichment (Domain 4 — NLP)

import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Values from Task 1; they must be exported as environment variables in the shell
AI_ENDPOINT = os.environ["AI_ENDPOINT"]
AI_KEY = os.environ["AI_KEY"]

text_client = TextAnalyticsClient(
    endpoint=AI_ENDPOINT,
    credential=AzureKeyCredential(AI_KEY)
)

def enrich_with_nlp(text: str) -> dict:
    """Apply NLP enrichment: sentiment, key phrases, entities, PII."""
    # Chunk text if too long (5,120 character limit per document)
    chunks = [text[i:i+5000] for i in range(0, len(text), 5000)]
    first_chunk = [chunks[0]]  # Only the first chunk is analyzed; later chunks are ignored

    # Sentiment Analysis
    sentiment_result = text_client.analyze_sentiment(first_chunk)[0]

    # Key Phrases
    keyphrases_result = text_client.extract_key_phrases(first_chunk)[0]

    # Named Entity Recognition
    entities_result = text_client.recognize_entities(first_chunk)[0]

    # PII Detection
    pii_result = text_client.recognize_pii_entities(first_chunk)[0]

    return {
        "sentiment": sentiment_result.sentiment,
        "confidence_scores": {
            "positive": sentiment_result.confidence_scores.positive,
            "neutral": sentiment_result.confidence_scores.neutral,
            "negative": sentiment_result.confidence_scores.negative
        },
        "key_phrases": keyphrases_result.key_phrases[:20],
        "entities": [
            {"text": e.text, "category": e.category, "confidence": e.confidence_score}
            for e in entities_result.entities[:20]
        ],
        "pii_entities": [
            {"text": e.text, "category": e.category}
            for e in pii_result.entities
        ],
        "redacted_text": pii_result.redacted_text
    }

# Enrich the document
nlp_data = enrich_with_nlp(doc_data["content"])
print(f"Sentiment: {nlp_data['sentiment']}")
print(f"Key phrases: {nlp_data['key_phrases'][:5]}")
print(f"Entities found: {len(nlp_data['entities'])}")
print(f"PII entities found: {len(nlp_data['pii_entities'])}")
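
The enrichment above analyzes only the first 5,000 characters. To cover a whole document, the chunks can be grouped into small batches and each batch sent to the service in one call. The sketch below shows only the chunking and batching logic; `chunk_text` and `batched` are hypothetical helpers, and the batch size of 5 is an assumption that should be checked against the current service limits for each operation.

```python
from typing import Iterator, List

def chunk_text(text: str, size: int = 5000) -> List[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def batched(items: List[str], batch_size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

demo_chunks = chunk_text("x" * 12001)
demo_batches = list(batched(demo_chunks, batch_size=2))
print([len(c) for c in demo_chunks])
print([len(b) for b in demo_batches])
```

Each batch could be passed as the document list to a client method such as `extract_key_phrases`, with the per-chunk results merged afterwards.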

Task 6: Analyze images with Computer Vision (Domain 3)

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures

vision_client = ImageAnalysisClient(
    endpoint=AI_ENDPOINT,
    credential=AzureKeyCredential(AI_KEY)
)

def analyze_image(image_url: str) -> dict:
    """Analyze image with Computer Vision."""
    result = vision_client.analyze_from_url(
        image_url=image_url,
        visual_features=[
            VisualFeatures.CAPTION,
            VisualFeatures.TAGS,
            VisualFeatures.OBJECTS,
            VisualFeatures.READ,
            VisualFeatures.DENSE_CAPTIONS,
        ]
    )

    return {
        "caption": result.caption.text if result.caption else "",
        "caption_confidence": result.caption.confidence if result.caption else 0,
        "tags": [{"name": t.name, "confidence": t.confidence} for t in (result.tags.list if result.tags else [])],
        "objects": [{"name": o.tags[0].name, "confidence": o.tags[0].confidence} for o in (result.objects.list if result.objects else [])],
        "text_content": " ".join([line.text for block in (result.read.blocks if result.read else []) for line in block.lines]),
        "dense_captions": [dc.text for dc in (result.dense_captions.list if result.dense_captions else [])]
    }

# Analyze an image
image_data = analyze_image("https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png")
print(f"Caption: {image_data['caption']}")
print(f"Tags: {[t['name'] for t in image_data['tags'][:5]]}")
print(f"OCR text: {image_data['text_content'][:200]}")

Task 7: Content moderation check (Domain 2)

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions

safety_client = ContentSafetyClient(
    endpoint=AI_ENDPOINT,
    credential=AzureKeyCredential(AI_KEY)
)

def moderate_content(text: str) -> dict:
    """Check content for safety before indexing."""
    # Analyze text for harmful content (the service accepts up to 10,000 characters)
    request = AnalyzeTextOptions(text=text[:10000])
    response = safety_client.analyze_text(request)

    categories = {}
    is_safe = True
    for category_result in response.categories_analysis:
        categories[category_result.category] = category_result.severity
        # Default output uses severity levels 0, 2, 4, 6; anything above 2 is flagged
        if category_result.severity > 2:
            is_safe = False

    return {
        "is_safe": is_safe,
        "categories": categories,
        "action": "index" if is_safe else "review"
    }

# Moderate before indexing
moderation = moderate_content(doc_data["content"])
print(f"Content safe: {moderation['is_safe']}")
print(f"Categories: {moderation['categories']}")
print(f"Action: {moderation['action']}")

if not moderation["is_safe"]:
    print("⚠️ Content flagged for review — will not be indexed automatically")
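
The moderation check uses one threshold for every category. Financial documents may warrant different tolerances per category, for example a higher tolerance for violence-related wording in insurance claims. The sketch below separates that decision from the service call; `decide_action` and `DEFAULT_THRESHOLDS` are hypothetical, and the category names are assumed to match the keys shown in the expected output.

```python
from typing import Dict, Optional

# Assumed category names; severities at or below the threshold are accepted.
DEFAULT_THRESHOLDS: Dict[str, int] = {"Hate": 2, "SelfHarm": 2, "Sexual": 2, "Violence": 2}

def decide_action(severities: Dict[str, int],
                  thresholds: Optional[Dict[str, int]] = None) -> str:
    """Return "index" when every category is within its threshold, else "review"."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    for category, severity in severities.items():
        # Unknown categories fall back to the strictest default of 2.
        if severity > limits.get(category, 2):
            return "review"
    return "index"

print(decide_action({"Hate": 0, "Violence": 4}))
print(decide_action({"Hate": 0, "Violence": 4}, {"Violence": 4}))
```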

Task 8: Create vector search index and index documents (Domain 6)

import hashlib
import os
from datetime import datetime, timezone

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    SimpleField, SearchableField,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
    SemanticConfiguration, SemanticSearch, SemanticPrioritizedFields, SemanticField,
)
from openai import AzureOpenAI

# Values from Task 1; they must be exported as environment variables in the shell
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]
SEARCH_KEY = os.environ["SEARCH_KEY"]
AOAI_ENDPOINT = os.environ["AOAI_ENDPOINT"]
AOAI_KEY = os.environ["AOAI_KEY"]

# Initialize clients
index_client = SearchIndexClient(endpoint=SEARCH_ENDPOINT, credential=AzureKeyCredential(SEARCH_KEY))
aoai_client = AzureOpenAI(api_key=AOAI_KEY, api_version="2024-06-01", azure_endpoint=AOAI_ENDPOINT)

def get_embedding(text: str) -> list:
    """Embed text with the text-embedding-3-small deployment (input truncated to 8,000 characters)."""
    response = aoai_client.embeddings.create(input=text[:8000], model="text-embedding-3-small")
    return response.data[0].embedding

# Create comprehensive index
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
    SearchableField(name="title", type=SearchFieldDataType.String, filterable=True, sortable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchableField(name="content_redacted", type=SearchFieldDataType.String),
    SimpleField(name="source_type", type=SearchFieldDataType.String, filterable=True, facetable=True),
    SimpleField(name="original_language", type=SearchFieldDataType.String, filterable=True, facetable=True),
    SimpleField(name="sentiment", type=SearchFieldDataType.String, filterable=True, facetable=True),
    # SearchableField derives its type from collection=True, giving Collection(Edm.String)
    SearchableField(name="key_phrases", collection=True, filterable=True, facetable=True),
    SearchableField(name="entities", collection=True, filterable=True, facetable=True),
    SimpleField(name="is_safe", type=SearchFieldDataType.Boolean, filterable=True),
    SimpleField(name="processed_date", type=SearchFieldDataType.DateTimeOffset, sortable=True),
    SimpleField(name="page_count", type=SearchFieldDataType.Int32, filterable=True),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,  # Must match the embedding model output size
        vector_search_profile_name="vector-profile"
    ),
]

index = SearchIndex(
    name="enterprise-documents",
    fields=fields,
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
        profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw-config")]
    ),
    semantic_search=SemanticSearch(configurations=[
        SemanticConfiguration(
            name="semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                content_fields=[SemanticField(field_name="content")]
            )
        )
    ])
)

index_client.create_or_update_index(index)
print("Enterprise documents index created")

# Index the processed document
search_client = SearchClient(endpoint=SEARCH_ENDPOINT, index_name="enterprise-documents", credential=AzureKeyCredential(SEARCH_KEY))

document = {
    "id": hashlib.md5(b"invoice_1").hexdigest(),
    "title": "Invoice_1.pdf",
    "content": doc_data["content"],
    "content_redacted": nlp_data["redacted_text"],
    "source_type": "pdf",
    "original_language": doc_data.get("original_language", "en"),
    "sentiment": nlp_data["sentiment"],
    "key_phrases": nlp_data["key_phrases"],
    "entities": [e["text"] for e in nlp_data["entities"]],
    "is_safe": moderation["is_safe"],
    "processed_date": datetime.now(timezone.utc).isoformat(),
    "page_count": doc_data["pages"],
    "content_vector": get_embedding(doc_data["content"]),
}

search_client.upload_documents([document])
print(f"Document indexed: {document['title']}")
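
Vector search ranks documents by how close their embeddings are to the query embedding. HNSW finds approximate nearest neighbors quickly; the sketch below does the same ranking exactly, by brute force, on invented two-dimensional vectors. Cosine similarity is assumed as the metric, and `cosine_similarity` and `top_k` are hypothetical helpers for illustration only.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: List[float], docs: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Invented toy embeddings; real ones have 1536 dimensions.
toy_index = {"invoice": [1.0, 0.0], "contract": [0.0, 1.0], "report": [1.0, 1.0]}
results = top_k([1.0, 0.1], toy_index, k=2)
print([doc_id for doc_id, _ in results])
```

Brute force scales linearly with the number of documents, which is why an approximate index is used once the corpus grows.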

Task 9: Build RAG chat interface with Azure OpenAI (Domain 5)

from azure.search.documents.models import VectorizedQuery

def rag_chat(user_question: str, conversation_history: list = None) -> str:
    """RAG chat using Azure AI Search + Azure OpenAI."""
    if conversation_history is None:
        conversation_history = []

    # Step 1: Hybrid search (keyword + vector + semantic)
    query_vector = get_embedding(user_question)
    search_results = search_client.search(
        search_text=user_question,
        vector_queries=[
            VectorizedQuery(vector=query_vector, k_nearest_neighbors=5, fields="content_vector")
        ],
        query_type="semantic",
        semantic_configuration_name="semantic-config",
        filter="is_safe eq true",
        select=["title", "content", "source_type", "sentiment", "key_phrases"],
        top=5
    )

    # Step 2: Build context from search results
    context_parts = []
    sources = []
    for result in search_results:
        context_parts.append(f"[Source: {result['title']}]\n{result['content'][:1000]}")
        sources.append(result['title'])

    context = "\n\n---\n\n".join(context_parts)

    # Step 3: Generate response with Azure OpenAI
    system_message = (
        "You are an enterprise document assistant. Answer questions based ONLY on the provided context.\n"
        "If the context doesn't contain enough information to answer, say so.\n"
        "Always cite the source document title in your answer.\n"
        "Never make up information not present in the context."
    )

    messages = [
        {"role": "system", "content": system_message},
        *conversation_history,
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"}
    ]

    response = aoai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.3,
        max_tokens=1000
    )

    answer = response.choices[0].message.content
    print(f"\n🤖 Assistant: {answer}")
    print(f"\n📚 Sources: {', '.join(sources)}")
    return answer

# Test the RAG interface
rag_chat("What invoices mention consulting services and what are the amounts?")
rag_chat("Summarize the key entities found across all documents")
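
`rag_chat` accepts a `conversation_history` list, but the test calls above never pass one, so each question is answered without memory of earlier turns. A sketch of how a caller might maintain a bounded history is shown below; `append_turn` is a hypothetical helper, and the limit of five turns is an arbitrary assumption chosen to keep prompts short.

```python
from typing import Dict, List

Message = Dict[str, str]

def append_turn(history: List[Message], question: str, answer: str,
                max_turns: int = 5) -> List[Message]:
    """Return a new history with the latest turn added, keeping only the last max_turns."""
    updated = history + [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
    # Each turn is two messages (user + assistant).
    return updated[-2 * max_turns:]

# Simulate seven turns with placeholder text instead of real model answers.
history: List[Message] = []
for i in range(7):
    history = append_turn(history, f"q{i}", f"a{i}")

print(len(history), history[0]["content"], history[-1]["content"])
```

In use, the caller would pass `history` as `conversation_history` and call `append_turn` with the returned answer after each question.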

Task 10: Speech integration — voice queries (Domain 4 — Speech)

import os

import azure.cognitiveservices.speech as speechsdk

# Values from Task 1; they must be exported as environment variables in the shell
SPEECH_KEY = os.environ["SPEECH_KEY"]
LOCATION = os.environ.get("LOCATION", "eastus")

def speech_to_text_query() -> str:
    """Convert spoken question to text, then query RAG."""
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=LOCATION)
    speech_config.speech_recognition_language = "en-US"
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    print("🎤 Speak your question...")
    result = recognizer.recognize_once_async().get()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print(f"Recognized: {result.text}")
        # Feed into RAG pipeline
        answer = rag_chat(result.text)
        return answer
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech recognized")
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(f"Speech recognition canceled: {result.cancellation_details.reason}")
    return ""

def text_to_speech_response(text: str):
    """Convert RAG response to speech output."""
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=LOCATION)
    speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"
    # Speaker output uses AudioOutputConfig; AudioConfig is for input sources
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("🔊 Audio response delivered")
    else:
        print(f"Speech synthesis failed: {result.reason}")

# Demo: Voice-enabled RAG
# question = speech_to_text_query()
# Simulate with text input for automation
answer = rag_chat("What are the total amounts across all invoices?")
text_to_speech_response(answer[:500])  # TTS the answer

Expected Output

=== Enterprise Document Intelligence Platform ===

[Task 3] Extracted 1 pages, 2456 chars, 1 tables
Detected language: en

[Task 4] Content already in English — no translation needed

[Task 5] Sentiment: neutral
Key phrases: ['consulting services', 'total amount', 'payment terms']
Entities found: 8
PII entities found: 2

[Task 6] Caption: a presentation slide with text
Tags: ['text', 'presentation', 'slide']
OCR text: Azure AI Services overview...

[Task 7] Content safe: True
Categories: {Hate: 0, Violence: 0, SelfHarm: 0, Sexual: 0}
Action: index

[Task 8] Enterprise documents index created
Document indexed: Invoice_1.pdf

[Task 9] 🤖 Assistant: Based on the context, Invoice_1.pdf from Contoso Ltd mentions
consulting services with a total amount of $3,800.00 USD.
📚 Sources: Invoice_1.pdf

[Task 10] 🔊 Audio response delivered

Break & fix

| # | Scenario | Symptom | Root Cause | Fix |
| --- | --- | --- | --- | --- |
| 1 | OpenAI deployment quota exceeded | HTTP 429 "Rate limit exceeded" | Too many concurrent requests or token consumption exceeded TPM quota | Implement retry with exponential backoff; increase deployment capacity; batch smaller requests |
| 2 | Search index returns 0 results | Queries return empty despite indexed documents | Vector dimensions mismatch between embedding model and index field definition | Ensure index vectorSearchDimensions matches the embedding model output (1536 for text-embedding-3-small) |
| 3 | PII detection misses sensitive data | Known PII (SSNs, credit card numbers) not detected | Text Analytics language parameter incorrect or content exceeds single-request limit | Set correct language hint; chunk documents under 5,120 characters per API call |
| 4 | Translator returns garbled output | Translation quality is very poor for certain documents | Source document contains OCR errors that confuse translation | Pre-process OCR output to fix common errors; use Document Intelligence with higher resolution settings |
| 5 | Content Safety blocks legitimate content | Business documents flagged as unsafe | Safety threshold too aggressive (severity 0-1 is normal language variance) | Adjust severity threshold from 2 to 4; create allowlist for known-safe document categories |
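
The fix for scenario 1 mentions retry with exponential backoff. A minimal sketch of that pattern follows. `retry_with_backoff` is a hypothetical helper; it catches every exception for brevity, whereas real code would catch only rate-limit errors (for example `openai.RateLimitError`). The `sleep` parameter exists so the demo can record delays instead of actually waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def retry_with_backoff(func: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, doubling the wait after each failure, up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # Narrow this to rate-limit errors in real code
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter keeps concurrent clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")

# Demo: a function that fails twice, then succeeds.
calls = {"count": 0}
def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

recorded_delays: List[float] = []
outcome = retry_with_backoff(flaky, sleep=recorded_delays.append)
print(outcome, calls["count"], len(recorded_delays))
```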

Knowledge Check

1. You're designing an enterprise AI pipeline that processes documents in 12 languages. Where in the pipeline should translation occur?

2. Your RAG system returns hallucinated answers that aren't in the source documents. What is the MOST effective mitigation?

3. Which Azure AI service would you use to ensure uploaded content doesn't contain hate speech or violent content before indexing?

4. You need to handle a document that contains both printed text AND handwritten notes. Which extraction approach handles both?

5. For RBAC best practices, which identity approach should you use in production for service-to-service communication?

6. Your search index has 1 million documents. Hybrid search (keyword + vector) returns results in 2 seconds. How can you improve latency?

7. A document processed through your pipeline contains personally identifiable information (PII). What should the pipeline do?

8. You want to enable voice-based queries against your document search system. What is the correct service chain?

Cleanup

Important

This capstone creates multiple billable resources. Always clean up when done.

# Delete the entire resource group and all resources
az group delete --name rg-ai102-capstone --yes --no-wait

echo "All capstone resources scheduled for deletion"

Learn More