
Challenge 14: RAG Pattern: Basic

Estimated Time

60 min | Cost: ~$3.00 (AI Search + OpenAI) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

  • Implement Retrieval Augmented Generation (RAG) by grounding models in your data
  • Create and configure Azure AI Search indexes
  • Use Azure OpenAI "On Your Data" feature for grounded responses

Overview

Retrieval Augmented Generation (RAG) is a pattern that enhances LLM responses by first retrieving relevant information from an external knowledge base, then providing that context to the model alongside the user's question. This "grounds" the model's response in factual data, significantly reducing hallucinations and enabling the model to answer questions about proprietary or current information it wasn't trained on.
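Before bringing in Azure services, the retrieve-then-augment idea can be sketched in plain Python. This is a minimal illustration under stated assumptions: the three-document `DOCS` dictionary and the `retrieve` and `build_messages` helpers are hypothetical, and the word-overlap scoring is a toy stand-in for Azure AI Search ranking. The result is the message list you would send to a chat model.

```python
import re

# Hypothetical mini knowledge base standing in for an Azure AI Search index.
DOCS = {
    "deployments": "Azure OpenAI supports Standard, Global Standard, and Provisioned deployment types.",
    "filters": "Content filters detect hate, sexual, violence, and self-harm content.",
    "search": "Azure AI Search supports full-text, vector, and hybrid search.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for a search analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, top_n: int = 2) -> list[tuple[str, str]]:
    """Retrieval step: return up to top_n (doc_id, text) pairs ranked by shared query terms."""
    q_terms = set(tokenize(query))
    scored = []
    for doc_id, text in DOCS.items():
        score = sum(1 for tok in tokenize(text) if tok in q_terms)
        if score > 0:
            scored.append((score, doc_id, text))
    # sort() is stable (also with reverse=True), so ties keep dictionary order
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_n]]


def build_messages(query: str, top_n: int = 2) -> list[dict]:
    """Augmentation step: put the retrieved chunks into the system message."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, top_n))
    return [
        {"role": "system", "content": "Answer only from this context:\n" + context},
        {"role": "user", "content": query},
    ]


print(build_messages("What deployment types does Azure OpenAI support?")[0]["content"])
```

The generation step is omitted on purpose: whatever model receives these messages can only be as grounded as the context that retrieval put in front of it.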

The basic RAG pattern in Azure uses Azure AI Search as the retrieval layer and Azure OpenAI as the generation layer. Azure AI Search indexes your documents (PDFs, Word files, web pages, structured data) and returns fast, relevant search results. Azure OpenAI's "On Your Data" feature simplifies RAG by orchestrating the retrieval and generation steps for you: you configure a data source connection, and the service queries the index, injects the retrieved chunks into the prompt, and returns citations. If you ingest files through the On Your Data ingestion workflow instead of pointing it at an existing index, the service also handles chunking.
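Chunking splits long documents into pieces small enough to retrieve individually and fit into a prompt. The sketch below shows a generic fixed-size strategy with overlap; it is an illustrative assumption, not the algorithm On Your Data uses, and the `chunk_text` helper with its character-based sizes is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail-only duplicate chunk is produced
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of indexing some text twice.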

The architecture flow is: User Query → Azure OpenAI (with data source config) → Azure AI Search (retrieval) → Relevant chunks returned → LLM generates grounded response with citations. Understanding this flow, configuring the data source connection, and comparing grounded vs. ungrounded responses are essential skills for the AI-102 exam.
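The flow above can be expressed as a small orchestration function. This is a simplified sketch that assumes hypothetical `search` and `generate` callables you supply. In the real service, retrieval and the decision to decline both happen inside Azure OpenAI, and `in_scope` steers the model rather than acting as the hard gate shown here. The refusal string is also made up.

```python
from typing import Callable

# Hypothetical refusal text; the actual service words its refusal differently.
NOT_FOUND = "The requested information is not available in the retrieved data."


def grounded_answer(
    query: str,
    search: Callable[[str], list[dict]],
    generate: Callable[[str, list[dict]], str],
    in_scope: bool = True,
) -> dict:
    """Query -> retrieval -> (optional refusal) -> generation with citations."""
    chunks = search(query)
    if not chunks and in_scope:
        # With in_scope enabled, no retrieved evidence means no answer from model knowledge.
        return {"content": NOT_FOUND, "citations": []}
    content = generate(query, chunks)
    # Each retrieved chunk becomes a citation the caller can display.
    return {"content": content, "citations": [c["title"] for c in chunks]}
```

To try it locally, pass a `search` function that returns a list of `{"title": ..., "content": ...}` dictionaries and a `generate` function that formats an answer from them.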

Architecture

The RAG pattern connects Azure OpenAI to Azure AI Search, enabling the model to retrieve relevant document chunks before generating responses.

Figure: Challenge 14 topology

Prerequisites

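  • Azure subscription with Azure OpenAI access
  • Azure AI Search service (Tasks 1 and 2 authenticate with Microsoft Entra ID through DefaultAzureCredential, so the service must allow role-based access and your identity needs the Search Service Contributor and Search Index Data Contributor roles)
  • Azure CLI installed
  • GPT-4o deployment from Challenge 12 (the code assumes the deployment name gpt-4o-standard)
  • Python 3.9+ with openai, azure-search-documents, and azure-identity packages
  • Environment variables AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_KEY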
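The task scripts read their connection settings from four environment variables. A quick pre-flight check, using a hypothetical `missing_env` helper, reports any that are unset or empty before you start:

```python
import os
from typing import Mapping

# Variables read by the task scripts in this challenge
REQUIRED_VARS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
)


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    print("Missing: " + ", ".join(missing) if missing else "All required variables are set.")
```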

Implementation

Task 1: Create Azure AI Search Index

import os
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchFieldDataType,
)

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
credential = DefaultAzureCredential()

# Create the search index
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
    SearchableField(name="title", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SimpleField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
    SimpleField(name="source", type=SearchFieldDataType.String, filterable=True),
]

index = SearchIndex(name="ai102-docs-index", fields=fields)
result = index_client.create_or_update_index(index)
print(f"Index created: {result.name}")
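The SDK field objects above correspond to an index definition that uses Edm data types. The sketch below mirrors the same schema as a plain dictionary in the REST API's JSON shape, together with a hypothetical `validate_index` helper that checks two rules worth knowing when debugging: an index needs exactly one key field of type Edm.String, and full-text queries only match fields marked searchable. The dictionary is hand-written for illustration; the SDK may emit additional default attributes.

```python
import json

# Hand-written mirror of the Task 1 schema in REST API JSON shape (illustrative).
index_definition = {
    "name": "ai102-docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "category", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "source", "type": "Edm.String", "filterable": True},
    ],
}


def validate_index(defn: dict) -> list[str]:
    """Return problems found; an empty list means these basic checks passed."""
    problems = []
    keys = [f for f in defn["fields"] if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    elif keys[0]["type"] != "Edm.String":
        problems.append("key field must be Edm.String")
    if not any(f.get("searchable") for f in defn["fields"]):
        problems.append("no searchable fields: full-text queries cannot match anything")
    return problems


if __name__ == "__main__":
    print(json.dumps(index_definition, indent=2))
    print(validate_index(index_definition) or "schema checks passed")
```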

Task 2: Upload Sample Documents

import os
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
credential = DefaultAzureCredential()

search_client = SearchClient(
    endpoint=endpoint,
    index_name="ai102-docs-index",
    credential=credential
)

# Sample documents about Azure AI services
documents = [
    {
        "id": "1",
        "title": "Azure AI Foundry Overview",
        "content": "Azure AI Foundry is a unified platform for building generative AI applications. It provides a hub-and-project architecture where hubs manage shared infrastructure including Storage, Key Vault, and Container Registry. Projects are workspaces where teams build and deploy AI solutions. The platform supports model deployment, prompt flow orchestration, and evaluation capabilities.",
        "category": "platform",
        "source": "docs/ai-foundry-overview.md"
    },
    {
        "id": "2",
        "title": "Azure OpenAI Model Deployment",
        "content": "Azure OpenAI supports multiple deployment types: Standard (shared compute, pay-per-token), Global Standard (global routing for higher availability), and Provisioned (dedicated compute with guaranteed throughput). Models available include GPT-4o for multimodal tasks, GPT-4o-mini for cost-efficient workloads, and embedding models like text-embedding-3-small for vector search.",
        "category": "models",
        "source": "docs/model-deployment.md"
    },
    {
        "id": "3",
        "title": "Responsible AI Principles",
        "content": "Microsoft's Responsible AI principles include fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Azure AI services include built-in content filters that detect and block harmful content categories including hate, sexual, violence, and self-harm. Custom content filters can be configured per deployment.",
        "category": "governance",
        "source": "docs/responsible-ai.md"
    },
    {
        "id": "4",
        "title": "Azure AI Search Capabilities",
        "content": "Azure AI Search provides full-text search, vector search, and hybrid search combining both approaches. Semantic ranking uses deep learning models to re-rank results by semantic relevance. The service supports skillsets for AI enrichment during indexing, including OCR, entity recognition, and custom skills via Azure Functions.",
        "category": "search",
        "source": "docs/ai-search.md"
    },
    {
        "id": "5",
        "title": "Prompt Engineering Best Practices",
        "content": "Effective prompts include clear instructions, relevant context, and specific output format requirements. System messages define the AI assistant's behavior and constraints. Few-shot examples in the prompt improve output consistency. Chain-of-thought prompting helps with complex reasoning tasks. Temperature controls randomness (0 for deterministic, 1 for creative).",
        "category": "techniques",
        "source": "docs/prompt-engineering.md"
    }
]

result = search_client.upload_documents(documents=documents)
print(f"Uploaded {len(result)} documents")
for r in result:
    # Each IndexingResult reports the document key and whether it was accepted
    print(f"  {r.key}: {r.succeeded}")
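Five documents fit comfortably in one request, but Azure AI Search accepts at most 1,000 documents per indexing batch. For larger uploads, a hypothetical `batches` helper like this one can split the list before calling `upload_documents`; the SDK also provides `SearchIndexingBufferedSender`, which manages batching for you.

```python
from typing import Iterator

MAX_BATCH_SIZE = 1000  # Azure AI Search per-request document limit


def batches(docs: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


# Usage with the Task 2 client (needs a live service, so shown as a comment):
# for batch in batches(documents):
#     results = search_client.upload_documents(documents=batch)
```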

Task 3: Configure Azure OpenAI "On Your Data"

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_key = os.environ["AZURE_SEARCH_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Query with "On Your Data" (grounded response)
response = client.chat.completions.create(
    model="gpt-4o-standard",  # Azure OpenAI expects the deployment name here
    messages=[
        {"role": "system", "content": "You are an AI assistant that helps users understand Azure AI services. Use the provided data sources to answer questions accurately."},
        {"role": "user", "content": "What deployment types does Azure OpenAI support and what are their differences?"}
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": "ai102-docs-index",
                    "authentication": {
                        "type": "api_key",
                        "key": search_key
                    },
                    # Map index fields so citations carry a title and file path
                    "fields_mapping": {
                        "content_fields": ["content"],
                        "title_field": "title",
                        "filepath_field": "source"
                    },
                    "query_type": "simple",
                    "top_n_documents": 3,
                    "in_scope": True
                }
            }
        ]
    }
)

print("Grounded Response:")
print(response.choices[0].message.content)

# Check citations (On Your Data returns them in message.context)
context = getattr(response.choices[0].message, "context", None)
if context and "citations" in context:
    print("\nCitations:")
    for citation in context["citations"]:
        print(f"  - {citation.get('title') or 'N/A'} ({citation.get('filepath') or 'N/A'})")
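Tasks 3 and 4 repeat the same `data_sources` dictionary. A small builder keeps the two calls consistent; the `azure_search_data_source` name and its defaults are hypothetical, while the keys it emits are the ones used in this task plus a `fields_mapping` entry that maps the Task 1 fields to citation titles and paths.

```python
def azure_search_data_source(
    endpoint: str,
    index_name: str,
    key: str,
    top_n_documents: int = 3,
    in_scope: bool = True,
) -> dict:
    """Build one `data_sources` entry for an Azure AI Search index with key auth."""
    return {
        "type": "azure_search",
        "parameters": {
            "endpoint": endpoint,
            "index_name": index_name,
            "authentication": {"type": "api_key", "key": key},
            "query_type": "simple",
            "top_n_documents": top_n_documents,
            "in_scope": in_scope,
            # Field names assume the Task 1 schema
            "fields_mapping": {
                "content_fields": ["content"],
                "title_field": "title",
                "filepath_field": "source",
            },
        },
    }


# Usage: extra_body={"data_sources": [azure_search_data_source(search_endpoint, "ai102-docs-index", search_key)]}
```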

Task 4: Compare Grounded vs Ungrounded Responses

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_key = os.environ["AZURE_SEARCH_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

question = "What are Microsoft's Responsible AI principles and how do content filters work?"

# Ungrounded response (no data source)
ungrounded = client.chat.completions.create(
    model="gpt-4o-standard",
    messages=[
        {"role": "system", "content": "You are an AI assistant."},
        {"role": "user", "content": question}
    ],
    max_tokens=300
)

print("=" * 60)
print("UNGROUNDED RESPONSE (no data source):")
print("=" * 60)
print(ungrounded.choices[0].message.content)

# Grounded response (with data source)
grounded = client.chat.completions.create(
    model="gpt-4o-standard",
    messages=[
        {"role": "system", "content": "You are an AI assistant. Answer based only on the provided data."},
        {"role": "user", "content": question}
    ],
    max_tokens=300,
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": "ai102-docs-index",
                    "authentication": {
                        "type": "api_key",
                        "key": search_key
                    },
                    # Map index fields so citations carry a title and file path
                    "fields_mapping": {
                        "content_fields": ["content"],
                        "title_field": "title",
                        "filepath_field": "source"
                    },
                    "query_type": "simple",
                    "top_n_documents": 3,
                    "in_scope": True
                }
            }
        ]
    }
)

print("\n" + "=" * 60)
print("GROUNDED RESPONSE (with Azure AI Search):")
print("=" * 60)
print(grounded.choices[0].message.content)

# Key differences to note:
print("\n" + "=" * 60)
print("COMPARISON NOTES:")
print("=" * 60)
print("- Grounded responses cite specific documents")
print("- Grounded responses stay within indexed knowledge")
print("- Ungrounded may include information not in your data")
print("- 'in_scope: true' restricts answers to indexed content only")
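To make the comparison a little more concrete than reading the two outputs side by side, a crude lexical check can flag answer sentences whose words mostly do not appear in the retrieved documents. This is a hypothetical heuristic (`unsupported_sentences` and its 0.5 threshold are assumptions), not Azure AI's groundedness evaluation, and it will miss paraphrases.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercase alphanumeric terms in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def unsupported_sentences(answer: str, sources: list[str], min_coverage: float = 0.5) -> list[str]:
    """Return sentences whose fraction of terms found in the sources is below min_coverage."""
    source_terms = set().union(*(_terms(s) for s in sources))
    flagged = []
    # Split after sentence-ending punctuation followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        terms = _terms(sentence)
        if not terms:
            continue
        coverage = len(terms & source_terms) / len(terms)
        if coverage < min_coverage:
            flagged.append(sentence)
    return flagged
```

Run it on both `ungrounded` and `grounded` content against the texts of the retrieved documents; the ungrounded answer would be expected to produce more flagged sentences.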

Expected Output

After completing all tasks, you should have:

  1. Azure AI Search index ai102-docs-index with 5 documents indexed
  2. Grounded chat completions returning answers sourced from your indexed documents
  3. Citations referencing specific document titles and paths
  4. Comparison showing grounded responses stay within your data while ungrounded may include external knowledge
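In grounded answers, On Your Data typically marks supporting passages inline with tokens such as [doc1], where the number is a 1-based position in the `citations` list under `message.context`. Assuming that convention, a hypothetical `resolve_citations` helper can pair each marker with its cited document title:

```python
import re

# Assumed inline marker format: [doc1], [doc2], ...
MARKER = re.compile(r"\[doc(\d+)\]")


def resolve_citations(content: str, citations: list[dict]) -> list[tuple[int, str]]:
    """Map each distinct, in-range [docN] marker to the title of citations[N - 1]."""
    resolved = []
    seen = set()
    for match in MARKER.finditer(content):
        n = int(match.group(1))
        if n in seen or not 1 <= n <= len(citations):
            continue  # skip repeats and markers with no matching citation
        seen.add(n)
        resolved.append((n, citations[n - 1].get("title", "N/A")))
    return resolved
```

With a live response, pass `message.content` and `message.context["citations"]`.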

Break & fix

| Scenario | Symptom | Root cause | Fix |
|---|---|---|---|
| Search returns no results | "I don't have information about that" response | Documents not indexed, or the query doesn't match indexed content | Verify the document count (for example, with $count); check that the field is searchable |
| 403 on search endpoint | Authentication failed for the data source | Wrong API key, or RBAC not configured | Use an admin key for index operations; verify the key in the data_sources config |
| Empty citations | Response has no context or citations | in_scope set to false, or top_n_documents too low | Set in_scope to true and increase top_n_documents to 3-5 |
| Hallucinated answers | Response includes information not in the index | in_scope not enabled | Set "in_scope": true to restrict answers to indexed content |
| Index creation fails | ServiceNotFound | Search service not provisioned | Create the search service: az search service create --name <service-name> --resource-group <resource-group> --sku basic |

Knowledge Check

1. What is the primary purpose of the RAG (Retrieval Augmented Generation) pattern?

2. In Azure OpenAI's 'On Your Data' feature, what does the 'in_scope' parameter control?

3. Which SDK parameter in the OpenAI Python client configures the Azure AI Search data source for RAG?

4. What is the role of Azure AI Search in the basic RAG pattern?

5. What happens when you query Azure OpenAI with 'On Your Data' and the answer is not in the indexed documents (with in_scope=true)?

Cleanup

az group delete --name rg-ai102-challenge14 --yes --no-wait

Learn More