
Challenge 15: RAG Pattern: Advanced

Estimated Time

60 min | Cost: ~$5.00 (embeddings + search + OpenAI) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

  • Implement advanced RAG patterns with vector and hybrid search
  • Generate and use vector embeddings for semantic retrieval
  • Evaluate model and flow quality using built-in metrics

Overview

The basic RAG pattern uses keyword (lexical) search, which works well when queries contain the exact terminology present in the documents. However, real-world queries often use synonyms, paraphrases, or conceptual descriptions that keyword search misses. Vector search addresses this by converting both documents and queries into high-dimensional vectors (embeddings) that capture semantic meaning, so retrieval is based on conceptual similarity rather than exact word matching.
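
To make "conceptual similarity" concrete, here is a minimal sketch of cosine similarity, the metric configured for the index later in this challenge. The three-dimensional vectors and document names are hypothetical stand-ins; real embedding models return far longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (illustration only, not model output)
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "provisioned throughput": [0.8, 0.2, 0.1],
    "content filtering": [0.0, 0.2, 0.9],
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query ranks first, even though no words are compared.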

Hybrid search combines the strengths of both approaches: keyword search for exact matches and acronyms, plus vector search for semantic understanding. Azure AI Search supports hybrid queries that execute both searches in parallel and fuse the results using Reciprocal Rank Fusion (RRF). Adding a semantic ranker on top further improves results by using a deep learning model to re-rank the fused results by true semantic relevance to the query.
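
The fusion step can be illustrated with a short sketch of Reciprocal Rank Fusion. It assumes the commonly used constant k = 60 and hypothetical document IDs; it is meant to show the idea, not the service's internal implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the keyword and vector queries
keyword_ranking = ["doc2", "doc4", "doc1"]
vector_ranking = ["doc2", "doc5", "doc4"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that appear in both lists (doc2, doc4) rise above those found by only one method, which is the behavior hybrid search relies on.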

Chunking strategies determine how documents are split before embedding. Overlapping chunks (e.g., 512 tokens with a 128-token overlap) preserve context across chunk boundaries. The embedding model (text-embedding-3-small or text-embedding-ada-002) converts each chunk into a vector stored in the search index. Evaluation closes the loop: metrics like groundedness (is the answer supported by retrieved context?), relevance (does it answer the question?), and coherence (is it well-structured?) quantify RAG quality for systematic improvement.
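
A sliding-window chunker along these lines might look like the sketch below. It treats whitespace-separated words as a rough stand-in for tokens, which is an assumption; a real pipeline would count tokens with the embedding model's tokenizer.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into chunks that share `overlap` tokens with their neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical 1000-"token" document (words used as a proxy for tokens)
words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words)
print([len(c) for c in chunks])
```

With the defaults, each chunk starts 384 tokens after the previous one, so the last 128 tokens of one chunk are repeated at the start of the next.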

Architecture

The advanced RAG pattern adds vector embeddings, hybrid search, and semantic ranking to improve retrieval quality, with evaluation metrics to measure end-to-end quality.

[Figure: Challenge 15 topology]

Prerequisites

  • Azure subscription with Azure OpenAI access
  • Azure AI Search service (Basic tier or above for semantic ranker)
  • GPT-4o deployment and text-embedding-3-small deployment
  • Python 3.9+ with openai, azure-search-documents, and azure-identity packages
  • Documents from Challenge 14 (or new sample data)

Implementation

Task 1: Generate Embeddings

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Sample documents to embed
documents = [
    {
        "id": "1",
        "title": "Azure AI Foundry Overview",
        "content": "Azure AI Foundry is a unified platform for building generative AI applications. It provides a hub-and-project architecture where hubs manage shared infrastructure including Storage, Key Vault, and Container Registry. Projects are workspaces where teams build and deploy AI solutions.",
        "category": "platform"
    },
    {
        "id": "2",
        "title": "Azure OpenAI Model Deployment",
        "content": "Azure OpenAI supports multiple deployment types: Standard uses shared compute with pay-per-token billing. Global Standard routes traffic globally for higher availability. Provisioned reserves dedicated compute capacity with guaranteed throughput measured in PTUs.",
        "category": "models"
    },
    {
        "id": "3",
        "title": "Responsible AI and Content Filtering",
        "content": "Microsoft's Responsible AI principles include fairness, reliability, privacy, inclusiveness, transparency, and accountability. Azure AI services include built-in content filters that detect and block harmful content in categories including hate, sexual, violence, and self-harm.",
        "category": "governance"
    },
    {
        "id": "4",
        "title": "Azure AI Search Capabilities",
        "content": "Azure AI Search provides full-text search, vector search, and hybrid search combining both. Semantic ranking uses deep learning to re-rank results by relevance. Skillsets enable AI enrichment during indexing including OCR, entity recognition, and custom skills.",
        "category": "search"
    },
    {
        "id": "5",
        "title": "Vector Embeddings and Semantic Search",
        "content": "Vector embeddings represent text as high-dimensional numerical arrays capturing semantic meaning. Similar concepts have vectors close together in embedding space. Text-embedding-3-small produces 1536-dimension vectors optimized for search and retrieval tasks.",
        "category": "search"
    }
]

# Generate embeddings for each document
for doc in documents:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # deployment name
        input=doc["content"]
    )
    doc["content_vector"] = response.data[0].embedding
    print(f"Embedded '{doc['title']}': {len(doc['content_vector'])} dimensions")

# Generate embedding for a query
query = "How do I deploy AI models with guaranteed performance?"
query_response = client.embeddings.create(
    model="text-embedding-3-small",  # deployment name
    input=query
)
query_vector = query_response.data[0].embedding
print(f"\nQuery embedded: '{query}' -> {len(query_vector)} dimensions")
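
The loop above sends one request per document. Because the embeddings endpoint also accepts a list of strings as input, several texts can be embedded in a single call, which reduces the number of requests. The helper below is a sketch; `embed_batch` is a hypothetical name, and `client` is assumed to be an `AzureOpenAI` client like the one created above.

```python
def embed_batch(client, texts, deployment="text-embedding-3-small"):
    """Embed a list of texts in one request and return vectors in input order.

    `client` is expected to expose `embeddings.create(model=..., input=...)`,
    as the AzureOpenAI client does. `deployment` is the deployment name.
    """
    response = client.embeddings.create(model=deployment, input=texts)
    # Each returned item carries the index of the input it belongs to
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

For example, `vectors = embed_batch(client, [doc["content"] for doc in documents])` would return one vector per document, in the same order as the list.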

Task 2: Create Vector Index with Hybrid Fields

import os
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticSearch,
    SemanticPrioritizedFields,
    SemanticField,
)

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
credential = DefaultAzureCredential()

index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

# Define index with vector field + keyword fields + semantic config
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
    SearchableField(name="title", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SimpleField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
    # Vector field for embeddings (dimensions must match the embedding model)
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile"
    ),
]

# Configure vector search with HNSW algorithm
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="my-hnsw-config",
            # Use the SDK's typed parameters model rather than a raw dict
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric="cosine"
            )
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="my-vector-profile",
            algorithm_configuration_name="my-hnsw-config"
        )
    ]
)

# Configure semantic ranking
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        content_fields=[SemanticField(field_name="content")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

# Create the index
index = SearchIndex(
    name="ai102-vector-index",
    fields=fields,
    vector_search=vector_search,
    semantic_search=semantic_search
)

result = index_client.create_or_update_index(index)
print(f"Vector index created: {result.name}")
print(" Vector dimensions: 1536")
print(" Algorithm: HNSW (cosine similarity)")
print(" Semantic config: my-semantic-config")

Task 3: Upload Documents and Run Hybrid Search

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Initialize clients
openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
openai_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]

openai_client = AzureOpenAI(
    azure_endpoint=openai_endpoint,
    api_key=openai_key,
    api_version="2024-10-21"
)

search_client = SearchClient(
    endpoint=search_endpoint,
    index_name="ai102-vector-index",
    credential=DefaultAzureCredential()
)

# Documents to index (embeddings are generated in the loop below)
documents = [
    {"id": "1", "title": "Azure AI Foundry Overview", "content": "Azure AI Foundry is a unified platform...", "category": "platform"},
    {"id": "2", "title": "Azure OpenAI Deployment Types", "content": "Azure OpenAI supports Standard, Global Standard, and Provisioned deployment types...", "category": "models"},
    {"id": "3", "title": "Responsible AI", "content": "Microsoft's Responsible AI principles include fairness, reliability, privacy...", "category": "governance"},
    {"id": "4", "title": "Azure AI Search", "content": "Azure AI Search provides full-text, vector, and hybrid search...", "category": "search"},
    {"id": "5", "title": "Vector Embeddings", "content": "Vector embeddings represent text as high-dimensional numerical arrays...", "category": "search"},
]

# Generate embeddings and upload
for doc in documents:
    embedding_response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # deployment name
        input=doc["content"]
    )
    doc["content_vector"] = embedding_response.data[0].embedding

upload_result = search_client.upload_documents(documents=documents)
# Count only the documents the service reports as successfully indexed
succeeded = sum(1 for r in upload_result if r.succeeded)
print(f"Uploaded {succeeded} of {len(documents)} documents with vectors")

# --- Hybrid Search (keyword + vector) ---
query_text = "How do I get guaranteed model performance?"

# Generate query embedding
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=query_text
).data[0].embedding

# Execute hybrid search (combines keyword + vector via RRF)
results = search_client.search(
    search_text=query_text,  # Keyword component
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=3,
            fields="content_vector"
        )
    ],
    query_type="semantic",  # Enable semantic ranking
    semantic_configuration_name="my-semantic-config",
    top=3
)

print(f"\nHybrid Search Results for: '{query_text}'")
print("-" * 60)
for result in results:
    print(f" Score: {result['@search.score']:.4f} | "
          f"Reranker: {result.get('@search.reranker_score', 'N/A')} | "
          f"Title: {result['title']}")
    print(f" Content: {result['content'][:100]}...")
    print()

Task 4: RAG with Hybrid Search

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_key = os.environ["AZURE_SEARCH_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# RAG with hybrid search (vector + keyword + semantic ranking)
question = "How can I ensure consistent AI model performance for production workloads?"

response = client.chat.completions.create(
    model="gpt-4o-standard",  # chat deployment name
    messages=[
        {"role": "system", "content": "You are an Azure AI expert. Answer based on the provided context. Cite your sources."},
        {"role": "user", "content": question}
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": "ai102-vector-index",
                    "authentication": {
                        "type": "api_key",
                        "key": search_key
                    },
                    "query_type": "vector_semantic_hybrid",
                    "embedding_dependency": {
                        "type": "deployment_name",
                        "deployment_name": "text-embedding-3-small"
                    },
                    # Explicit field mapping, added on the assumption that the
                    # custom index from Task 2 is not mapped automatically
                    "fields_mapping": {
                        "title_field": "title",
                        "content_fields": ["content"],
                        "vector_fields": ["content_vector"]
                    },
                    "semantic_configuration": "my-semantic-config",
                    "top_n_documents": 3,
                    "in_scope": True  # Restrict answers to the retrieved data
                }
            }
        ]
    }
)

print(f"Question: {question}")
print("\nAnswer (Hybrid RAG):")
print(response.choices[0].message.content)
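
Because the system prompt asks the model to cite its sources, answers from the data-source integration typically contain markers such as [doc1]. The sketch below maps those markers back to citation entries. It assumes the citations arrive as a list of dictionaries in marker order; `cited_sources` is a hypothetical helper, not part of any SDK.

```python
import re


def cited_sources(answer_text, citations):
    """Return the citation entries referenced as [docN] markers in the answer.

    `citations` is assumed to be a list where [doc1] refers to citations[0],
    [doc2] to citations[1], and so on. Out-of-range markers are ignored.
    """
    indices = sorted({int(n) for n in re.findall(r"\[doc(\d+)\]", answer_text)})
    return [citations[i - 1] for i in indices if 1 <= i <= len(citations)]
```

With the response above, the citation list would likely be read from the message's extra `context` field, for example `response.choices[0].message.model_dump().get("context", {}).get("citations", [])`; treat that access path as an assumption and verify it against the response you actually receive.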

Task 5: Evaluate RAG Quality

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Test cases for evaluation
test_cases = [
    {
        "question": "What deployment type guarantees throughput?",
        "context": "Azure OpenAI supports Standard (shared compute), Global Standard (global routing), and Provisioned (dedicated compute with guaranteed throughput measured in PTUs).",
        "answer": "Provisioned deployment type guarantees throughput by reserving dedicated compute capacity measured in Provisioned Throughput Units (PTUs).",
        "ground_truth": "Provisioned deployments reserve dedicated compute capacity with guaranteed throughput."
    },
    {
        "question": "How does hybrid search work?",
        "context": "Azure AI Search provides full-text search, vector search, and hybrid search combining both. Results are fused using Reciprocal Rank Fusion (RRF).",
        "answer": "Hybrid search combines keyword (full-text) search and vector search, fusing results using Reciprocal Rank Fusion (RRF) to leverage both exact matching and semantic similarity.",
        "ground_truth": "Hybrid search combines keyword and vector search using RRF fusion."
    }
]

# Evaluate: Groundedness, Relevance, Coherence
metrics = ["groundedness", "relevance", "coherence"]

evaluation_prompt = """You are an AI quality evaluator. Rate the following on a scale of 1-5:

Metric: {metric}
- Groundedness: Is the answer fully supported by the provided context? (1=fabricated, 5=fully supported)
- Relevance: Does the answer directly address the question? (1=irrelevant, 5=perfectly relevant)
- Coherence: Is the answer well-structured and easy to understand? (1=incoherent, 5=perfectly clear)

Question: {question}
Context: {context}
Answer: {answer}

Return ONLY a single number (1-5)."""

print("=" * 70)
print("RAG QUALITY EVALUATION")
print("=" * 70)

for i, test in enumerate(test_cases):
    print(f"\nTest Case {i+1}: {test['question']}")
    scores = {}

    # Ask the model to score one metric at a time
    for metric in metrics:
        response = client.chat.completions.create(
            model="gpt-4o-standard",
            messages=[
                {"role": "user", "content": evaluation_prompt.format(
                    metric=metric,
                    question=test["question"],
                    context=test["context"],
                    answer=test["answer"]
                )}
            ],
            max_tokens=5,
            temperature=0.0
        )
        score = response.choices[0].message.content.strip()
        scores[metric] = score

    print(f" Groundedness: {scores['groundedness']}/5")
    print(f" Relevance: {scores['relevance']}/5")
    print(f" Coherence: {scores['coherence']}/5")

print("\n" + "=" * 70)
print("EVALUATION COMPLETE")
print("Target: All metrics >= 4 for production readiness")
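
The scores collected above are raw strings from the model, so the ">= 4" target is only printed, not checked. A small sketch of how the replies could be parsed and compared against the target follows; `parse_score` and `meets_target` are hypothetical helper names.

```python
import re


def parse_score(raw, low=1, high=5):
    """Extract the first integer from a model reply; None if missing or out of range."""
    match = re.search(r"\d+", raw)
    if match is None:
        return None
    value = int(match.group())
    return value if low <= value <= high else None


def meets_target(scores, target=4):
    """True only if every metric parsed to a number at or above the target."""
    parsed = [parse_score(s) for s in scores.values()]
    return all(p is not None and p >= target for p in parsed)
```

For example, calling `meets_target(scores)` at the end of each test case would flag any case where a metric falls below 4 or the model returned something other than a number.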

Expected Output

After completing all tasks, you should have:

  1. Vector embeddings generated using text-embedding-3-small (1536 dimensions)
  2. Vector search index ai102-vector-index with:
    • HNSW algorithm configuration (cosine similarity)
    • Semantic ranking configuration
    • Both searchable text fields and vector field
  3. Hybrid search results combining keyword, vector, and semantic ranking
  4. RAG responses using vector_semantic_hybrid query type
  5. Evaluation scores for groundedness, relevance, and coherence (target ≥ 4/5)

Break & fix

| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Vector dimension mismatch | InvalidVectorDimensionError | Index expects 1536 but embedding has different dimensions | Ensure embedding model deployment matches index dimensions field |
| Semantic ranker unavailable | SemanticSearchNotAvailable | Search service on Free tier | Upgrade to Basic tier or above for semantic ranking |
| Empty vector search results | 0 hits despite relevant documents | Vector field not populated or wrong field name in query | Verify content_vector field has data; check fields param in query |
| Low evaluation scores | Groundedness < 3 | Retrieved chunks not relevant; chunking too coarse | Reduce chunk size, add overlap, or increase top_n_documents |
| Embedding rate limit | 429 on embeddings endpoint | Too many embedding requests in batch | Add delays between batches; deploy with higher TPM |
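
For the embedding rate limit scenario, "add delays between batches" is often implemented as retry with exponential backoff. The sketch below is generic: `with_backoff` is a hypothetical helper, and the delay values are illustrative rather than recommended settings.

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=1.0,
                 is_retryable=lambda exc: True, sleep=time.sleep):
    """Run `call()` and retry on failure with exponentially growing delays.

    The delay before retry n (starting at 0) is base_delay * 2**n plus
    random jitter. The last failure, or any non-retryable one, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the openai package, a 429 response surfaces as `openai.RateLimitError`, so a call could be wrapped as `with_backoff(lambda: client.embeddings.create(model="text-embedding-3-small", input=text), is_retryable=lambda e: isinstance(e, openai.RateLimitError))`.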

Knowledge Check

1. What advantage does hybrid search provide over pure vector search or pure keyword search?

2. What does the HNSW algorithm in Azure AI Search vector configuration control?

3. When configuring 'On Your Data' with query_type 'vector_semantic_hybrid', what three search techniques are combined?

4. What does the 'groundedness' evaluation metric measure in a RAG system?

5. What is the purpose of overlapping chunks in a RAG chunking strategy?

Cleanup

az group delete --name rg-ai102-challenge15 --yes --no-wait

Learn More