Challenge 15: RAG Pattern: Advanced

Estimated Time

60 min | Cost: ~$5.00 (embeddings + search + OpenAI) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

  • Implement advanced RAG patterns with vector and hybrid search
  • Generate and use vector embeddings for semantic retrieval
  • Evaluate model and flow quality using built-in metrics

Overview

The basic RAG pattern uses keyword (lexical) search, which works well when users use the exact terminology found in the documents. Real-world queries, however, often use synonyms, paraphrases, or conceptual descriptions that keyword search misses. Vector search addresses this by converting both documents and queries into high-dimensional vectors (embeddings) that capture semantic meaning, so retrieval is based on conceptual similarity rather than exact word matches.
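To make "conceptual similarity" concrete, here is a minimal sketch of cosine similarity, the metric the index in Task 2 is configured with. The three-dimensional vectors and document names are made up for illustration; real text-embedding-3-small vectors have 1536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not produced by a real model)
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "provisioned-throughput": [0.8, 0.2, 0.1],
    "content-filtering": [0.0, 0.1, 0.9],
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

A vector index avoids comparing the query against every document this way; an approximate nearest neighbor algorithm such as HNSW narrows the candidates first.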

Hybrid search combines the strengths of both approaches: keyword search for exact matches and acronyms, plus vector search for semantic understanding. Azure AI Search supports hybrid queries that run both searches in parallel and merge the results using Reciprocal Rank Fusion (RRF). Adding a semantic ranker on top improves results further by using a deep learning model to re-rank the fused results by their semantic relevance to the query.
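The fusion step can be sketched in a few lines. This is an illustration of the general RRF formula, not the service's internal implementation: each document receives 1 / (k + rank) from every ranking it appears in, and the sums decide the final order. The constant k = 60 and the document IDs are assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a commonly used value; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists from the keyword and vector queries
keyword_ranking = ["doc2", "doc4", "doc1"]
vector_ranking = ["doc2", "doc5", "doc4"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (doc2, doc4) accumulate two contributions, which is why they rise above documents found by only one of the searches.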

Chunking strategies determine how documents are split before embedding. Overlapping chunks (e.g., 512 tokens with a 128-token overlap) preserve context across chunk boundaries. The embedding model (text-embedding-3-small or text-embedding-ada-002) converts each chunk into a vector stored in the search index. Evaluation closes the loop: metrics such as groundedness (is the answer supported by the retrieved context?), relevance (does it answer the question?), and coherence (is it well structured?) quantify RAG quality so it can be improved systematically.
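A minimal sketch of overlapping chunking follows, using the 512/128 sizes mentioned above. It operates on a list of tokens that has already been produced; the placeholder tokens below are an assumption, and a real pipeline would use the embedding model's tokenizer instead.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into chunks that share `overlap` tokens with their neighbor."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks

# Placeholder tokens standing in for a tokenized document (hypothetical)
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [512, 512, 232]
```

With a step of 384 tokens, the last 128 tokens of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk.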

Architecture

The advanced RAG pattern adds vector embeddings, hybrid search, and semantic ranking to improve retrieval quality, with evaluation metrics to measure end-to-end quality.

[Figure: Challenge 15 topology]

Prerequisites

  • Azure subscription with access to Azure OpenAI
  • Azure AI Search service (Basic tier or higher for the semantic ranker)
  • Deployments of GPT-4o and text-embedding-3-small
  • Python 3.9+ with the openai, azure-search-documents, and azure-identity packages
  • Documents from Challenge 14 (or new sample data)

Implementation

Task 1: Generate Embeddings

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Sample documents to embed
documents = [
    {
        "id": "1",
        "title": "Azure AI Foundry Overview",
        "content": "Azure AI Foundry is a unified platform for building generative AI applications. It provides a hub-and-project architecture where hubs manage shared infrastructure including Storage, Key Vault, and Container Registry. Projects are workspaces where teams build and deploy AI solutions.",
        "category": "platform"
    },
    {
        "id": "2",
        "title": "Azure OpenAI Model Deployment",
        "content": "Azure OpenAI supports multiple deployment types: Standard uses shared compute with pay-per-token billing. Global Standard routes traffic globally for higher availability. Provisioned reserves dedicated compute capacity with guaranteed throughput measured in PTUs.",
        "category": "models"
    },
    {
        "id": "3",
        "title": "Responsible AI and Content Filtering",
        "content": "Microsoft's Responsible AI principles include fairness, reliability, privacy, inclusiveness, transparency, and accountability. Azure AI services include built-in content filters that detect and block harmful content in categories including hate, sexual, violence, and self-harm.",
        "category": "governance"
    },
    {
        "id": "4",
        "title": "Azure AI Search Capabilities",
        "content": "Azure AI Search provides full-text search, vector search, and hybrid search combining both. Semantic ranking uses deep learning to re-rank results by relevance. Skillsets enable AI enrichment during indexing including OCR, entity recognition, and custom skills.",
        "category": "search"
    },
    {
        "id": "5",
        "title": "Vector Embeddings and Semantic Search",
        "content": "Vector embeddings represent text as high-dimensional numerical arrays capturing semantic meaning. Similar concepts have vectors close together in embedding space. Text-embedding-3-small produces 1536-dimension vectors optimized for search and retrieval tasks.",
        "category": "search"
    }
]

# Generate embeddings for each document
for doc in documents:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # deployment name
        input=doc["content"]
    )
    doc["content_vector"] = response.data[0].embedding
    print(f"Embedded '{doc['title']}': {len(doc['content_vector'])} dimensions")

# Generate embedding for a query
query = "How do I deploy AI models with guaranteed performance?"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=query
)
query_vector = query_response.data[0].embedding
print(f"\nQuery embedded: '{query}' -> {len(query_vector)} dimensions")

Task 2: Create a Vector Index with Hybrid Fields

import os
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticSearch,
    SemanticPrioritizedFields,
    SemanticField,
)

endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
credential = DefaultAzureCredential()

index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

# Define index with vector field + keyword fields + semantic config
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
    SearchableField(name="title", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SimpleField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
    # Vector field for embeddings (dimensions must match the embedding model output)
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile"
    ),
]

# Configure vector search with HNSW algorithm
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="my-hnsw-config",
            # Use the SDK's typed parameters model rather than a raw dict
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric="cosine"
            )
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="my-vector-profile",
            algorithm_configuration_name="my-hnsw-config"
        )
    ]
)

# Configure semantic ranking
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        content_fields=[SemanticField(field_name="content")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

# Create the index
index = SearchIndex(
    name="ai102-vector-index",
    fields=fields,
    vector_search=vector_search,
    semantic_search=semantic_search
)

result = index_client.create_or_update_index(index)
print(f"Vector index created: {result.name}")
print(" Vector dimensions: 1536")
print(" Algorithm: HNSW (cosine similarity)")
print(" Semantic config: my-semantic-config")

Task 3: Upload Documents with Vectors and Run Hybrid Search

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Initialize clients
openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
openai_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]

openai_client = AzureOpenAI(
    azure_endpoint=openai_endpoint,
    api_key=openai_key,
    api_version="2024-10-21"
)

search_client = SearchClient(
    endpoint=search_endpoint,
    index_name="ai102-vector-index",
    credential=DefaultAzureCredential()
)

# Documents to index (their embeddings are generated in the loop below)
documents = [
    {"id": "1", "title": "Azure AI Foundry Overview", "content": "Azure AI Foundry is a unified platform...", "category": "platform"},
    {"id": "2", "title": "Azure OpenAI Deployment Types", "content": "Azure OpenAI supports Standard, Global Standard, and Provisioned deployment types...", "category": "models"},
    {"id": "3", "title": "Responsible AI", "content": "Microsoft's Responsible AI principles include fairness, reliability, privacy...", "category": "governance"},
    {"id": "4", "title": "Azure AI Search", "content": "Azure AI Search provides full-text, vector, and hybrid search...", "category": "search"},
    {"id": "5", "title": "Vector Embeddings", "content": "Vector embeddings represent text as high-dimensional numerical arrays...", "category": "search"},
]

# Generate embeddings and upload
for doc in documents:
    embedding_response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=doc["content"]
    )
    doc["content_vector"] = embedding_response.data[0].embedding

result = search_client.upload_documents(documents=documents)
print(f"Uploaded {len(result)} documents with vectors")

# --- Hybrid Search (keyword + vector) ---
query_text = "How do I get guaranteed model performance?"

# Generate query embedding
query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=query_text
).data[0].embedding

# Execute hybrid search (combines keyword + vector via RRF)
results = search_client.search(
    search_text=query_text,  # Keyword component
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=3,
            fields="content_vector"
        )
    ],
    query_type="semantic",  # Enable semantic ranking
    semantic_configuration_name="my-semantic-config",
    top=3
)

print(f"\nHybrid Search Results for: '{query_text}'")
print("-" * 60)
for result in results:
    print(f" Score: {result['@search.score']:.4f} | "
          f"Reranker: {result.get('@search.reranker_score', 'N/A')} | "
          f"Title: {result['title']}")
    print(f" Content: {result['content'][:100]}...")
    print()
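Before calling upload_documents, it can help to check that every document carries a vector of the size the index expects (1536 in Task 2). The helper below is a hypothetical pre-flight check, not part of either SDK; the sample documents use placeholder vectors, and 3072 is the default size of text-embedding-3-large, used here to simulate a wrong deployment.

```python
EXPECTED_DIMENSIONS = 1536  # must match vector_search_dimensions on the index field

def find_dimension_mismatches(documents, vector_field="content_vector",
                              expected=EXPECTED_DIMENSIONS):
    """Return (document id, problem description) pairs for unusable vectors."""
    problems = []
    for doc in documents:
        vector = doc.get(vector_field)
        if not vector:
            problems.append((doc.get("id"), "missing or empty vector"))
        elif len(vector) != expected:
            problems.append(
                (doc.get("id"), f"expected {expected} dimensions, got {len(vector)}")
            )
    return problems

# Placeholder documents (hypothetical): one valid, one wrong size, one without a vector
sample_docs = [
    {"id": "1", "content_vector": [0.0] * 1536},
    {"id": "2", "content_vector": [0.0] * 3072},
    {"id": "3"},
]

problems = find_dimension_mismatches(sample_docs)
for doc_id, problem in problems:
    print(f"Document {doc_id}: {problem}")
```

Running a check like this locally surfaces the first and third rows of the troubleshooting table (dimension mismatch, unpopulated vector field) before they show up as service errors or empty results.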

Task 4: RAG with Hybrid Search

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_key = os.environ["AZURE_SEARCH_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# RAG with hybrid search (vector + keyword + semantic ranking)
question = "How can I ensure consistent AI model performance for production workloads?"

response = client.chat.completions.create(
    model="gpt-4o-standard",
    messages=[
        {"role": "system", "content": "You are an Azure AI expert. Answer based on the provided context. Cite your sources."},
        {"role": "user", "content": question}
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": "ai102-vector-index",
                    "authentication": {
                        "type": "api_key",
                        "key": search_key
                    },
                    "query_type": "vector_semantic_hybrid",
                    "embedding_dependency": {
                        "type": "deployment_name",
                        "deployment_name": "text-embedding-3-small"
                    },
                    "semantic_configuration": "my-semantic-config",
                    "top_n_documents": 3,
                    "in_scope": True
                }
            }
        ]
    }
)

print(f"Question: {question}")
print(f"\nAnswer (Hybrid RAG):")
print(response.choices[0].message.content)

Task 5: Evaluate RAG Quality

import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Test cases for evaluation
test_cases = [
    {
        "question": "What deployment type guarantees throughput?",
        "context": "Azure OpenAI supports Standard (shared compute), Global Standard (global routing), and Provisioned (dedicated compute with guaranteed throughput measured in PTUs).",
        "answer": "Provisioned deployment type guarantees throughput by reserving dedicated compute capacity measured in Provisioned Throughput Units (PTUs).",
        "ground_truth": "Provisioned deployments reserve dedicated compute capacity with guaranteed throughput."
    },
    {
        "question": "How does hybrid search work?",
        "context": "Azure AI Search provides full-text search, vector search, and hybrid search combining both. Results are fused using Reciprocal Rank Fusion (RRF).",
        "answer": "Hybrid search combines keyword (full-text) search and vector search, fusing results using Reciprocal Rank Fusion (RRF) to leverage both exact matching and semantic similarity.",
        "ground_truth": "Hybrid search combines keyword and vector search using RRF fusion."
    }
]

# Evaluate: Groundedness, Relevance, Coherence
metrics = ["groundedness", "relevance", "coherence"]

evaluation_prompt = """You are an AI quality evaluator. Rate the following on a scale of 1-5:

Metric: {metric}
- Groundedness: Is the answer fully supported by the provided context? (1=fabricated, 5=fully supported)
- Relevance: Does the answer directly address the question? (1=irrelevant, 5=perfectly relevant)
- Coherence: Is the answer well-structured and easy to understand? (1=incoherent, 5=perfectly clear)

Question: {question}
Context: {context}
Answer: {answer}

Return ONLY a single number (1-5)."""

print("=" * 70)
print("RAG QUALITY EVALUATION")
print("=" * 70)

for i, test in enumerate(test_cases):
    print(f"\nTest Case {i+1}: {test['question']}")
    scores = {}

    # One model call per metric; the reply is expected to be a single digit
    for metric in metrics:
        response = client.chat.completions.create(
            model="gpt-4o-standard",
            messages=[
                {"role": "user", "content": evaluation_prompt.format(
                    metric=metric,
                    question=test["question"],
                    context=test["context"],
                    answer=test["answer"]
                )}
            ],
            max_tokens=5,
            temperature=0.0
        )
        score = response.choices[0].message.content.strip()
        scores[metric] = score

    print(f" Groundedness: {scores['groundedness']}/5")
    print(f" Relevance: {scores['relevance']}/5")
    print(f" Coherence: {scores['coherence']}/5")

print("\n" + "=" * 70)
print("EVALUATION COMPLETE")
print("Target: All metrics >= 4 for production readiness")
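The script above stores each score as the raw string returned by the model, so the ">= 4" target is never actually checked. A minimal sketch of that last step follows; parse_score and meets_target are hypothetical helper names, and the sample scores are made up rather than taken from a real run.

```python
import re

def parse_score(raw):
    """Extract a standalone digit 1-5 from the model reply, or None if absent."""
    match = re.search(r"\b[1-5]\b", raw)
    return int(match.group()) if match else None

def meets_target(scores, threshold=4):
    """Convert raw score strings to ints and check them against the threshold."""
    parsed = {metric: parse_score(value) for metric, value in scores.items()}
    passed = all(v is not None and v >= threshold for v in parsed.values())
    return parsed, passed

# Made-up raw replies, including one with extra text around the digit
sample_scores = {"groundedness": "5", "relevance": "Score: 4", "coherence": "3"}
parsed, passed = meets_target(sample_scores)
print(parsed, "PASS" if passed else "FAIL")
```

Treating an unparseable reply as a failure, rather than skipping it, keeps a malformed evaluator response from silently passing the quality gate.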

Expected Output

After completing all tasks, you should have:

  1. Vector embeddings generated with text-embedding-3-small (1536 dimensions)
  2. A vector search index, ai102-vector-index, with:
    • HNSW algorithm configuration (cosine similarity)
    • Semantic ranking configuration
    • Searchable text fields and a vector field
  3. Hybrid search results combining keyword, vector, and semantic ranking
  4. RAG answers using the vector_semantic_hybrid query type
  5. Evaluation scores for groundedness, relevance, and coherence (target ≥ 4/5)

Break & Fix

Scenario | Symptom | Root Cause | Fix
Vector dimension mismatch | InvalidVectorDimensionError | Index expects 1536 dimensions but the embedding has a different size | Make sure the embedding model deployment produces vectors that match the dimensions configured on the index's vector field
Semantic ranker unavailable | SemanticSearchNotAvailable | Search service is on the Free tier | Upgrade to the Basic tier or higher for semantic ranking
Empty vector search results | 0 hits despite relevant documents | Vector field not populated, or wrong field name in the query | Verify that the content_vector field contains data; check the fields parameter in the query
Low evaluation scores | Groundedness < 3 | Retrieved chunks are not relevant; chunking is too coarse | Reduce the chunk size, add overlap, or increase top_n_documents
Embedding rate limit | 429 on the embeddings endpoint | Too many embedding requests in a batch | Add delays between batches; deploy with a higher TPM quota
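For the rate-limit row, "add delays between batches" is often implemented as retry with exponential backoff. The sketch below is one possible shape under stated assumptions: RateLimited is a stand-in exception defined here so the example runs on its own (with the openai package the equivalent error would be openai.RateLimitError), and flaky_embed simulates an embeddings call that fails twice.

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 error (hypothetical; see lead-in)."""

def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait with exponential backoff plus jitter and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 1x, 2x, 4x ... base_delay, plus random jitter to spread out retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

# Simulated embeddings call that is rate limited on its first two attempts
attempts = {"count": 0}

def flaky_embed():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("429 Too Many Requests")
    return [0.1, 0.2, 0.3]

delays = []  # record the waits instead of actually sleeping in this demo
vector = with_backoff(flaky_embed, sleep=delays.append)
print(f"Succeeded after {attempts['count']} attempts; waits: {len(delays)}")
```

Passing sleep as a parameter keeps the demo instantaneous; in real use the default time.sleep applies, and any wait hint returned by the service could replace the computed delay.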

Knowledge Check

1. What advantage does hybrid search offer over pure vector search or pure keyword search?

2. What does the HNSW algorithm in the Azure AI Search vector configuration control?

3. When configuring 'On Your Data' with query_type 'vector_semantic_hybrid', which three search techniques are combined?

4. What does the 'groundedness' evaluation metric measure in a RAG system?

5. What is the purpose of overlapping chunks in a RAG chunking strategy?

Cleanup

az group delete --name rg-ai102-challenge15 --yes --no-wait

Learn More