Challenge 16: Azure OpenAI: Provisioning and Configuration
45-60 min | Cost: ~$1.00 (estimated) | Domain: Generative AI Solutions (15-20%)
Exam skills covered
- Provision an Azure OpenAI resource
- Select and deploy an Azure OpenAI model
- Configure rate limits and manage deployment types
Overview
Azure OpenAI Service provides REST API access to OpenAI's language models, including GPT-4o, GPT-4o-mini, and embedding models. Provisioning requires selecting the appropriate SKU (S0 for standard consumption) and understanding the available deployment options: Standard (shared infrastructure, pay per token), Global Standard (optimized routing across regions), and Provisioned Throughput Units (PTU) for guaranteed capacity.
Each deployment is subject to rate limits measured in Tokens Per Minute (TPM) and Requests Per Minute (RPM). When a limit is exceeded, the service returns an HTTP 429 response with a Retry-After header. Production applications should retry with exponential backoff so that throttling is handled gracefully.
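To make the retry timing concrete, the following minimal sketch computes how long a client might wait before each retry. The function name `backoff_delay`, the 30-second cap, and the optional jitter factor are illustrative assumptions, not part of any Azure SDK.

```python
import random


def backoff_delay(attempt, retry_after=None, base_delay=1.0, max_delay=30.0, jitter=0.0):
    """Return the number of seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        # Prefer the server's hint when the 429 response carried Retry-After.
        return float(retry_after)
    # Double the delay on every attempt, but never exceed max_delay.
    delay = min(max_delay, base_delay * (2 ** attempt))
    # Optional jitter spreads out clients that were throttled at the same moment.
    return delay + random.uniform(0, jitter * delay)


schedule = [backoff_delay(n) for n in range(6)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

With the default `jitter=0.0` the schedule is deterministic; a small value such as 0.1 adds up to 10% random spread to each wait.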
API versions follow the YYYY-MM-DD format, with a -preview suffix for pre-GA features. Applications should use stable API versions (e.g., 2024-10-21) and plan for version retirement, which is announced at least 90 days in advance.
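As an illustration of the version format, this small sketch parses an api-version string and reports whether it is a preview version. The helper name `parse_api_version` is hypothetical; the regular expression covers only the YYYY-MM-DD and YYYY-MM-DD-preview shapes described above.

```python
import re
from datetime import date

# Matches "2024-10-21" and "2024-08-01-preview".
_API_VERSION = re.compile(r"^(\d{4})-(\d{2})-(\d{2})(-preview)?$")


def parse_api_version(value):
    """Return (release_date, is_preview) for an api-version string."""
    match = _API_VERSION.match(value)
    if match is None:
        raise ValueError(f"Unrecognized api-version: {value!r}")
    year, month, day, preview = match.groups()
    return date(int(year), int(month), int(day)), preview is not None


stable = parse_api_version("2024-10-21")
preview = parse_api_version("2024-08-01-preview")
print(stable[1], preview[1])  # False True
```

A check like this could run at application startup to warn when a preview version is configured for a production workload.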
Architecture
This challenge provisions an Azure OpenAI resource, deploys models with specific capacity settings, and tests rate-limiting behavior and retry strategies.
Prerequisites
- Azure subscription with approved Azure OpenAI access
- Azure CLI 2.60+ installed
- Python 3.9+ with the openai and azure-identity packages
- .NET 8 SDK with the Azure.AI.OpenAI NuGet package
Implementation
Task 1: Provision an Azure OpenAI Resource
Create an Azure OpenAI resource with the S0 SKU in a supported region.
- Python SDK
- C# SDK
- REST API
# Provisioning is done via Azure CLI or ARM; the Python SDK then consumes the resource.
# The connectivity check below needs a deployment named "gpt-4o" (created in Task 2).
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Option 1: API Key authentication
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

# Option 2: Microsoft Entra ID authentication (recommended)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21"
)

# Verify connectivity
response = client.chat.completions.create(
    model="gpt-4o",  # This is the deployment name
    messages=[{"role": "user", "content": "Hello, confirm connection."}],
    max_tokens=10
)
print(f"Connected successfully: {response.choices[0].message.content}")
using Azure;
using Azure.AI.OpenAI;
using Azure.Identity;
using OpenAI.Chat;

// Option 1: API Key authentication
string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!;
AzureOpenAIClient azureClient = new(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

// Option 2: Microsoft Entra ID authentication (recommended)
AzureOpenAIClient azureClientEntra = new(
    new Uri(endpoint),
    new DefaultAzureCredential());

// Get a ChatClient for a specific deployment
ChatClient chatClient = azureClient.GetChatClient("gpt-4o");

// Verify connectivity
ChatCompletion completion = await chatClient.CompleteChatAsync(
    new ChatMessage[] { new UserChatMessage("Hello, confirm connection.") },
    new ChatCompletionOptions { MaxOutputTokenCount = 10 });
Console.WriteLine($"Connected successfully: {completion.Content[0].Text}");
# Create resource group
az group create --name rg-ai102-challenge16 --location eastus2

# Create Azure OpenAI resource (S0 SKU)
az cognitiveservices account create \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16 \
  --location eastus2 \
  --kind OpenAI \
  --sku S0 \
  --custom-domain aoai-challenge16

# Get the endpoint and keys
az cognitiveservices account show \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16 \
  --query properties.endpoint -o tsv
az cognitiveservices account keys list \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16

# Verify with a direct REST call
# (requires AZURE_OPENAI_KEY to be set and the gpt-4o deployment from Task 2 to exist)
curl -X POST "https://aoai-challenge16.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${AZURE_OPENAI_KEY}" \
  -d '{
    "messages": [{"role": "user", "content": "Hello, confirm connection."}],
    "max_tokens": 10
  }'
Task 2: Deploy GPT-4o with a Specific Capacity
Deploy a GPT-4o model with the Standard deployment type and configure its TPM capacity.
- Python SDK
- C# SDK
- REST API
# Model deployment is managed via Azure CLI or REST management API
# After deployment, test with the Python SDK
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

# Test the deployed model
response = client.chat.completions.create(
    model="gpt-4o",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure OpenAI deployment types in one sentence."}
    ],
    max_tokens=100
)
print(f"Response: {response.choices[0].message.content}")
print(f"Tokens used - Prompt: {response.usage.prompt_tokens}, "
      f"Completion: {response.usage.completion_tokens}")
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!;
AzureOpenAIClient azureClient = new(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));
ChatClient chatClient = azureClient.GetChatClient("gpt-4o");

ChatCompletion completion = await chatClient.CompleteChatAsync(
    new ChatMessage[]
    {
        new SystemChatMessage("You are a helpful assistant."),
        new UserChatMessage("Explain Azure OpenAI deployment types in one sentence.")
    },
    new ChatCompletionOptions { MaxOutputTokenCount = 100 });
Console.WriteLine($"Response: {completion.Content[0].Text}");
Console.WriteLine($"Tokens used - Prompt: {completion.Usage.InputTokenCount}, "
    + $"Completion: {completion.Usage.OutputTokenCount}");
# Deploy GPT-4o with Standard deployment type and 30K TPM capacity
az cognitiveservices account deployment create \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16 \
  --deployment-name gpt-4o \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-name "Standard" \
  --sku-capacity 30

# Deploy GPT-4o-mini for cost-efficient workloads
az cognitiveservices account deployment create \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16 \
  --deployment-name gpt-4o-mini \
  --model-name gpt-4o-mini \
  --model-version "2024-07-18" \
  --model-format OpenAI \
  --sku-name "GlobalStandard" \
  --sku-capacity 50

# List deployments to verify
az cognitiveservices account deployment list \
  --name aoai-challenge16 \
  --resource-group rg-ai102-challenge16 \
  -o table
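The --sku-capacity values in the commands above are expressed in thousands of tokens per minute, which is why 30 corresponds to 30K TPM. The sketch below makes that conversion explicit. The RPM estimate is an assumption: it uses a ratio of 6 RPM per 1,000 TPM, which should be checked against the current quota documentation for the model in question. Both function names are illustrative.

```python
# One capacity unit corresponds to 1,000 TPM (30 -> 30K TPM in the commands above).
TOKENS_PER_CAPACITY_UNIT = 1_000


def capacity_to_tpm(sku_capacity):
    """Convert a --sku-capacity value into tokens per minute."""
    return sku_capacity * TOKENS_PER_CAPACITY_UNIT


def estimated_rpm(sku_capacity, rpm_per_1k_tpm=6):
    """Estimate the requests-per-minute limit.

    ASSUMPTION: the default ratio of 6 RPM per 1,000 TPM may not apply to
    every model; pass the documented ratio for the model you deploy.
    """
    return sku_capacity * rpm_per_1k_tpm


print(capacity_to_tpm(30), capacity_to_tpm(50))  # 30000 50000
print(estimated_rpm(30))  # 180
```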
Task 3: Test Rate Limits and Implement Exponential Backoff
Send requests to observe rate-limiting behavior and implement appropriate retry logic.
- Python SDK
- C# SDK
- REST API
import os
import time
from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21",
    max_retries=0  # Disable the SDK's built-in retries so the loop below sees each 429
)

def call_with_exponential_backoff(messages, max_retries=5, base_delay=1.0):
    """Implement exponential backoff for rate-limited requests."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                max_tokens=50
            )
            return response
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Use the Retry-After header (assumed to be in seconds) if available,
            # otherwise fall back to exponential backoff
            retry_after = e.response.headers.get("retry-after")
            delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)

# Simulate high-volume requests to trigger rate limiting
results = []
for i in range(20):
    try:
        response = call_with_exponential_backoff(
            [{"role": "user", "content": f"Say the number {i}"}]
        )
        results.append(response.choices[0].message.content)
        print(f"Request {i}: Success")
    except RateLimitError:
        print(f"Request {i}: Exhausted retries")
print(f"\nCompleted {len(results)}/20 requests")
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;
using System.ClientModel;

string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!;
AzureOpenAIClient azureClient = new(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));
ChatClient chatClient = azureClient.GetChatClient("gpt-4o");

async Task<ChatCompletion?> CallWithExponentialBackoff(
    ChatMessage[] messages, int maxRetries = 5, double baseDelay = 1.0)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        try
        {
            return await chatClient.CompleteChatAsync(
                messages,
                new ChatCompletionOptions { MaxOutputTokenCount = 50 });
        }
        catch (ClientResultException ex) when (ex.Status == 429)
        {
            if (attempt == maxRetries - 1) throw;
            double delay = baseDelay * Math.Pow(2, attempt);
            Console.WriteLine(
                $"Rate limited. Retrying in {delay:F1}s (attempt {attempt + 1})");
            await Task.Delay(TimeSpan.FromSeconds(delay));
        }
    }
    return null;
}

// Simulate high-volume requests
int successCount = 0;
for (int i = 0; i < 20; i++)
{
    try
    {
        var result = await CallWithExponentialBackoff(
            new ChatMessage[] { new UserChatMessage($"Say the number {i}") });
        if (result != null)
        {
            successCount++;
            Console.WriteLine($"Request {i}: Success");
        }
    }
    catch (ClientResultException)
    {
        Console.WriteLine($"Request {i}: Exhausted retries");
    }
}
Console.WriteLine($"\nCompleted {successCount}/20 requests");
# Send rapid requests to observe rate limiting (429 responses)
for i in $(seq 1 20); do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST "https://aoai-challenge16.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21" \
    -H "Content-Type: application/json" \
    -H "api-key: ${AZURE_OPENAI_KEY}" \
    -d "{\"messages\": [{\"role\": \"user\", \"content\": \"Say ${i}\"}], \"max_tokens\": 10}")
  echo "Request $i: HTTP $HTTP_CODE"
done

# Check rate limit headers in response
curl -i -X POST "https://aoai-challenge16.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${AZURE_OPENAI_KEY}" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 10
  }' 2>/dev/null | grep -i "x-ratelimit\|retry-after"

# Headers to observe:
#   x-ratelimit-remaining-tokens
#   x-ratelimit-remaining-requests
#   Retry-After (when 429)
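The three headers listed above can also be read programmatically. The following sketch pulls them out of a plain dictionary of response headers. The helper name `read_rate_limit_headers` and the sample values are hypothetical, and the sketch assumes every header value is an integer number of tokens, requests, or seconds.

```python
def read_rate_limit_headers(headers):
    """Extract the rate-limit headers from a mapping of HTTP response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        return int(value) if value is not None else None

    return {
        "remaining_tokens": as_int("x-ratelimit-remaining-tokens"),
        "remaining_requests": as_int("x-ratelimit-remaining-requests"),
        "retry_after_s": as_int("retry-after"),
    }


# Hypothetical header values for a successful call and a throttled call.
ok = read_rate_limit_headers({
    "x-ratelimit-remaining-tokens": "29500",
    "x-ratelimit-remaining-requests": "179",
})
throttled = read_rate_limit_headers({"Retry-After": "2"})
print(ok["remaining_tokens"], throttled["retry_after_s"])  # 29500 2
```

Logging these values alongside each request makes it easier to see how close a workload runs to its TPM and RPM limits before the first 429 appears.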
Task 4: Compare Standard vs Global Standard Deployments
- Python SDK
- C# SDK
- REST API
import os
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-10-21"
)

def measure_latency(deployment_name, num_requests=5):
    """Measure average latency for a deployment."""
    latencies = []
    for _ in range(num_requests):
        # perf_counter is a monotonic clock, better suited to timing than time.time()
        start = time.perf_counter()
        client.chat.completions.create(
            model=deployment_name,
            messages=[{"role": "user", "content": "Respond with OK."}],
            max_tokens=5
        )
        latencies.append(time.perf_counter() - start)
    return {
        "deployment": deployment_name,
        "avg_latency_ms": sum(latencies) / len(latencies) * 1000,
        "min_latency_ms": min(latencies) * 1000,
        "max_latency_ms": max(latencies) * 1000
    }

# Compare Standard vs Global Standard deployments
# Note: the two deployments also use different models, so any difference
# reflects both the model and the deployment type.
standard_results = measure_latency("gpt-4o")     # Standard deployment
global_results = measure_latency("gpt-4o-mini")  # Global Standard deployment

print("Standard Deployment:")
print(f"  Avg: {standard_results['avg_latency_ms']:.0f}ms | "
      f"Min: {standard_results['min_latency_ms']:.0f}ms | "
      f"Max: {standard_results['max_latency_ms']:.0f}ms")
print("\nGlobal Standard Deployment:")
print(f"  Avg: {global_results['avg_latency_ms']:.0f}ms | "
      f"Min: {global_results['min_latency_ms']:.0f}ms | "
      f"Max: {global_results['max_latency_ms']:.0f}ms")
using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;
using System.Diagnostics;

string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!;
AzureOpenAIClient azureClient = new(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

async Task<(double avg, double min, double max)> MeasureLatency(
    string deploymentName, int numRequests = 5)
{
    ChatClient chatClient = azureClient.GetChatClient(deploymentName);
    var latencies = new List<double>();
    for (int i = 0; i < numRequests; i++)
    {
        var sw = Stopwatch.StartNew();
        await chatClient.CompleteChatAsync(
            new ChatMessage[] { new UserChatMessage("Respond with OK.") },
            new ChatCompletionOptions { MaxOutputTokenCount = 5 });
        sw.Stop();
        latencies.Add(sw.Elapsed.TotalMilliseconds);
    }
    return (latencies.Average(), latencies.Min(), latencies.Max());
}

// "globalStandard" avoids using the contextual keyword "global" as a variable name
var standard = await MeasureLatency("gpt-4o");
var globalStandard = await MeasureLatency("gpt-4o-mini");
Console.WriteLine($"Standard: Avg={standard.avg:F0}ms Min={standard.min:F0}ms Max={standard.max:F0}ms");
Console.WriteLine($"Global Standard: Avg={globalStandard.avg:F0}ms Min={globalStandard.min:F0}ms Max={globalStandard.max:F0}ms");
# Compare latencies between deployment types
# Note: %N (nanoseconds) requires GNU date; it is not supported by the default macOS date
echo "=== Standard Deployment (gpt-4o) ==="
for i in $(seq 1 5); do
  START=$(date +%s%N)
  curl -s -o /dev/null \
    -X POST "https://aoai-challenge16.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21" \
    -H "Content-Type: application/json" \
    -H "api-key: ${AZURE_OPENAI_KEY}" \
    -d '{"messages": [{"role": "user", "content": "OK"}], "max_tokens": 5}'
  END=$(date +%s%N)
  echo "Request $i: $(( (END - START) / 1000000 ))ms"
done
echo ""
echo "=== Global Standard Deployment (gpt-4o-mini) ==="
for i in $(seq 1 5); do
  START=$(date +%s%N)
  curl -s -o /dev/null \
    -X POST "https://aoai-challenge16.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-10-21" \
    -H "Content-Type: application/json" \
    -H "api-key: ${AZURE_OPENAI_KEY}" \
    -d '{"messages": [{"role": "user", "content": "OK"}], "max_tokens": 5}'
  END=$(date +%s%N)
  echo "Request $i: $(( (END - START) / 1000000 ))ms"
done
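Five samples per deployment give a noisy average, so it can help to look at the median and a high percentile as well. The sketch below summarizes a list of latency measurements. The helper name `summarize_latencies` and the sample numbers are hypothetical, and the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics


def summarize_latencies(latencies_ms):
    """Return the median and nearest-rank 95th percentile of latency samples."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
    }


# Hypothetical measurements in milliseconds, not real results.
summary = summarize_latencies([320, 410, 450, 390, 680])
print(summary)  # {'median_ms': 410, 'p95_ms': 680}
```

With only five samples the 95th percentile is simply the slowest request; raising the number of requests per deployment makes the summary more meaningful.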
Expected Output
Connected successfully: Hello! Connection confirmed.
Response: Standard uses shared compute with pay-per-token, Global Standard optimizes
routing across regions, and Provisioned (PTU) guarantees dedicated throughput capacity.
Tokens used - Prompt: 22, Completion: 31
Rate limited. Retrying in 1.0s (attempt 1)
Request 0: Success
...
Completed 18/20 requests
Standard Deployment:
Avg: 450ms | Min: 320ms | Max: 680ms
Global Standard Deployment:
Avg: 380ms | Min: 280ms | Max: 520ms
Break & Fix
| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Resource creation fails | InvalidApiProperties error | Region does not support Azure OpenAI | Use a supported region (eastus, eastus2, westus, etc.) |
| Deployment fails | ModelNotAvailable | Model not available in the selected region | Check the model availability matrix or change the region |
| API returns 401 | Access denied due to invalid subscription key | Wrong key or mismatched endpoint | Verify that the key belongs to the resource; check the endpoint URL |
| API returns 429 | Rate limit is exceeded | TPM or RPM limit exceeded | Implement exponential backoff; increase capacity |
| API returns 404 | Resource not found | Wrong deployment name in the request | Verify that the deployment name matches exactly |
Knowledge Check
1. Which SKU is required when creating an Azure OpenAI resource via Azure CLI?
2. Which deployment type offers guaranteed throughput capacity at a fixed monthly cost?
3. When Azure OpenAI returns HTTP 429, which header indicates how long to wait before retrying?
4. What is the unit of capacity for Standard deployments when configuring rate limits?
5. Which API version format does Azure OpenAI use, and what happens when a version is retired?
Cleanup
az group delete --name rg-ai102-challenge16 --yes --no-wait