Challenge 12: Deploy Generative AI Models

Estimated Time

45-60 min | Cost: ~$1.00 (estimated) | Domain: Generative AI Solutions (15-20%)

Exam skills covered

Deploy generative AI models appropriate for specific use cases
Configure model deployment parameters, including quotas and rate limits
Compare deployment types (Standard, Global Standard, Provisioned)

Overview

Azure OpenAI Service provides access to a range of generative AI models through a managed deployment model. Choosing the right model and deployment type is a critical skill for the AI-102 exam. The available models include GPT-4o (multimodal, high capability) and GPT-4o-mini (cost-efficient for simpler tasks). The broader Azure AI Foundry model catalog also offers models such as Phi-4, Mistral, and Llama through Models as a Service (MaaS) serverless deployments.

Deployment types determine how your model is hosted and billed. Standard deployments run on shared compute, are billed per token, and are subject to Tokens-Per-Minute (TPM) and Requests-Per-Minute (RPM) quotas. Global Standard deployments are also billed per token, but they route traffic across Azure's global infrastructure for higher availability and throughput, so inference may be processed in any region where the model is available. Provisioned (PTU) deployments reserve dedicated model-processing capacity, measured in Provisioned Throughput Units. This gives production workloads predictable throughput and predictable costs.
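The mapping from requirements to deployment type can be sketched as a small helper. This sketch assumes that the SKU names accepted by `--sku-name` and by the management SDK's `Sku(name=...)` are `Standard`, `GlobalStandard`, `ProvisionedManaged`, and `GlobalProvisionedManaged`. The helper `choose_deployment_sku` is hypothetical. It deliberately reduces the decision to two questions, while real choices also depend on model and region availability.

```python
def choose_deployment_sku(needs_reserved_throughput: bool,
                          allow_global_processing: bool) -> str:
    """Hypothetical helper: map two workload requirements to a deployment SKU name.

    Simplified on purpose: it ignores model/region availability, Data Zone
    SKUs, and batch deployments.
    """
    if needs_reserved_throughput:
        # Provisioned: reserved capacity measured in PTUs, billed for the reservation
        return "GlobalProvisionedManaged" if allow_global_processing else "ProvisionedManaged"
    # Pay-per-token on shared capacity, limited by TPM/RPM quota
    return "GlobalStandard" if allow_global_processing else "Standard"


# Example: a spiky, cost-sensitive workload with no data-residency constraint
print(choose_deployment_sku(needs_reserved_throughput=False, allow_global_processing=True))
```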

Understanding quota is essential: each subscription has TPM limits per model, per region, and per deployment type. When you deploy a model, you allocate part of your available quota to that deployment. Rate limiting (HTTP 429) occurs when requests exceed the deployment's allocated TPM or RPM. Monitoring quota usage and planning capacity across deployments is a core operational skill.
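The quota bookkeeping comes down to simple arithmetic. This sketch makes three assumptions. First, deployment capacity is counted in units of 1,000 TPM, which is how this challenge uses `capacity=30` to mean 30K TPM. Second, RPM is granted at roughly 6 RPM per 1,000 TPM; that ratio holds for many chat models but varies by model. Third, the 150K TPM regional limit is an illustrative number, not a real default. All function names are hypothetical.

```python
from collections.abc import Iterable

# Assumption: ~6 RPM per 1,000 TPM; check the quota docs for your model
RPM_PER_1K_TPM = 6


def remaining_quota_k(regional_limit_k: int, allocated_k: Iterable[int]) -> int:
    """Free quota (in K TPM) for one subscription/region/model/deployment-type bucket."""
    return regional_limit_k - sum(allocated_k)


def deployment_fits(regional_limit_k: int, allocated_k: Iterable[int], requested_k: int) -> bool:
    """True if a new deployment of requested_k (K TPM) fits in the remaining quota."""
    return requested_k <= remaining_quota_k(regional_limit_k, allocated_k)


def estimated_rpm(capacity_k: int) -> int:
    """Approximate RPM granted to a deployment with capacity_k thousand TPM."""
    return capacity_k * RPM_PER_1K_TPM


# Illustrative: a 150K TPM limit with the 30K gpt-4o-standard deployment already allocated
print(remaining_quota_k(150, [30]))     # 120
print(deployment_fits(150, [30], 130))  # False
print(estimated_rpm(30))                # 180
```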

Architecture

The deployment architecture connects your application to Azure OpenAI endpoints through model deployments. Each deployment is configured with a specific SKU and quota allocation.

Figure: Challenge 12 topology
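In this topology the application never addresses a model directly: each request targets a deployment by name on the resource's endpoint. This sketch shows that addressing. It assumes the resource has a custom subdomain equal to its name, as the endpoint used in Task 4 does. `chat_completions_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def chat_completions_url(account_name: str, deployment_name: str, api_version: str) -> str:
    """Build the data-plane chat completions URL for one deployment (hypothetical helper)."""
    base = f"https://{account_name}.openai.azure.com"
    # The path segment is the deployment name you chose, not the underlying model name
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/chat/completions"
    return f"{base}{path}?{urlencode({'api-version': api_version})}"


print(chat_completions_url("aoai-ai102-challenge12", "gpt-4o-standard", "2024-10-21"))
```

To route the same request to the other deployment, you only need to replace `gpt-4o-standard` with `gpt-4o-mini-global`. This is why application configuration usually stores deployment names rather than model names.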

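Prerequisites

An Azure subscription with access to Azure OpenAI enabled
Azure CLI (the az cognitiveservices command group is built in; no extension is required)
An existing Azure OpenAI resource (or permissions to create one)
Sufficient quota in the target region for GPT-4o (Standard) and GPT-4o-mini (Global Standard)
For the SDK variants: the Python packages azure-identity, azure-mgmt-cognitiveservices, and openai, or the .NET packages Azure.Identity, Azure.ResourceManager.CognitiveServices, and Azure.AI.OpenAI
jq, for filtering the REST API responses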

Implementation

Task 1: List Available Models and Deploy GPT-4o

Each task below is shown in three variants, in this order: Python SDK, C# SDK, and Azure CLI / REST API (shell).

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

credential = DefaultAzureCredential()
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "rg-ai102-challenge12"
account_name = "aoai-ai102-challenge12"

client = CognitiveServicesManagementClient(credential, subscription_id)

# List the models this account can deploy (each item is an AccountModel)
models = client.accounts.list_models(
    resource_group_name=resource_group,
    account_name=account_name
)
print("Available models:")
for model in models:
    print(f"  {model.name} ({model.version}) - {model.format}")

# Create the GPT-4o deployment (Standard SKU; capacity is in units of 1K TPM)
deployment = Deployment(
    sku=Sku(name="Standard", capacity=30),  # 30 units = 30K TPM
    properties=DeploymentProperties(
        model=DeploymentModel(
            format="OpenAI",
            name="gpt-4o",
            version="2024-08-06"
        )
    )
)

poller = client.deployments.begin_create_or_update(
    resource_group_name=resource_group,
    account_name=account_name,
    deployment_name="gpt-4o-standard",
    deployment=deployment
)
result = poller.result()
print(f"\nDeployed: {result.name}")
print(f"  Model: {result.properties.model.name} v{result.properties.model.version}")
print(f"  SKU: {result.sku.name} ({result.sku.capacity}K TPM)")

using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.CognitiveServices;
using Azure.ResourceManager.CognitiveServices.Models;

var credential = new DefaultAzureCredential();
var client = new ArmClient(credential);

string subscriptionId = "YOUR_SUBSCRIPTION_ID";
string resourceGroup = "rg-ai102-challenge12";
string accountName = "aoai-ai102-challenge12";

var accountId = CognitiveServicesAccountResource.CreateResourceIdentifier(
    subscriptionId, resourceGroup, accountName);
var account = client.GetCognitiveServicesAccountResource(accountId);

// List the models this account can deploy
var models = account.GetModelsAsync();
await foreach (var model in models)
{
    Console.WriteLine($"  {model.Name} ({model.Version}) - {model.Format}");
}

// Create GPT-4o deployment
var deployments = account.GetCognitiveServicesAccountDeployments();
var deploymentData = new CognitiveServicesAccountDeploymentData
{
    Sku = new CognitiveServicesSku("Standard") { Capacity = 30 },
    Properties = new CognitiveServicesAccountDeploymentProperties
    {
        Model = new CognitiveServicesAccountDeploymentModel
        {
            Format = "OpenAI",
            Name = "gpt-4o",
            Version = "2024-08-06"
        }
    }
};

var operation = await deployments.CreateOrUpdateAsync(
    Azure.WaitUntil.Completed, "gpt-4o-standard", deploymentData);

Console.WriteLine($"Deployed: {operation.Value.Data.Name}");
Console.WriteLine($"  Capacity: {operation.Value.Data.Sku.Capacity}K TPM");

SUBSCRIPTION_ID="YOUR_SUBSCRIPTION_ID"
RESOURCE_GROUP="rg-ai102-challenge12"
ACCOUNT_NAME="aoai-ai102-challenge12"
LOCATION="eastus2"

# Create resource group and OpenAI account
az group create --name $RESOURCE_GROUP --location $LOCATION

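# --custom-domain sets the subdomain used by the endpoint in Task 4
# (https://aoai-ai102-challenge12.openai.azure.com); Microsoft Entra ID auth also requires it
az cognitiveservices account create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --kind OpenAI \
  --sku S0 \
  --custom-domain $ACCOUNT_NAME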

# List available models
az cognitiveservices account list-models \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --output table

# Deploy GPT-4o (Standard, 30K TPM)
az cognitiveservices account deployment create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name "gpt-4o-standard" \
  --model-name "gpt-4o" \
  --model-version "2024-08-06" \
  --model-format "OpenAI" \
  --sku-name "Standard" \
  --sku-capacity 30
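Before deploying, you can pick a version from the list-models output instead of hard-coding it. This sketch assumes you saved `az cognitiveservices account list-models --name $ACCOUNT_NAME --resource-group $RESOURCE_GROUP -o json` to a file, and that each entry carries `name`, `version`, and `format` fields. It also assumes date-style versions such as `2024-08-06`, which sort correctly as strings. Other version schemes would need their own comparison, and the newest version is not necessarily the default one. `latest_version` is a hypothetical helper, and the sample data is illustrative.

```python
import json
from typing import Optional


def latest_version(models: list, name: str, fmt: str = "OpenAI") -> Optional[str]:
    """Return the newest date-style version of `name`, or None if it is not listed."""
    versions = [m["version"] for m in models
                if m.get("name") == name and m.get("format") == fmt]
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings
    return max(versions, default=None)


# Illustrative excerpt; in practice: models = json.load(open("models.json"))
sample = json.loads("""[
  {"name": "gpt-4o", "version": "2024-05-13", "format": "OpenAI"},
  {"name": "gpt-4o", "version": "2024-08-06", "format": "OpenAI"},
  {"name": "gpt-4o-mini", "version": "2024-07-18", "format": "OpenAI"}
]""")
print(latest_version(sample, "gpt-4o"))        # 2024-08-06
print(latest_version(sample, "gpt-35-turbo"))  # None
```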

Task 2: Deploy GPT-4o-mini for Cost Comparison


from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

credential = DefaultAzureCredential()
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "rg-ai102-challenge12"
account_name = "aoai-ai102-challenge12"

client = CognitiveServicesManagementClient(credential, subscription_id)

# Deploy GPT-4o-mini (Global Standard for higher throughput; 1 capacity unit = 1K TPM)
deployment_mini = Deployment(
    sku=Sku(name="GlobalStandard", capacity=50),  # 50K TPM
    properties=DeploymentProperties(
        model=DeploymentModel(
            format="OpenAI",
            name="gpt-4o-mini",
            version="2024-07-18"
        )
    )
)

poller = client.deployments.begin_create_or_update(
    resource_group_name=resource_group,
    account_name=account_name,
    deployment_name="gpt-4o-mini-global",
    deployment=deployment_mini
)
result = poller.result()
print(f"Deployed: {result.name}")
print(f"  Model: {result.properties.model.name}")
print(f"  SKU: {result.sku.name} ({result.sku.capacity}K TPM)")

# Compare deployments
deployments = client.deployments.list(
    resource_group_name=resource_group,
    account_name=account_name
)
print("\n--- Deployment Comparison ---")
print(f"{'Name':<25} {'Model':<15} {'SKU':<18} {'TPM':<8}")
print("-" * 70)
for d in deployments:
    print(f"{d.name:<25} {d.properties.model.name:<15} {d.sku.name:<18} {d.sku.capacity}K")

using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.CognitiveServices;
using Azure.ResourceManager.CognitiveServices.Models;

var credential = new DefaultAzureCredential();
var client = new ArmClient(credential);

string subscriptionId = "YOUR_SUBSCRIPTION_ID";
string resourceGroup = "rg-ai102-challenge12";
string accountName = "aoai-ai102-challenge12";

var accountId = CognitiveServicesAccountResource.CreateResourceIdentifier(
    subscriptionId, resourceGroup, accountName);
var account = client.GetCognitiveServicesAccountResource(accountId);
var deployments = account.GetCognitiveServicesAccountDeployments();

// Deploy GPT-4o-mini with Global Standard SKU
var miniDeployment = new CognitiveServicesAccountDeploymentData
{
    Sku = new CognitiveServicesSku("GlobalStandard") { Capacity = 50 },
    Properties = new CognitiveServicesAccountDeploymentProperties
    {
        Model = new CognitiveServicesAccountDeploymentModel
        {
            Format = "OpenAI",
            Name = "gpt-4o-mini",
            Version = "2024-07-18"
        }
    }
};

var operation = await deployments.CreateOrUpdateAsync(
    Azure.WaitUntil.Completed, "gpt-4o-mini-global", miniDeployment);
Console.WriteLine($"Deployed: {operation.Value.Data.Name}");

// List all deployments for comparison
Console.WriteLine("\n--- Deployment Comparison ---");
await foreach (var d in deployments.GetAllAsync())
{
    Console.WriteLine($"{d.Data.Name,-25} {d.Data.Properties.Model.Name,-15} " +
        $"{d.Data.Sku.Name,-18} {d.Data.Sku.Capacity}K TPM");
}

# Deploy GPT-4o-mini with Global Standard
az cognitiveservices account deployment create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name "gpt-4o-mini-global" \
  --model-name "gpt-4o-mini" \
  --model-version "2024-07-18" \
  --model-format "OpenAI" \
  --sku-name "GlobalStandard" \
  --sku-capacity 50

# List all deployments
az cognitiveservices account deployment list \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --output table
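Quota is tracked separately per model and per deployment type, so a deployment comparison is most useful when grouped that way. This sketch assumes the JSON shape of `az cognitiveservices account deployment list -o json`, in which each deployment has `sku.name`, `sku.capacity`, and `properties.model.name`. `capacity_by_bucket` is a hypothetical helper, and the sample mirrors the two deployments created so far.

```python
import json
from collections import defaultdict


def capacity_by_bucket(deployments: list) -> dict:
    """Sum allocated capacity (K TPM) per (SKU name, model name) pair."""
    totals = defaultdict(int)
    for d in deployments:
        key = (d["sku"]["name"], d["properties"]["model"]["name"])
        totals[key] += d["sku"].get("capacity", 0)
    return dict(totals)


sample = json.loads("""[
  {"name": "gpt-4o-standard",
   "sku": {"name": "Standard", "capacity": 30},
   "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}},
  {"name": "gpt-4o-mini-global",
   "sku": {"name": "GlobalStandard", "capacity": 50},
   "properties": {"model": {"name": "gpt-4o-mini", "version": "2024-07-18"}}}
]""")
for (sku, model), capacity in capacity_by_bucket(sample).items():
    print(f"{sku:<16} {model:<12} {capacity}K TPM")
```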

Task 3: Check Quota Usage


from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

credential = DefaultAzureCredential()
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "rg-ai102-challenge12"
account_name = "aoai-ai102-challenge12"
location = "eastus2"

client = CognitiveServicesManagementClient(credential, subscription_id)

# Check model quota/usage for the subscription in this region
usages = client.usages.list(location=location)
print(f"Quota usage for {location}:")
print(f"{'Model':<30} {'Used':<10} {'Limit':<10} {'Unit':<10}")
print("-" * 60)
for usage in usages:
    if usage.current_value > 0 or "OpenAI" in (usage.name.value or ""):
        print(f"{usage.name.localized_value:<30} "
              f"{usage.current_value:<10} "
              f"{usage.limit:<10} "
              f"{usage.unit:<10}")

# Check deployment-level rate limits
deployments = client.deployments.list(
    resource_group_name=resource_group,
    account_name=account_name
)
print("\n--- Rate Limits per Deployment ---")
for d in deployments:
    tpm_k = d.sku.capacity  # capacity units = thousands of TPM
    # Estimate: many chat models get 6 RPM per 1,000 TPM; the ratio varies by model
    estimated_rpm = tpm_k * 6
    print(f"{d.name}: {tpm_k}K TPM, ~{estimated_rpm} RPM")

using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.CognitiveServices;

var credential = new DefaultAzureCredential();
var client = new ArmClient(credential);

string subscriptionId = "YOUR_SUBSCRIPTION_ID";
string resourceGroup = "rg-ai102-challenge12";
string accountName = "aoai-ai102-challenge12";

// Check account-level usage (the regional per-model TPM quota is listed by the Python and CLI variants)
var accountId = CognitiveServicesAccountResource.CreateResourceIdentifier(
    subscriptionId, resourceGroup, accountName);
var account = client.GetCognitiveServicesAccountResource(accountId);

var usages = account.GetUsagesAsync();
Console.WriteLine("Account Usage:");
await foreach (var usage in usages)
{
    Console.WriteLine($"  {usage.Name?.LocalizedValue}: " +
        $"{usage.CurrentValue}/{usage.Limit} ({usage.Unit})");
}

// List deployments with capacity info
var deployments = account.GetCognitiveServicesAccountDeployments();
Console.WriteLine("\n--- Rate Limits per Deployment ---");
await foreach (var d in deployments.GetAllAsync())
{
    var tpm = d.Data.Sku.Capacity;
    Console.WriteLine($"  {d.Data.Name}: {tpm}K TPM");
}

# List quota usage (current value and limit) for every model in the region
az cognitiveservices usage list \
  --location $LOCATION \
  --output table

# Show deployment details including capacity
az cognitiveservices account deployment show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name "gpt-4o-standard" \
  --query "{name:name, model:properties.model.name, sku:sku.name, capacity:sku.capacity}"

# REST API - check quota
TOKEN=$(az account get-access-token --query accessToken -o tsv)

curl -s \
  "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/providers/Microsoft.CognitiveServices/locations/${LOCATION}/usages?api-version=2024-04-01-preview" \
  -H "Authorization: Bearer $TOKEN" | jq '.value[] | select(.currentValue > 0)'
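The raw usage list is long, so a small filter that reports headroom makes capacity planning easier. This sketch assumes the camelCase shape returned by `az cognitiveservices usage list -o json` (or the `value` array of the REST call above), with fields `name.value`, `currentValue`, and `limit`. It also assumes that Azure OpenAI quota entries have names starting with `OpenAI.`. The 80% warning threshold and the sample numbers are illustrative, and `quota_headroom` is a hypothetical helper.

```python
def quota_headroom(usages: list, prefix: str = "OpenAI.", warn_ratio: float = 0.8) -> list:
    """Return (name, used, limit, free, near_limit) for each matching quota entry."""
    rows = []
    for u in usages:
        name = (u.get("name") or {}).get("value") or ""
        limit = u.get("limit") or 0
        if not name.startswith(prefix) or limit <= 0:
            continue  # skip unrelated quotas and entries without a positive limit
        used = u.get("currentValue") or 0
        rows.append((name, used, limit, limit - used, used / limit >= warn_ratio))
    return rows


sample = [
    {"name": {"value": "OpenAI.Standard.gpt-4o"}, "currentValue": 30, "limit": 150},
    {"name": {"value": "OpenAI.GlobalStandard.gpt-4o-mini"}, "currentValue": 50, "limit": 60},
    {"name": {"value": "OtherService.Example"}, "currentValue": 1, "limit": 200},
]
for name, used, limit, free, near_limit in quota_headroom(sample):
    flag = "  <-- near limit" if near_limit else ""
    print(f"{name:<36} {used}/{limit} (free: {free}){flag}")
```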

Task 4: Test the Deployments


import os
from openai import AzureOpenAI

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_KEY"]

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

test_prompt = "Explain the difference between GPT-4o and GPT-4o-mini in 2 sentences."

# Test GPT-4o
response_4o = client.chat.completions.create(
    model="gpt-4o-standard",
    messages=[{"role": "user", "content": test_prompt}],
    max_tokens=150
)
print(f"GPT-4o response:")
print(f"  {response_4o.choices[0].message.content}")
print(f"  Tokens: {response_4o.usage.total_tokens}")

# Test GPT-4o-mini
response_mini = client.chat.completions.create(
    model="gpt-4o-mini-global",
    messages=[{"role": "user", "content": test_prompt}],
    max_tokens=150
)
print(f"\nGPT-4o-mini response:")
print(f"  {response_mini.choices[0].message.content}")
print(f"  Tokens: {response_mini.usage.total_tokens}")

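# Cost comparison (approximate list prices per 1M tokens; they vary by model version,
# region, and deployment type, so check the Azure OpenAI pricing page)
print("\n--- Cost Comparison (approximate) ---")
print("GPT-4o (2024-08-06): Input $2.50/1M tokens, Output $10.00/1M tokens")
print("GPT-4o-mini:         Input $0.15/1M tokens, Output $0.60/1M tokens")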

using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

string endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!;

var client = new AzureOpenAIClient(
    new Uri(endpoint), new AzureKeyCredential(apiKey));

string testPrompt = "Explain the difference between GPT-4o and GPT-4o-mini in 2 sentences.";

// Test GPT-4o (GetChatClient takes the deployment name, not the model name)
ChatClient chatClient4o = client.GetChatClient("gpt-4o-standard");
var response4o = await chatClient4o.CompleteChatAsync(
    new ChatMessage[] { new UserChatMessage(testPrompt) });

Console.WriteLine("GPT-4o response:");
Console.WriteLine($"  {response4o.Value.Content[0].Text}");
Console.WriteLine($"  Tokens: {response4o.Value.Usage.TotalTokenCount}");

// Test GPT-4o-mini
ChatClient chatClientMini = client.GetChatClient("gpt-4o-mini-global");
var responseMini = await chatClientMini.CompleteChatAsync(
    new ChatMessage[] { new UserChatMessage(testPrompt) });

Console.WriteLine("\nGPT-4o-mini response:");
Console.WriteLine($"  {responseMini.Value.Content[0].Text}");
Console.WriteLine($"  Tokens: {responseMini.Value.Usage.TotalTokenCount}");

AZURE_OPENAI_ENDPOINT="https://aoai-ai102-challenge12.openai.azure.com"
AZURE_OPENAI_KEY="YOUR_KEY"

# Test GPT-4o deployment
curl -s "${AZURE_OPENAI_ENDPOINT}/openai/deployments/gpt-4o-standard/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${AZURE_OPENAI_KEY}" \
  -d '{
    "messages": [{"role": "user", "content": "Explain GPT-4o vs GPT-4o-mini in 2 sentences."}],
    "max_tokens": 150
  }' | jq '{content: .choices[0].message.content, tokens: .usage.total_tokens}'

# Test GPT-4o-mini deployment
curl -s "${AZURE_OPENAI_ENDPOINT}/openai/deployments/gpt-4o-mini-global/chat/completions?api-version=2024-10-21" \
  -H "Content-Type: application/json" \
  -H "api-key: ${AZURE_OPENAI_KEY}" \
  -d '{
    "messages": [{"role": "user", "content": "Explain GPT-4o vs GPT-4o-mini in 2 sentences."}],
    "max_tokens": 150
  }' | jq '{content: .choices[0].message.content, tokens: .usage.total_tokens}'
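To turn the token counts from these tests into a cost comparison, multiply prompt and completion tokens by their per-million prices. This sketch assumes approximate list prices (USD per 1M tokens) for gpt-4o 2024-08-06 and gpt-4o-mini 2024-07-18. Actual prices vary by region and deployment type, so treat the table as a placeholder to update from the Azure OpenAI pricing page. `call_cost` is a hypothetical helper. With the Python client from this task, you would pass `response.usage.prompt_tokens` and `response.usage.completion_tokens`.

```python
# Assumed approximate prices in USD per 1M tokens: (input, output)
PRICES_PER_1M = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Approximate USD cost of one call, from its token usage."""
    price_in, price_out = PRICES_PER_1M[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


# Illustrative token counts for the same prompt sent to both deployments
for model in ("gpt-4o", "gpt-4o-mini"):
    print(f"{model:<12} ${call_cost(model, 25, 150):.6f}")
```

Because output tokens cost several times more than input tokens, a low `max_tokens` value caps the cost of each call as much as the choice of model does.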

Expected Output

After completing all tasks, you should have:

An Azure OpenAI resource, aoai-ai102-challenge12, with two deployments:
- gpt-4o-standard: Standard SKU, 30K TPM, model version 2024-08-06
- gpt-4o-mini-global: GlobalStandard SKU, 50K TPM, model version 2024-07-18
Quota consumed: 30K TPM of the Standard GPT-4o quota and 50K TPM of the Global Standard GPT-4o-mini quota
Successful test responses from both deployments, showing their different response styles

Break & Fix

Scenario	Symptom	Root cause	Fix
Deployment fails	`QuotaExceeded` error	Insufficient TPM quota in the region	Reduce the capacity or request a quota increase through the Azure portal
Model not found	`ModelNotFound` or an empty model list	Model not available in the selected region	Check regional availability; try `eastus2` or `swedencentral`
429 Too Many Requests	Rate-limit errors during testing	Requests exceed the allocated TPM/RPM	Implement exponential backoff (honoring the `Retry-After` header); increase the deployment's capacity
Wrong model version	`InvalidModelVersion`	The specified version is retired or not yet available	Use `az cognitiveservices account list-models` to find valid versions
Global Standard unavailable	SKU not supported	Not every model supports Global Standard	Use the Standard SKU or check the model/SKU compatibility documentation
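The backoff fix for the 429 scenario can be sketched as follows. The `openai` Python library already retries rate-limited requests on its own, configurable through the client's `max_retries` argument, so this sketch is a library-independent illustration rather than something you must add. `RateLimitedError` and `call_with_backoff` are hypothetical names. The sketch honors a server-provided `Retry-After` delay when one is present and otherwise uses exponential backoff with full jitter. The `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised on HTTP 429, carrying an optional Retry-After in seconds."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429 Too Many Requests")
        self.retry_after = retry_after


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitedError up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitedError as err:
            if attempt == max_retries:
                raise  # retries exhausted: surface the 429 to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # the server told us how long to wait
            else:
                # Full jitter: random wait in [0, min(max_delay, base_delay * 2^attempt)]
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```

In a real client, your HTTP layer would raise `RateLimitedError` on a 429 response and pass in the parsed `Retry-After` header value.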

Knowledge Check

1. What is the main difference between the Standard and Provisioned deployment types?

2. When you deploy a model, what does the `capacity` parameter in the SKU represent?

3. Which model would be most cost-efficient for a high-volume classification task that does not require advanced reasoning?

4. What happens when a deployment's rate limit (TPM/RPM) is exceeded?

5. Which Azure CLI command deploys a GPT-4o model to an Azure OpenAI resource?

Cleanup

az group delete --name rg-ai102-challenge12 --yes --no-wait
