Challenge 03: Deploy AI Models
60 min | Cost: ~$1.00 | Domain: Plan & Manage AI Solutions (20-25%)
Exam skills covered
- Deploy AI models using appropriate deployment options
- Plan capacity for model deployments (tokens per minute, requests per minute)
- Manage model versions and lifecycle
- Choose between Standard, Global Standard, and Provisioned Throughput deployments
Overview
Azure OpenAI Service requires explicit model deployment before you can make inference calls. Unlike traditional Azure AI services (where you create a resource and immediately get an endpoint), Azure OpenAI separates resource creation from model deployment, which gives you control over which models are available, their capacity, and their version lifecycle.
This challenge covers the three deployment types that appear on the AI-102 exam: Standard (pay-per-token, regional), Global Standard (pay-per-token, global routing), and Provisioned Throughput (reserved capacity, predictable latency). You'll deploy models programmatically, configure capacity in Tokens Per Minute (TPM), manage model versions, and understand the upgrade policies that control automatic version transitions.
Capacity planning is a key exam topic: you need to understand how TPM translates to real-world throughput, how to monitor utilization, and when to choose provisioned throughput over standard deployments.
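To make the TPM arithmetic concrete, here is a minimal sketch rather than an official sizing formula. It assumes each request is counted as its prompt tokens plus its full max_tokens budget, adds a hypothetical 20% headroom, and assumes a ratio of 6 RPM per 1,000 TPM for Standard deployments; check both assumptions against the current quota documentation for your model. The function names are illustrative only.

```python
import math


def required_capacity_units(requests_per_minute: float,
                            avg_prompt_tokens: int,
                            max_completion_tokens: int,
                            headroom: float = 0.2) -> int:
    """Estimate deployment capacity in units of 1,000 TPM.

    Assumption: every request is charged prompt tokens plus the whole
    max_tokens budget, which is a conservative upper bound.
    """
    tokens_per_request = avg_prompt_tokens + max_completion_tokens
    tpm = requests_per_minute * tokens_per_request * (1 + headroom)
    return math.ceil(tpm / 1000)


def implied_rpm(capacity_units: int, rpm_per_unit: int = 6) -> int:
    """Requests-per-minute limit implied by a capacity value.

    Assumption: 6 RPM per 1,000 TPM; the real ratio can differ by model.
    """
    return capacity_units * rpm_per_unit


# 40 requests/min, ~500 prompt tokens each, max_tokens=200
units = required_capacity_units(40, 500, 200)
print(f"Suggested capacity: {units}K TPM, implied limit: {implied_rpm(units)} RPM")
```

With these inputs the estimate comes out slightly above the 30K TPM used in Task 1, which is the kind of gap that shows up as 429 responses under peak load.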
Architecture
You'll create an Azure OpenAI resource, deploy multiple models with different deployment types and capacities, then validate their availability and compare their behaviors.
Prerequisites
- Azure subscription with Azure OpenAI access approved
- Azure CLI 2.50+ with cognitiveservices extension
- Python 3.9+ with pip or .NET 8 SDK
- azure-identity, azure-mgmt-cognitiveservices, openai Python packages
Implementation
Task 1: Create an Azure OpenAI Resource and Deploy a Model
- Python SDK
- C# SDK
- Azure CLI
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Account, Sku, AccountProperties, Deployment, DeploymentProperties,
    DeploymentModel
)

credential = DefaultAzureCredential()
subscription_id = "YOUR_SUBSCRIPTION_ID"
client = CognitiveServicesManagementClient(credential, subscription_id)

# Create Azure OpenAI resource
account = client.accounts.begin_create(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    account=Account(
        sku=Sku(name="S0"),
        kind="OpenAI",
        location="eastus2",
        properties=AccountProperties(
            custom_sub_domain_name="ai102-openai-03"
        )
    )
).result()
print(f"OpenAI resource: {account.properties.endpoint}")

# Deploy GPT-4o with Standard deployment type
deployment = client.deployments.begin_create_or_update(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    deployment_name="gpt-4o-standard",
    deployment=Deployment(
        sku=Sku(name="Standard", capacity=30),  # 30K tokens per minute
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o",
                version="2024-08-06"
            ),
            version_upgrade_option="OnceCurrentVersionExpired"
        )
    )
).result()
print(f"Deployed: {deployment.name}")
print(f"Model: {deployment.properties.model.name} v{deployment.properties.model.version}")
print(f"Capacity: {deployment.sku.capacity}K TPM")

using Azure.Identity;
using Azure.ResourceManager;
using Azure.ResourceManager.CognitiveServices;
using Azure.ResourceManager.CognitiveServices.Models;

var credential = new DefaultAzureCredential();
var client = new ArmClient(credential);
var subscription = await client.GetDefaultSubscriptionAsync();
var resourceGroup = await subscription.GetResourceGroupAsync("rg-ai102-challenge03");
var accounts = resourceGroup.Value.GetCognitiveServicesAccounts();

// Create Azure OpenAI resource
var accountData = new CognitiveServicesAccountData(Azure.Core.AzureLocation.EastUS2)
{
    Kind = "OpenAI",
    Sku = new CognitiveServicesSku("S0"),
    Properties = new CognitiveServicesAccountProperties
    {
        CustomSubDomainName = "ai102-openai-03"
    }
};
var account = await accounts.CreateOrUpdateAsync(
    Azure.WaitUntil.Completed, "ai102-openai-03", accountData);

// Deploy GPT-4o
var deployments = account.Value.GetCognitiveServicesAccountDeployments();
var deploymentData = new CognitiveServicesAccountDeploymentData
{
    Properties = new CognitiveServicesAccountDeploymentProperties
    {
        Model = new CognitiveServicesAccountDeploymentModel
        {
            Format = "OpenAI",
            Name = "gpt-4o",
            Version = "2024-08-06"
        },
        VersionUpgradeOption = DeploymentModelVersionUpgradeOption.OnceCurrentVersionExpired
    },
    Sku = new CognitiveServicesSku("Standard") { Capacity = 30 }
};
var deployment = await deployments.CreateOrUpdateAsync(
    Azure.WaitUntil.Completed, "gpt-4o-standard", deploymentData);
Console.WriteLine($"Deployed: {deployment.Value.Data.Name}");
Console.WriteLine($"Model: {deployment.Value.Data.Properties.Model.Name}");
Console.WriteLine($"Capacity: {deployment.Value.Data.Sku.Capacity}K TPM");

# Create resource group
az group create --name rg-ai102-challenge03 --location eastus2
# Create Azure OpenAI resource
az cognitiveservices account create \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
--kind OpenAI \
--sku S0 \
--location eastus2 \
--custom-domain ai102-openai-03 \
--yes
# Deploy GPT-4o with Standard deployment
az cognitiveservices account deployment create \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
--deployment-name gpt-4o-standard \
--model-name gpt-4o \
--model-version "2024-08-06" \
--model-format OpenAI \
--sku-name "Standard" \
--sku-capacity 30
# Verify deployment
az cognitiveservices account deployment show \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
--deployment-name gpt-4o-standard \
-o json
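For comparison with the CLI commands, the following is a rough sketch of the Azure Resource Manager request that a deployment corresponds to. It only builds the URL and JSON body and sends nothing. The api-version value is an assumption (use one your subscription supports), and build_deployment_request is a hypothetical helper, not part of any SDK. It also validates the version upgrade option against the three values the service accepts.

```python
import json

ARM_API_VERSION = "2023-05-01"  # assumption: substitute a version you have verified

# Upgrade policies accepted for a deployment
UPGRADE_OPTIONS = {
    "OnceNewDefaultVersionAvailable",  # move when a new default version is published
    "OnceCurrentVersionExpired",       # move only when the current version retires
    "NoAutoUpgrade",                   # never move automatically
}


def build_deployment_request(subscription_id, resource_group, account_name,
                             deployment_name, model_name, model_version,
                             sku_name, capacity, upgrade_option):
    """Return (url, body) for a PUT that creates or updates a deployment."""
    if upgrade_option not in UPGRADE_OPTIONS:
        raise ValueError(f"Unknown upgrade option: {upgrade_option}")
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
        f"/deployments/{deployment_name}?api-version={ARM_API_VERSION}"
    )
    body = {
        "sku": {"name": sku_name, "capacity": capacity},  # capacity in 1,000 TPM
        "properties": {
            "model": {"format": "OpenAI", "name": model_name, "version": model_version},
            "versionUpgradeOption": upgrade_option,
        },
    }
    return url, body


url, body = build_deployment_request(
    "YOUR_SUBSCRIPTION_ID", "rg-ai102-challenge03", "ai102-openai-03",
    "gpt-4o-standard", "gpt-4o", "2024-08-06", "Standard", 30,
    "OnceCurrentVersionExpired",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending it would be a PUT with a bearer token for the management endpoint. Choosing "NoAutoUpgrade" here is the option that leaves version changes entirely under your control.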
Task 2: Deploy Multiple Models with Different Configurations
- Python SDK
- C# SDK
- Azure CLI
# Deploy GPT-4o-mini for cost-effective workloads
mini_deployment = client.deployments.begin_create_or_update(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    deployment_name="gpt-4o-mini-standard",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=50),  # 50K TPM with global routing
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o-mini",
                version="2024-07-18"
            ),
            version_upgrade_option="OnceNewDefaultVersionAvailable"
        )
    )
).result()
print(f"Deployed: {mini_deployment.name} (Global Standard)")

# List all deployments to compare
deployments = client.deployments.list(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03"
)
print("\n--- All Deployments ---")
for d in deployments:
    print(f" {d.name}:")
    print(f" Model: {d.properties.model.name} v{d.properties.model.version}")
    print(f" Type: {d.sku.name}")
    print(f" Capacity: {d.sku.capacity}K TPM")
    print(f" Upgrade: {d.properties.version_upgrade_option}")

// Deploy GPT-4o-mini with Global Standard
var miniDeploymentData = new CognitiveServicesAccountDeploymentData
{
    Properties = new CognitiveServicesAccountDeploymentProperties
    {
        Model = new CognitiveServicesAccountDeploymentModel
        {
            Format = "OpenAI",
            Name = "gpt-4o-mini",
            Version = "2024-07-18"
        },
        VersionUpgradeOption = DeploymentModelVersionUpgradeOption.OnceNewDefaultVersionAvailable
    },
    Sku = new CognitiveServicesSku("GlobalStandard") { Capacity = 50 }
};
var miniDeployment = await deployments.CreateOrUpdateAsync(
    Azure.WaitUntil.Completed, "gpt-4o-mini-standard", miniDeploymentData);
Console.WriteLine($"Deployed: {miniDeployment.Value.Data.Name} (Global Standard)");

// List all deployments
Console.WriteLine("\n--- All Deployments ---");
await foreach (var d in deployments.GetAllAsync())
{
    Console.WriteLine($" {d.Data.Name}:");
    Console.WriteLine($" Model: {d.Data.Properties.Model.Name} v{d.Data.Properties.Model.Version}");
    Console.WriteLine($" Type: {d.Data.Sku.Name}");
    Console.WriteLine($" Capacity: {d.Data.Sku.Capacity}K TPM");
    Console.WriteLine($" Upgrade: {d.Data.Properties.VersionUpgradeOption}");
}

# Deploy GPT-4o-mini with Global Standard
az cognitiveservices account deployment create \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
--deployment-name gpt-4o-mini-standard \
--model-name gpt-4o-mini \
--model-version "2024-07-18" \
--model-format OpenAI \
--sku-name "GlobalStandard" \
--sku-capacity 50
# List all deployments
az cognitiveservices account deployment list \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
-o table
# Check available models in the region
az cognitiveservices account list-models \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
-o table
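Once several deployments exist, it helps to see how much capacity each model and deployment type is consuming, since an InsufficientQuota error usually means one of those buckets is full. The sketch below is illustrative only: it assumes quota is tracked separately per model and deployment type within a region, and it works on plain dictionaries (a hypothetical shape) rather than SDK objects.

```python
from collections import defaultdict


def capacity_by_model_and_sku(deployments):
    """Sum allocated capacity (in 1,000 TPM) per (model, deployment type)."""
    totals = defaultdict(int)
    for d in deployments:
        totals[(d["model"], d["sku"])] += d["capacity"]
    return dict(totals)


# The two deployments created in Tasks 1 and 2
current = [
    {"name": "gpt-4o-standard", "model": "gpt-4o", "sku": "Standard", "capacity": 30},
    {"name": "gpt-4o-mini-standard", "model": "gpt-4o-mini", "sku": "GlobalStandard", "capacity": 50},
]

for (model, sku), total in capacity_by_model_and_sku(current).items():
    print(f"{model} / {sku}: {total}K TPM allocated")
```

To feed it from the management SDK listing, you could map each item to {"model": d.properties.model.name, "sku": d.sku.name, "capacity": d.sku.capacity}.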
Task 3: Test Deployment and Monitor Capacity
- Python SDK
- C# SDK
- REST API
import os
from openai import AzureOpenAI

# Read the endpoint and key from the environment instead of hardcoding them;
# export AZURE_OPENAI_KEY before running this script.
endpoint = os.environ.get(
    "AZURE_OPENAI_ENDPOINT", "https://ai102-openai-03.openai.azure.com/"
)
api_key = os.environ["AZURE_OPENAI_KEY"]

# Named openai_client so it does not shadow the management client from Task 1
openai_client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Call the Standard deployment
response = openai_client.chat.completions.create(
    model="gpt-4o-standard",  # deployment name, not model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What deployment types does Azure OpenAI support?"}
    ],
    max_tokens=200
)
print(f"Model: {response.model}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Response: {response.choices[0].message.content[:200]}...")

# Compare this call's token usage with the configured capacity.
# Rate limit headers are not exposed on the parsed response object.
print("\nDeployment: gpt-4o-standard")
print("Configured: 30K TPM")
print(f"Tokens this call: {response.usage.total_tokens}")

using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat;

var endpoint = new Uri("https://ai102-openai-03.openai.azure.com/");
var key = new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!);
var openAiClient = new AzureOpenAIClient(endpoint, key);
var chatClient = openAiClient.GetChatClient("gpt-4o-standard"); // deployment name, not model name

var response = await chatClient.CompleteChatAsync(
    // Explicit ChatMessage[]: the compiler cannot infer a common element type
    // from a mix of SystemChatMessage and UserChatMessage.
    new ChatMessage[]
    {
        new SystemChatMessage("You are a helpful assistant."),
        new UserChatMessage("What deployment types does Azure OpenAI support?")
    },
    new ChatCompletionOptions { MaxOutputTokenCount = 200 }
);

Console.WriteLine($"Model: {response.Value.Model}");
Console.WriteLine($"Tokens: {response.Value.Usage.TotalTokenCount}");
// Guard the slice: [..200] throws if the reply is shorter than 200 characters
var text = response.Value.Content[0].Text;
Console.WriteLine($"Response: {text[..Math.Min(200, text.Length)]}...");

ENDPOINT="https://ai102-openai-03.openai.azure.com"
KEY=$(az cognitiveservices account keys list \
--name ai102-openai-03 \
--resource-group rg-ai102-challenge03 \
--query "key1" -o tsv)
# Test Standard deployment
curl -s "${ENDPOINT}/openai/deployments/gpt-4o-standard/chat/completions?api-version=2024-10-21" \
-H "api-key: ${KEY}" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What deployment types does Azure OpenAI support?"}
],
"max_tokens": 200
}' | python -m json.tool
# Check rate limit headers with verbose curl
curl -v "${ENDPOINT}/openai/deployments/gpt-4o-standard/chat/completions?api-version=2024-10-21" \
-H "api-key: ${KEY}" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hi"}],"max_tokens":5}' \
2>&1 | grep -i "x-ratelimit"
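The verbose curl call above surfaces the x-ratelimit headers as raw text. As a minimal sketch, assuming the service returns x-ratelimit-remaining-tokens and x-ratelimit-remaining-requests on successful calls, the hypothetical helper below turns any header mapping into numbers you can log or alert on. It returns None for a header that is missing or not numeric.

```python
def parse_rate_limit_headers(headers):
    """Extract remaining token/request budget from HTTP response headers.

    Header names are matched case-insensitively.
    """
    lowered = {str(k).lower(): v for k, v in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        if value is not None and str(value).strip().isdigit():
            return int(value)
        return None

    return {
        "remaining_tokens": as_int("x-ratelimit-remaining-tokens"),
        "remaining_requests": as_int("x-ratelimit-remaining-requests"),
    }


# Sample values for illustration only, not captured from a real call
sample = {
    "Content-Type": "application/json",
    "X-RateLimit-Remaining-Tokens": "29795",
    "x-ratelimit-remaining-requests": "179",
}
print(parse_rate_limit_headers(sample))
```

In the openai Python package (v1 and later), calling chat.completions.with_raw_response.create(...) returns a raw response whose headers attribute can be passed to this helper, and whose parse() method yields the usual completion object.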
Expected Output
OpenAI resource: https://ai102-openai-03.openai.azure.com/
Deployed: gpt-4o-standard
Model: gpt-4o v2024-08-06
Capacity: 30K TPM
Deployed: gpt-4o-mini-standard (Global Standard)
--- All Deployments ---
gpt-4o-standard:
Model: gpt-4o v2024-08-06
Type: Standard
Capacity: 30K TPM
Upgrade: OnceCurrentVersionExpired
gpt-4o-mini-standard:
Model: gpt-4o-mini v2024-07-18
Type: GlobalStandard
Capacity: 50K TPM
Upgrade: OnceNewDefaultVersionAvailable
Model: gpt-4o-2024-08-06
Tokens used: 156
Response: Azure OpenAI supports three deployment types...
Break & fix
| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Model not available | ModelNotFound error | Model not available in selected region | Check az cognitiveservices account list-models for regional availability |
| Capacity exceeded | InsufficientQuota | Subscription TPM quota fully allocated | Reduce capacity on other deployments or request quota increase |
| Version invalid | InvalidModelVersion | Specified version retired or not yet available | List available versions with the models API |
| Rate limited during inference | 429 Too Many Requests | Exceeding configured TPM/RPM | Increase deployment capacity or implement retry with exponential backoff |
| Wrong deployment name | DeploymentNotFound in SDK calls | Using model name instead of deployment name | The model parameter in SDK calls must be the deployment name you chose, not the underlying model name such as "gpt-4o" |
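The retry fix in the table can be sketched as follows. This is one possible shape, not the SDK's own retry logic: RateLimited is a hypothetical stand-in for whatever 429 exception your client raises (openai.RateLimitError in the openai Python package), and call_with_backoff is an illustrative helper. The delay doubles on each attempt, is capped, gets a little random jitter, and defers to a server-supplied retry-after value when one is present.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 Too Many Requests error."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      jitter=0.1, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = float(err.retry_after)
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that failed together
            sleep(delay + random.uniform(0, delay * jitter))


# Demo: fail twice, then succeed (sleep is stubbed so the demo runs instantly)
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


print(call_with_backoff(flaky_call, sleep=lambda seconds: None))
```

The openai client also accepts a max_retries setting that retries rate-limited calls for you, so a wrapper like this is mainly useful when you need custom logging or limits.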
Knowledge Check
1. What is the key difference between Standard and Global Standard deployments in Azure OpenAI?
2. You set a deployment capacity to 30K TPM. What happens when your application sends requests that would exceed this limit?
3. Which version upgrade option should you choose if you want to control exactly when your model version changes?
4. When making an API call to Azure OpenAI, what value should you pass as the 'model' parameter in the SDK?
5. When should you choose Provisioned Throughput (PTU) over Standard deployment?
Cleanup
az group delete --name rg-ai102-challenge03 --yes --no-wait