Challenge 03: Deploy AI Models

Estimated Time

60 min | Cost: ~$1.00 | Domain: Plan & Manage AI Solutions (20-25%)

Exam skills covered

  • Deploy AI models using appropriate deployment options
  • Plan capacity for model deployments (tokens per minute, requests per minute)
  • Manage model versions and lifecycle
  • Choose between Standard, Global Standard, and Provisioned Throughput deployments

Overview

Azure OpenAI Service requires you to deploy a model explicitly before you can make inference calls. Unlike traditional Azure AI services, where creating a resource immediately gives you a usable endpoint, Azure OpenAI separates resource creation from model deployment. That separation gives you control over which models are available, how much capacity each one has, and how its version lifecycle is managed.

This challenge covers the three deployment types that appear on the AI-102 exam: Standard (pay-per-token, regional), Global Standard (pay-per-token, global routing), and Provisioned Throughput (reserved capacity, predictable latency). You'll deploy models programmatically, configure capacity in Tokens Per Minute (TPM), manage model versions, and understand the upgrade policies that control automatic version transitions.
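
The three deployment types can be captured in a small lookup, sketched below. The descriptions are taken from the paragraph above; the SKU name `ProvisionedManaged` and the `suggest_deployment_type` helper are assumptions added for illustration, and the selection rule is a deliberately simplified heuristic rather than official guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentType:
    sku_name: str   # value passed as Sku(name=...) when deploying
    summary: str    # description as given in this challenge


DEPLOYMENT_TYPES = {
    "standard": DeploymentType("Standard", "pay-per-token, regional"),
    "global_standard": DeploymentType("GlobalStandard", "pay-per-token, global routing"),
    # Assumption: "ProvisionedManaged" is the SKU name used for Provisioned Throughput
    "provisioned": DeploymentType("ProvisionedManaged", "reserved capacity, predictable latency"),
}


def suggest_deployment_type(needs_regional_processing: bool,
                            needs_predictable_latency: bool) -> DeploymentType:
    """Hypothetical, simplified heuristic for picking a deployment type."""
    if needs_predictable_latency:
        return DEPLOYMENT_TYPES["provisioned"]
    if needs_regional_processing:
        return DEPLOYMENT_TYPES["standard"]
    return DEPLOYMENT_TYPES["global_standard"]


choice = suggest_deployment_type(needs_regional_processing=True,
                                 needs_predictable_latency=False)
print(f"{choice.sku_name}: {choice.summary}")
```

Real decisions also depend on cost, quota, and regional model availability, which this sketch ignores.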

Capacity planning is a key exam topic—you need to understand how TPM translates to real-world throughput, how to monitor utilization, and when to choose provisioned throughput over standard deployments.
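
To make the TPM arithmetic concrete, here is a rough sketch. It assumes the commonly cited ratio of 6 requests per minute for every 1,000 TPM; the real ratio varies by model and deployment type, so treat `RPM_PER_1K_TPM` as a placeholder to verify against your own quota. The function name and the average-token figures are illustrative.

```python
from dataclasses import dataclass

# Assumption: 6 RPM are granted per 1,000 TPM; confirm for your model and deployment type
RPM_PER_1K_TPM = 6


@dataclass(frozen=True)
class CapacityEstimate:
    tpm: int              # tokens per minute implied by the capacity setting
    rpm_limit: int        # request-rate ceiling implied by the assumed ratio
    sustainable_rpm: int  # requests per minute before either limit is hit


def estimate_capacity(capacity_units: int, avg_tokens_per_request: int) -> CapacityEstimate:
    """Estimate throughput for a deployment whose capacity is given in units of 1K TPM."""
    if capacity_units <= 0 or avg_tokens_per_request <= 0:
        raise ValueError("capacity_units and avg_tokens_per_request must be positive")
    tpm = capacity_units * 1000
    rpm_limit = capacity_units * RPM_PER_1K_TPM
    token_bound_rpm = tpm // avg_tokens_per_request
    return CapacityEstimate(tpm, rpm_limit, min(rpm_limit, token_bound_rpm))


# capacity=30 (30K TPM) with requests averaging 500 tokens each
estimate = estimate_capacity(30, 500)
print(estimate.tpm, estimate.rpm_limit, estimate.sustainable_rpm)  # 30000 180 60
```

With large requests the token budget is the binding limit (60 requests per minute here); with small requests the request-rate ceiling takes over. When estimating `avg_tokens_per_request`, count both the prompt and the expected completion.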

Architecture

You'll create an Azure OpenAI resource, deploy multiple models with different deployment types and capacities, then validate their availability and compare their behaviors.

[Figure: Challenge 03 topology]

Prerequisites

  • Azure subscription with Azure OpenAI access approved
  • Azure CLI 2.50+ (the az cognitiveservices command group ships with the core CLI; no extension is required)
  • Python 3.9+ with pip or .NET 8 SDK
  • azure-identity, azure-mgmt-cognitiveservices, openai Python packages

Implementation

Task 1: Create an Azure OpenAI Resource and Deploy a Model

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Account, Sku, AccountProperties, Deployment, DeploymentProperties,
    DeploymentModel
)

credential = DefaultAzureCredential()
subscription_id = "YOUR_SUBSCRIPTION_ID"
client = CognitiveServicesManagementClient(credential, subscription_id)

# Create Azure OpenAI resource
account = client.accounts.begin_create(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    account=Account(
        sku=Sku(name="S0"),
        kind="OpenAI",
        location="eastus2",
        properties=AccountProperties(
            custom_sub_domain_name="ai102-openai-03"
        )
    )
).result()
print(f"OpenAI resource: {account.properties.endpoint}")

# Deploy GPT-4o with Standard deployment type
deployment = client.deployments.begin_create_or_update(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    deployment_name="gpt-4o-standard",
    deployment=Deployment(
        sku=Sku(name="Standard", capacity=30),  # 30K tokens per minute
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o",
                version="2024-08-06"
            ),
            version_upgrade_option="OnceCurrentVersionExpired"
        )
    )
).result()
print(f"Deployed: {deployment.name}")
print(f"Model: {deployment.properties.model.name} v{deployment.properties.model.version}")
print(f"Capacity: {deployment.sku.capacity}K TPM")

Task 2: Deploy Multiple Models with Different Configurations

# Deploy GPT-4o-mini for cost-effective workloads
mini_deployment = client.deployments.begin_create_or_update(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03",
    deployment_name="gpt-4o-mini-standard",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=50),  # 50K TPM with global routing
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="OpenAI",
                name="gpt-4o-mini",
                version="2024-07-18"
            ),
            version_upgrade_option="OnceNewDefaultVersionAvailable"
        )
    )
).result()
print(f"Deployed: {mini_deployment.name} (Global Standard)")

# List all deployments to compare
deployments = client.deployments.list(
    resource_group_name="rg-ai102-challenge03",
    account_name="ai102-openai-03"
)

print("\n--- All Deployments ---")
for d in deployments:
    print(f"  {d.name}:")
    print(f"    Model: {d.properties.model.name} v{d.properties.model.version}")
    print(f"    Type: {d.sku.name}")
    print(f"    Capacity: {d.sku.capacity}K TPM")
    print(f"    Upgrade: {d.properties.version_upgrade_option}")
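
The two deployments above use different version upgrade options. The sketch below summarizes what each option means for a list of deployments. It is duck-typed, so it runs here against simple stand-in objects instead of a live subscription; `summarize_upgrade_policies` and the wording of the notes are illustrative, and `NoAutoUpgrade` is a third option that the tasks above do not use.

```python
from types import SimpleNamespace

# Short notes on each version upgrade option (wording is this sketch's own)
UPGRADE_POLICY_NOTES = {
    "OnceNewDefaultVersionAvailable": "upgrades automatically when a new default version is published",
    "OnceCurrentVersionExpired": "upgrades automatically only when the current version is retired",
    "NoAutoUpgrade": "never upgrades automatically; you change the version yourself",
}


def summarize_upgrade_policies(deployments):
    """Return one summary line per deployment, based on its version_upgrade_option."""
    lines = []
    for d in deployments:
        option = d.properties.version_upgrade_option
        note = UPGRADE_POLICY_NOTES.get(option, "unrecognized option")
        lines.append(f"{d.name}: {option} ({note})")
    return lines


# Stand-ins shaped like the objects returned by client.deployments.list(...)
sample_deployments = [
    SimpleNamespace(
        name="gpt-4o-standard",
        properties=SimpleNamespace(version_upgrade_option="OnceCurrentVersionExpired"),
    ),
    SimpleNamespace(
        name="gpt-4o-mini-standard",
        properties=SimpleNamespace(version_upgrade_option="OnceNewDefaultVersionAvailable"),
    ),
]

for line in summarize_upgrade_policies(sample_deployments):
    print(line)
```

In the lab itself you could pass the result of `client.deployments.list(...)` straight to the helper.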

Task 3: Test Deployment and Monitor Capacity

import os
from openai import AzureOpenAI

# Read the endpoint and key from the environment instead of hard-coding secrets.
# Set AZURE_OPENAI_KEY before running; the endpoint defaults to this challenge's resource.
endpoint = os.environ.get(
    "AZURE_OPENAI_ENDPOINT", "https://ai102-openai-03.openai.azure.com/"
)
api_key = os.environ["AZURE_OPENAI_KEY"]

# Named aoai_client so it does not shadow the management client from Tasks 1 and 2
aoai_client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-10-21"
)

# Call the Standard deployment
response = aoai_client.chat.completions.create(
    model="gpt-4o-standard",  # deployment name, not model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What deployment types does Azure OpenAI support?"}
    ],
    max_tokens=200
)

print(f"Model: {response.model}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Response: {response.choices[0].message.content[:200]}...")

# Compare this call's token usage with the configured capacity.
# These lines print the configured value; they do not read rate-limit headers.
print("\nDeployment: gpt-4o-standard")
print("Configured: 30K TPM")
print(f"Tokens this call: {response.usage.total_tokens}")
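
The code above reports token usage but does not inspect any response headers. As a hedged sketch of how remaining capacity could be read, the helper below parses rate-limit headers from a mapping. The header names `x-ratelimit-remaining-tokens` and `x-ratelimit-remaining-requests` are assumptions about what the service returns, and `parse_rate_limit_headers` is a hypothetical helper, so verify both against a real response.

```python
def parse_rate_limit_headers(headers):
    """Extract remaining token/request budgets from a header mapping.

    Returns None for any value that is missing or not a plain integer.
    """
    def _to_int(name):
        value = headers.get(name)
        if value is not None and value.isdigit():
            return int(value)
        return None

    return {
        # Assumption: these are the header names the service uses
        "remaining_tokens": _to_int("x-ratelimit-remaining-tokens"),
        "remaining_requests": _to_int("x-ratelimit-remaining-requests"),
    }


# Example with made-up header values
sample_headers = {
    "x-ratelimit-remaining-tokens": "29844",
    "x-ratelimit-remaining-requests": "179",
}
print(parse_rate_limit_headers(sample_headers))

# With the openai package, the headers could be obtained roughly like this:
#   raw = <AzureOpenAI client>.chat.completions.with_raw_response.create(...)
#   limits = parse_rate_limit_headers(raw.headers)
#   response = raw.parse()  # the usual ChatCompletion object
```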

Expected Output

OpenAI resource: https://ai102-openai-03.openai.azure.com/
Deployed: gpt-4o-standard
Model: gpt-4o v2024-08-06
Capacity: 30K TPM

Deployed: gpt-4o-mini-standard (Global Standard)

--- All Deployments ---
  gpt-4o-standard:
    Model: gpt-4o v2024-08-06
    Type: Standard
    Capacity: 30K TPM
    Upgrade: OnceCurrentVersionExpired
  gpt-4o-mini-standard:
    Model: gpt-4o-mini v2024-07-18
    Type: GlobalStandard
    Capacity: 50K TPM
    Upgrade: OnceNewDefaultVersionAvailable

Model: gpt-4o-2024-08-06
Tokens used: 156
Response: Azure OpenAI supports three deployment types...

Deployment: gpt-4o-standard
Configured: 30K TPM
Tokens this call: 156

Break & fix

| Scenario | Symptom | Root Cause | Fix |
|---|---|---|---|
| Model not available | ModelNotFound error | Model not available in selected region | Check az cognitiveservices account list-models for regional availability |
| Capacity exceeded | InsufficientQuota | Subscription TPM quota fully allocated | Reduce capacity on other deployments or request quota increase |
| Version invalid | InvalidModelVersion | Specified version retired or not yet available | List available versions with the models API |
| 429 Too Many Requests | Rate limiting during inference | Exceeding configured TPM/RPM | Increase deployment capacity or implement retry with exponential backoff |
| Wrong deployment name | DeploymentNotFound in SDK calls | Using model name instead of deployment name | The model parameter in SDK must be the deployment name you chose, not "gpt-4o" |
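
The "retry with exponential backoff" fix for 429 responses can be sketched as follows. This is a generic, minimal illustration, not the SDK's own retry logic: `call_with_backoff`, `FakeRateLimitError`, and `make_flaky_call` are hypothetical names, and the fake error stands in for whatever rate-limit exception your client raises (with the openai package that would be `openai.RateLimitError`).

```python
import random
import time


def call_with_backoff(operation, is_retryable, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Run operation(), retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% random jitter
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))


class FakeRateLimitError(Exception):
    """Hypothetical stand-in for a 429 rate-limit error."""


def make_flaky_call(failures):
    """Return a callable that fails `failures` times, then returns "ok"."""
    state = {"remaining": failures}

    def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise FakeRateLimitError("429 Too Many Requests")
        return "ok"

    return call


recorded_delays = []
result = call_with_backoff(
    make_flaky_call(failures=2),
    is_retryable=lambda exc: isinstance(exc, FakeRateLimitError),
    sleep=recorded_delays.append,  # record delays instead of actually sleeping
)
print(result, len(recorded_delays))  # ok 2
```

The openai client also accepts a `max_retries` setting that handles some retries for you, so a wrapper like this is mainly useful when you want explicit control over delays and give-up conditions.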

Knowledge Check

1. What is the key difference between Standard and Global Standard deployments in Azure OpenAI?

2. You set a deployment capacity to 30K TPM. What happens when your application sends requests that would exceed this limit?

3. Which version upgrade option should you choose if you want to control exactly when your model version changes?

4. When making an API call to Azure OpenAI, what value should you pass as the 'model' parameter in the SDK?

5. When should you choose Provisioned Throughput (PTU) over Standard deployment?

Cleanup

az group delete --name rg-ai102-challenge03 --yes --no-wait
