Challenge 05: CI/CD for AI Solutions

Estimated Time

60 min | Cost: ~$0 (pipeline definition only) | Domain: Plan & Manage AI Solutions (20-25%)

Exam skills covered

  • Integrate Azure AI services into a CI/CD pipeline
  • Automate model deployment with infrastructure as code
  • Implement automated testing for AI endpoints
  • Manage model versioning and deployment strategies
  • Configure environment-specific deployments (dev/staging/prod)

Overview

Production AI solutions require the same CI/CD rigor as any other software system—automated testing, infrastructure as code, environment promotion, and rollback capabilities. The AI-102 exam tests your understanding of how to automate the deployment of Azure AI resources and models through pipelines.

This challenge covers the complete CI/CD lifecycle for Azure AI solutions: defining infrastructure with Bicep templates, deploying Azure OpenAI models through GitHub Actions, implementing smoke tests that validate AI endpoint availability, and managing environment-specific configurations. You'll build a pipeline that follows the pattern: lint → deploy infrastructure → deploy model → smoke test.

Key exam concepts include using service principals for pipeline authentication, managing secrets in GitHub Actions or Azure DevOps, understanding deployment slots and blue-green strategies for AI endpoints, and implementing health checks that verify model availability without consuming excessive tokens.
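The lint → deploy infrastructure → deploy model → smoke test pattern described above is a dependency chain, where each stage runs only after the one it depends on succeeds. As a rough illustration (not part of the challenge files), the sketch below models that chain with Python's standard `graphlib` module. The job names are assumptions chosen to mirror the pattern.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph mirroring the lint -> deploy -> smoke test pattern.
# Each key is a job; its value is the set of jobs it must wait for.
JOB_NEEDS = {
    "lint": set(),
    "deploy-infra": {"lint"},
    "deploy-model": {"deploy-infra"},
    "smoke-test": {"deploy-model"},
}


def execution_order(job_needs):
    """Return jobs in an order that respects every dependency."""
    return list(TopologicalSorter(job_needs).static_order())


order = execution_order(JOB_NEEDS)
print(" -> ".join(order))
```

A CI system applies the same idea when it reads each job's declared dependencies: a failure in an early stage prevents every later stage from running.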

Architecture

You'll create a complete CI/CD pipeline that provisions Azure AI infrastructure, deploys models, and validates the deployment—all triggered by code changes.

[Figure: Challenge 05 topology]

Prerequisites

  • GitHub repository with Actions enabled
  • Azure subscription with a service principal (Contributor role)
  • Azure CLI 2.50+ installed
  • Familiarity with YAML pipeline syntax and Bicep

Implementation

Task 1: Define Infrastructure as Code with Bicep

# generate_bicep.py - Generate and validate Bicep template programmatically
import subprocess
import json
import os

# Bicep template content for Azure OpenAI with deployment
bicep_template = """
targetScope = 'resourceGroup'

@description('Base name for all resources')
param baseName string

@description('Location for resources')
param location string = resourceGroup().location

@description('OpenAI model name')
param modelName string = 'gpt-4o'

@description('Model version')
param modelVersion string = '2024-08-06'

@description('Deployment capacity in thousands of TPM')
param capacityTPM int = 30

resource openai 'Microsoft.CognitiveServices/accounts@2024-10-01' = {
  name: '${baseName}-openai'
  location: location
  kind: 'OpenAI'
  sku: {
    name: 'S0'
  }
  properties: {
    customSubDomainName: '${baseName}-openai'
    publicNetworkAccess: 'Enabled'
    networkAcls: {
      defaultAction: 'Allow'
    }
  }
}

resource deployment 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  parent: openai
  name: '${modelName}-deploy'
  sku: {
    name: 'Standard'
    capacity: capacityTPM
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: modelName
      version: modelVersion
    }
    versionUpgradeOption: 'OnceCurrentVersionExpired'
  }
}

output endpoint string = openai.properties.endpoint
output resourceId string = openai.id
output deploymentName string = deployment.name
"""

# Write Bicep template
os.makedirs("infra", exist_ok=True)
with open("infra/main.bicep", "w") as f:
    f.write(bicep_template)

# Validate the template
result = subprocess.run(
    ["az", "bicep", "build", "--file", "infra/main.bicep"],
    capture_output=True, text=True
)

if result.returncode == 0:
    print("✓ Bicep template is valid")
else:
    print(f"✗ Validation failed: {result.stderr}")
    raise SystemExit(1)  # Stop here: what-if is pointless on an invalid template

# Run what-if deployment (requires `az login` and an existing resource group)
result = subprocess.run(
    ["az", "deployment", "group", "what-if",
     "--resource-group", "rg-ai102-challenge05",
     "--template-file", "infra/main.bicep",
     "--parameters", "baseName=ai102-cicd"],
    capture_output=True, text=True
)
print(result.stdout)
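The what-if call above hard-codes a single `baseName`. One possible way to handle the dev/staging/prod split listed in the exam skills is to keep one Bicep template and vary only the parameters per environment. The sketch below assumes hypothetical environment names, resource group names, and capacity values. It only builds the argument list for the same `az deployment group what-if` command and does not run it.

```python
# Hypothetical per-environment settings; names and values are illustrative only.
ENVIRONMENTS = {
    "dev": {"baseName": "ai102-dev", "capacityTPM": 10},
    "staging": {"baseName": "ai102-stg", "capacityTPM": 20},
    "prod": {"baseName": "ai102-prod", "capacityTPM": 30},
}


def what_if_command(env, resource_group, template="infra/main.bicep"):
    """Build the az CLI argument list for a what-if run in one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {env}")
    # Each setting becomes a key=value pair matching a Bicep param name.
    params = [f"{key}={value}" for key, value in ENVIRONMENTS[env].items()]
    return [
        "az", "deployment", "group", "what-if",
        "--resource-group", resource_group,
        "--template-file", template,
        "--parameters", *params,
    ]


cmd = what_if_command("dev", "rg-ai102-dev")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` in the same way as the validation call above. Keeping the values in one mapping makes the differences between environments easy to review.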

Task 2: GitHub Actions Workflow for AI Deployment

# generate_workflow.py - Create GitHub Actions workflow programmatically
# Requires PyYAML (pip install pyyaml)
import os
import yaml

workflow = {
    "name": "Deploy Azure AI Solution",
    "on": {
        "push": {"branches": ["main"]},
        "pull_request": {"branches": ["main"]},
        "workflow_dispatch": {}
    },
    "env": {
        "AZURE_RESOURCE_GROUP": "rg-ai102-prod",
        "BASE_NAME": "ai102-prod",
        "LOCATION": "eastus2"
    },
    # id-token: write lets azure/login request an OIDC token (no stored client secret)
    "permissions": {
        "id-token": "write",
        "contents": "read"
    },
    "jobs": {
        "lint": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {
                    "name": "Lint Bicep",
                    "uses": "azure/CLI@v2",
                    "with": {
                        "inlineScript": "az bicep build --file infra/main.bicep"
                    }
                }
            ]
        },
        "deploy-infra": {
            "needs": "lint",
            "runs-on": "ubuntu-latest",
            # Deploy only from main; pull requests stop after lint
            "if": "github.ref == 'refs/heads/main'",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {
                    "name": "Azure Login",
                    "uses": "azure/login@v2",
                    "with": {
                        "client-id": "${{ secrets.AZURE_CLIENT_ID }}",
                        "tenant-id": "${{ secrets.AZURE_TENANT_ID }}",
                        "subscription-id": "${{ secrets.AZURE_SUBSCRIPTION_ID }}"
                    }
                },
                {
                    "name": "Deploy Infrastructure",
                    "uses": "azure/arm-deploy@v2",
                    "with": {
                        "resourceGroupName": "${{ env.AZURE_RESOURCE_GROUP }}",
                        "template": "./infra/main.bicep",
                        "parameters": "baseName=${{ env.BASE_NAME }}",
                        "failOnStdErr": "false"
                    },
                    "id": "deploy"
                }
            ],
            # Expose the Bicep outputs so the smoke-test job can read them
            "outputs": {
                "endpoint": "${{ steps.deploy.outputs.endpoint }}",
                "deploymentName": "${{ steps.deploy.outputs.deploymentName }}"
            }
        },
        "smoke-test": {
            "needs": "deploy-infra",
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {
                    "name": "Azure Login",
                    "uses": "azure/login@v2",
                    "with": {
                        "client-id": "${{ secrets.AZURE_CLIENT_ID }}",
                        "tenant-id": "${{ secrets.AZURE_TENANT_ID }}",
                        "subscription-id": "${{ secrets.AZURE_SUBSCRIPTION_ID }}"
                    }
                },
                {
                    "name": "Set up Python",
                    "uses": "actions/setup-python@v5",
                    "with": {"python-version": "3.12"}
                },
                {
                    # The smoke test imports openai and azure.identity
                    "name": "Install test dependencies",
                    "run": "pip install openai azure-identity"
                },
                {
                    "name": "Run Smoke Tests",
                    "run": "python tests/smoke_test.py",
                    "env": {
                        "AZURE_OPENAI_ENDPOINT": "${{ needs.deploy-infra.outputs.endpoint }}",
                        "DEPLOYMENT_NAME": "${{ needs.deploy-infra.outputs.deploymentName }}"
                    }
                }
            ]
        }
    }
}

os.makedirs(".github/workflows", exist_ok=True)
with open(".github/workflows/deploy-ai.yml", "w") as f:
    yaml.dump(workflow, f, default_flow_style=False, sort_keys=False)

print("✓ Generated .github/workflows/deploy-ai.yml")
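The `azure/login` steps in the workflow pass only a client ID, tenant ID, and subscription ID, so they rely on OpenID Connect: Azure accepts the run's token only if its subject claim matches a federated credential configured on the service principal. The helper below is a small sketch of how those subject strings are formed for the common GitHub cases (branch, environment, pull request). The function name `federated_subject` and the `org`/`repo` values are hypothetical.

```python
def federated_subject(org, repo, *, branch=None, environment=None, pull_request=False):
    """Build the subject claim GitHub places in its OIDC token for a workflow run."""
    selected = sum([branch is not None, environment is not None, bool(pull_request)])
    if selected != 1:
        raise ValueError("Choose exactly one of branch, environment, or pull_request")
    prefix = f"repo:{org}/{repo}"
    if branch is not None:
        return f"{prefix}:ref:refs/heads/{branch}"
    if environment is not None:
        return f"{prefix}:environment:{environment}"
    return f"{prefix}:pull_request"


# The workflow above deploys from main, so this is the subject it would need.
print(federated_subject("org", "repo", branch="main"))
```

A run triggered by a pull request, or a job bound to a GitHub environment, presents a different subject than a push to `main`. A credential configured for only one of these will reject the others.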

Task 3: Implement Smoke Tests for AI Endpoints

# tests/smoke_test.py - Validate AI endpoint after deployment
import os
import sys
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

def test_endpoint_reachable():
    """Verify the OpenAI endpoint responds."""
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
    deployment = os.environ["DEPLOYMENT_NAME"]

    # Use Microsoft Entra ID token auth (no keys in pipeline).
    # In GitHub Actions, DefaultAzureCredential picks up the Azure CLI session
    # created by azure/login. The identity also needs a data-plane role
    # (for example, Cognitive Services OpenAI User) to call the model.
    credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    )

    client = AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version="2024-10-21"
    )

    # Minimal token usage smoke test
    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": "Reply with OK"}],
        max_tokens=5
    )

    assert response.choices[0].message.content is not None
    assert response.usage.total_tokens > 0
    print(f"✓ Endpoint healthy: {endpoint}")
    print(f"✓ Model responded: {response.choices[0].message.content}")
    print(f"✓ Tokens used: {response.usage.total_tokens}")

def test_model_version():
    """Verify the expected model version is deployed."""
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
    deployment = os.environ["DEPLOYMENT_NAME"]

    credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    )

    client = AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version="2024-10-21"
    )

    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": "Hi"}],
        max_tokens=1
    )

    # Verify model identifier matches expected deployment
    assert "gpt-4o" in response.model
    print(f"✓ Model version verified: {response.model}")

if __name__ == "__main__":
    try:
        test_endpoint_reachable()
        test_model_version()
        print("\n✓ All smoke tests passed!")
        sys.exit(0)
    except Exception as e:
        print(f"\n✗ Smoke test failed: {e}")
        sys.exit(1)

Expected Output

✓ Bicep template is valid
✓ Generated .github/workflows/deploy-ai.yml

--- Pipeline Execution ---
Job: lint ✓
Job: deploy-infra ✓
Output: endpoint = https://ai102-prod-openai.openai.azure.com/
Output: deploymentName = gpt-4o-deploy
Job: smoke-test ✓
✓ Endpoint healthy: https://ai102-prod-openai.openai.azure.com/
✓ Model responded: OK
✓ Tokens used: 12
✓ Model version verified: gpt-4o-2024-08-06
✓ All smoke tests passed!

Break & fix

Scenario | Symptom | Root Cause | Fix
Federated identity fails | AADSTS70021 in login step | Federated credential not configured for the repo/branch | Configure federated credential with correct subject (repo:org/repo:ref:refs/heads/main)
Deployment race condition | Conflict error on model deployment | Bicep deploying model before OpenAI resource is ready | Use dependsOn in Bicep (implicit via parent property)
Smoke test timeout | Test hangs after deploy | Model deployment still provisioning | Add wait/retry loop in smoke test with exponential backoff
Secret not available | Login failed in pipeline | GitHub secret name mismatch or not set | Verify secret names in repo Settings → Secrets match workflow references
Bicep lint warning | Pipeline fails on lint | Using deprecated API version in Bicep | Update @2024-10-01 to latest stable API version
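The "Smoke test timeout" fix above calls for a wait/retry loop with exponential backoff. The sketch below shows one way such a loop might look. `retry_with_backoff`, its default values, and the stand-in `flaky_check` are hypothetical and not part of the challenge files. The `sleep` parameter is injectable so the demo runs without actually waiting.

```python
import time


def retry_with_backoff(check, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call check() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return check()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the pipeline
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Demo with a stand-in check that fails twice before succeeding.
calls = {"count": 0}


def flaky_check():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("deployment still provisioning")
    return "OK"


waits = []  # Record the delays instead of sleeping
result = retry_with_backoff(flaky_check, sleep=waits.append)
print(result, waits)
```

In `tests/smoke_test.py`, the same wrapper could surround a call such as `test_endpoint_reachable`. Capping the number of attempts keeps a genuinely broken deployment from stalling the pipeline indefinitely.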

Knowledge Check

1. What is the recommended authentication method for GitHub Actions to deploy Azure AI resources?

2. In a CI/CD pipeline deploying Azure OpenAI models, what should the smoke test validate?

3. How should you manage environment-specific configurations (dev/staging/prod) for Azure AI deployments in a pipeline?

4. What Bicep resource property ensures a model deployment waits for its parent Azure OpenAI account to be created first?

5. Your pipeline deploys a new model version but the smoke test fails. What should the pipeline do?

Cleanup

# No Azure resources to clean up (pipeline definitions only)
# If you deployed the infrastructure for testing:
az group delete --name rg-ai102-challenge05 --yes --no-wait

Learn More