
Challenge 26: Rolling deployments and slot swaps

Exam skills mapped

  • Plan for minimizing downtime during deployments by using load balancing, rolling deployments, and deployment slot usage and swap

Scenario

Contoso Ltd runs its customer-facing web application on Azure App Service with 4 instances behind the built-in load balancer. The application serves 2 million page views per day. During the last deployment, all instances were updated simultaneously, causing a 3-minute outage that resulted in 4,200 failed requests and a flood of customer support tickets.
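As a sanity check on the figures above, the sketch below estimates how many requests fall inside an outage window. It assumes one request per page view and traffic spread evenly over the day, neither of which the scenario states; the function name `estimate_failed_requests` is only illustrative.

```shell
#!/bin/sh
# Estimate the requests that land inside an outage window.
# Assumption: traffic is spread evenly over the day (real traffic has peaks).
estimate_failed_requests() {
  daily_requests=$1   # total requests per day
  outage_minutes=$2   # outage length in minutes
  # A day has 1440 minutes; shell integer division rounds down.
  echo $(( daily_requests * outage_minutes / 1440 ))
}

estimate_failed_requests 2000000 3   # prints 4166
```

The result of about 4,166 is in line with the 4,200 failed requests reported for the 3-minute outage.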

The engineering team needs to implement rolling deployments that update instances gradually while maintaining availability, and to use deployment slots for zero-downtime releases with proper warm-up and auto-swap configuration.

Environment details:

  • Azure App Service Plan: Premium V3, 4 instances
  • Azure DevOps organization: contoso-devops
  • Project: WebApp
  • Resource group: rg-contoso-webapp-prod
  • Region: West US 2

Task 1: Configure Azure App Service deployment slots

Create a multi-slot deployment architecture with staging and pre-production environments.

Provision slots

RESOURCE_GROUP="rg-contoso-webapp-prod"
APP_NAME="app-contoso-web"

# Create staging slot
az webapp deployment slot create \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging

# Create pre-production slot (for integration testing)
az webapp deployment slot create \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot preprod

# List all slots
az webapp deployment slot list \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--query "[].name" -o tsv

Configure slot traffic routing for testing

# Route 10% of production traffic to staging for pre-swap validation
az webapp traffic-routing set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--distribution staging=10

# Verify traffic routing configuration
az webapp traffic-routing show \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP
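
The 10% split can be extended into a staged ramp. The sketch below is one possible shape for that, not a prescribed procedure: the percentages 10, 25, and 50 and the helper names `run` and `ramp_traffic` are assumptions, and with `DRY_RUN=1` it only prints the `az` commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print (or run) a staged traffic ramp to the staging slot.
APP_NAME="app-contoso-web"
RESOURCE_GROUP="rg-contoso-webapp-prod"
DRY_RUN=1   # set to 0 to execute the commands instead of printing them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

ramp_traffic() {
  # Each argument is a percentage of production traffic sent to staging.
  for pct in "$@"; do
    run az webapp traffic-routing set \
      --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" \
      --distribution "staging=$pct"
    # In a real run, pause here and watch error rates before the next step.
  done
  # Route all traffic back to production once validation is finished.
  run az webapp traffic-routing clear \
    --name "$APP_NAME" --resource-group "$RESOURCE_GROUP"
}

ramp_traffic 10 25 50
```

Keeping the commands behind a dry-run switch makes it easy to review the plan before any production traffic is moved.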

Task 2: Configure auto-swap

Enable auto-swap so that deployments to the staging slot are automatically swapped into production once warm-up completes.

# Enable auto-swap on the staging slot
az webapp deployment slot auto-swap \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--auto-swap-slot production

# Verify auto-swap configuration
az webapp config show \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--query "autoSwapSlotName"

How auto-swap works

  1. Code is deployed to the staging slot
  2. Azure automatically warms up the staging slot by sending requests to its root path
  3. Once warm-up completes, Azure performs the swap automatically
  4. If warm-up fails, the swap does not occur
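
When the same sequence needs a manual checkpoint, a swap with preview walks through comparable phases by hand. The sketch below is a minimal illustration: the helper names `run` and `slot_swap` are hypothetical, and with `DRY_RUN=1` the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of a swap with preview. DRY_RUN=1 only prints the commands.
APP_NAME="app-contoso-web"
RESOURCE_GROUP="rg-contoso-webapp-prod"
DRY_RUN=1   # set to 0 to execute the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

slot_swap() {
  # $1 is the swap phase: preview, swap, or reset.
  run az webapp deployment slot swap \
    --name "$APP_NAME" --resource-group "$RESOURCE_GROUP" \
    --slot staging --target-slot production --action "$1"
}

slot_swap preview   # apply the production slot's settings to staging first
# ... validate the staging slot manually here ...
slot_swap swap      # complete the swap
# slot_swap reset   # alternative: cancel the pending swap
```

The `preview` phase is where a manual check fits: staging is already running with production settings, but production traffic has not moved yet.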

Disable auto-swap (when manual validation is required)

az webapp deployment slot auto-swap \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--disable

Task 3: Slot-specific app settings (sticky settings)

Configure settings that stay with a slot instead of moving with the application during a swap.
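
The difference between sticky and non-sticky settings can be pictured with a toy model. The sketch below uses plain shell variables, not Azure resources, and the previous version `v2.3.0` in production is an invented value for illustration only.

```shell
#!/bin/sh
# Toy model of a slot swap using shell variables (no Azure calls).
prod_ENVIRONMENT="production"   # slot-sticky
staging_ENVIRONMENT="staging"   # slot-sticky
prod_API_VERSION="v2.3.0"       # not sticky (hypothetical previous release)
staging_API_VERSION="v2.3.1"    # not sticky

swap_slots() {
  # Non-sticky settings travel with the app, so the two slots exchange them.
  tmp=$prod_API_VERSION
  prod_API_VERSION=$staging_API_VERSION
  staging_API_VERSION=$tmp
  # Sticky settings are left untouched: each slot keeps its own value.
}

swap_slots
echo "production: ENVIRONMENT=$prod_ENVIRONMENT API_VERSION=$prod_API_VERSION"
echo "staging:    ENVIRONMENT=$staging_ENVIRONMENT API_VERSION=$staging_API_VERSION"
```

After the swap in this model, production reports `API_VERSION=v2.3.1` while still reading `ENVIRONMENT=production`, which is the behavior the slot-setting flag is meant to give.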

# Production slot settings (slot-sticky)
az webapp config appsettings set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot-settings \
"ENVIRONMENT=production" \
"CACHE_CONNECTION=redis-contoso-prod.redis.cache.windows.net:6380" \
"APPINSIGHTS_INSTRUMENTATIONKEY=<prod-key>"

# Staging slot settings (slot-sticky)
az webapp config appsettings set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--slot-settings \
"ENVIRONMENT=staging" \
"CACHE_CONNECTION=redis-contoso-staging.redis.cache.windows.net:6380" \
"APPINSIGHTS_INSTRUMENTATIONKEY=<staging-key>"

# Non-sticky settings (these WILL swap with the app code)
az webapp config appsettings set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--settings \
"API_VERSION=v2.3.1" \
"FEATURE_NEW_CHECKOUT=true"

Connection strings that stay with their slots

# Production connection string (slot-sticky)
az webapp config connection-string set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--connection-string-type SQLAzure \
--slot-settings \
"DefaultConnection=Server=sql-contoso-prod.database.windows.net;Database=ContosoWeb;Authentication=Active Directory Managed Identity;"

# Staging connection string (slot-sticky)
az webapp config connection-string set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--connection-string-type SQLAzure \
--slot-settings \
"DefaultConnection=Server=sql-contoso-staging.database.windows.net;Database=ContosoWeb;Authentication=Active Directory Managed Identity;"

Task 4: Rolling deployment with Azure Pipelines and VMSS

For Contoso's backend services running on Virtual Machine Scale Sets (VMSS), implement a rolling upgrade strategy.

Configure the VMSS rolling upgrade policy

VMSS_NAME="vmss-contoso-backend"

# Configure rolling upgrade policy
az vmss update \
--name $VMSS_NAME \
--resource-group $RESOURCE_GROUP \
--set upgradePolicy.mode=Rolling \
--set upgradePolicy.rollingUpgradePolicy.maxBatchInstancePercent=25 \
--set upgradePolicy.rollingUpgradePolicy.maxUnhealthyInstancePercent=25 \
--set upgradePolicy.rollingUpgradePolicy.maxUnhealthyUpgradedInstancePercent=25 \
--set upgradePolicy.rollingUpgradePolicy.pauseTimeBetweenBatches="PT30S"
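
To get a feel for what `maxBatchInstancePercent=25` means in practice, the sketch below estimates the batch size and number of batches for a given instance count. The rounding rule (round down, minimum of one instance) is an assumption for illustration, not documented platform behavior, and the function names are hypothetical.

```shell
#!/bin/sh
# Rough estimate of rolling-upgrade batches.
# Assumption: the percentage is rounded down, with a minimum of one instance.
batch_size() {
  instances=$1
  max_batch_percent=$2
  size=$(( instances * max_batch_percent / 100 ))
  if [ "$size" -lt 1 ]; then size=1; fi
  echo "$size"
}

batch_count() {
  # Batches needed to cover every instance (ceiling division).
  instances=$1
  size=$(batch_size "$1" "$2")
  echo $(( (instances + size - 1) / size ))
}

batch_size 4 25    # prints 1
batch_count 4 25   # prints 4
```

With 4 instances and a 25% batch limit, this gives one instance per batch and four batches, so the 30-second pause configured above would apply three times between batches.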

Azure Pipelines YAML for rolling deployment to VMSS

Create azure-pipelines-vmss-rolling.yml:

trigger:
  branches:
    include:
      - main
  paths:
    include:
      - src/BackendService/**

pool:
  vmImage: 'ubuntu-latest'

variables:
  resourceGroup: 'rg-contoso-webapp-prod'
  vmssName: 'vmss-contoso-backend'
  azureSubscription: 'contoso-production-connection'

stages:
  - stage: Build
    displayName: 'Build application'
    jobs:
      - job: BuildApp
        steps:
          - task: UseDotNet@2
            inputs:
              packageType: 'sdk'
              version: '8.0.x'

          - script: |
              dotnet publish src/BackendService/BackendService.csproj \
                --configuration Release \
                --output $(Build.ArtifactStagingDirectory)/app
            displayName: 'Build and publish'

          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(Build.ArtifactStagingDirectory)/app'
              ArtifactName: 'backend-app'

  - stage: Deploy
    displayName: 'Rolling deployment to VMSS'
    dependsOn: Build
    jobs:
      - deployment: RollingDeploy
        environment: 'production'
        strategy:
          rolling:
            maxParallel: 25%
            preDeploy:
              steps:
                - task: AzureCLI@2
                  displayName: 'Drain instance from load balancer'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      echo "Draining instance from load balancer pool..."
                      sleep 30
            deploy:
              steps:
                - download: current
                  artifact: backend-app

                - task: AzureCLI@2
                  displayName: 'Deploy to instance'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      echo "Deploying new version to instance..."
                      az vmss extension set \
                        --vmss-name $(vmssName) \
                        --resource-group $(resourceGroup) \
                        --name CustomScript \
                        --publisher Microsoft.Azure.Extensions \
                        --version 2.1 \
                        --settings '{"commandToExecute":"bash /opt/deploy/update-app.sh"}'
            routeTraffic:
              steps:
                - task: AzureCLI@2
                  displayName: 'Re-enable instance in load balancer'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      echo "Adding instance back to load balancer pool..."
            postRouteTraffic:
              steps:
                - task: AzureCLI@2
                  displayName: 'Verify instance health'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      echo "Running health check on updated instance..."
                      sleep 15
                      HEALTH_STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
                      if [ "$HEALTH_STATUS" != "200" ]; then
                        echo "##vso[task.logissue type=error]Health check failed"
                        exit 1
                      fi
            on:
              failure:
                steps:
                  - task: AzureCLI@2
                    displayName: 'Rollback instance'
                    inputs:
                      azureSubscription: $(azureSubscription)
                      scriptType: 'bash'
                      scriptLocation: 'inlineScript'
                      inlineScript: |
                        echo "Rolling back failed instance..."
                        az vmss extension set \
                          --vmss-name $(vmssName) \
                          --resource-group $(resourceGroup) \
                          --name CustomScript \
                          --publisher Microsoft.Azure.Extensions \
                          --version 2.1 \
                          --settings '{"commandToExecute":"bash /opt/deploy/rollback-app.sh"}'

Task 5: Health probes during rolling updates

Configure the health probes that the load balancer uses to determine whether an instance is ready to receive traffic.

Application Health extension for VMSS

# Install the Application Health extension
az vmss extension set \
--vmss-name $VMSS_NAME \
--resource-group $RESOURCE_GROUP \
--name ApplicationHealthLinux \
--publisher Microsoft.ManagedServices \
--version 1.0 \
--settings '{
"protocol": "http",
"port": 8080,
"requestPath": "/health",
"intervalInSeconds": 5,
"numberOfProbes": 3,
"gracePeriod": 600
}'

# Configure automatic instance repair
az vmss update \
--name $VMSS_NAME \
--resource-group $RESOURCE_GROUP \
--set automaticRepairsPolicy.enabled=true \
--set automaticRepairsPolicy.gracePeriod="PT30M"
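
The probe settings above determine how quickly a health state change is noticed. As a rough worked example, assuming a state change requires `numberOfProbes` consecutive matching probe results, the detection time is the interval multiplied by the probe count; the function name below is illustrative.

```shell
#!/bin/sh
# Approximate time for the health extension to report a state change.
# Assumption: the state flips after numberOfProbes consecutive matching probes.
seconds_to_change_state() {
  interval_seconds=$1
  number_of_probes=$2
  echo $(( interval_seconds * number_of_probes ))
}

seconds_to_change_state 5 3   # prints 15
```

With `intervalInSeconds` set to 5 and `numberOfProbes` set to 3, a failing instance would be reported after roughly 15 seconds under that assumption.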

App Service health check configuration

# Enable health check for the App Service
az webapp config set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--generic-configurations '{"healthCheckPath":"/health"}'

Task 6: Warm-up configuration for slots

Configure application initialization rules that ensure the application is fully loaded before it receives production traffic.

App Service warm-up settings

# Configure slot warm-up path and expected status
az webapp config appsettings set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--settings \
"WEBSITE_SWAP_WARMUP_PING_PATH=/health/ready" \
"WEBSITE_SWAP_WARMUP_PING_STATUSES=200" \
"WEBSITE_WARMUP_PATH=/api/warmup"

Application Initialization module (for Windows App Service)

For Windows-based App Service, configure web.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <applicationInitialization doAppInitAfterRestart="true">
      <add initializationPage="/health" hostName="" />
      <add initializationPage="/api/products" hostName="" />
      <add initializationPage="/api/categories" hostName="" />
    </applicationInitialization>
  </system.webServer>
</configuration>

Custom warm-up endpoint in the application code

using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Caching.Distributed;

// IProductRepository is the application's own abstraction and is defined elsewhere.
[ApiController]
[Route("[controller]")]
public class HealthController : ControllerBase
{
    private readonly IDistributedCache _cache;
    private readonly IProductRepository _products;

    public HealthController(IDistributedCache cache, IProductRepository products)
    {
        _cache = cache;
        _products = products;
    }

    // Served at /health/ready, the path used by WEBSITE_SWAP_WARMUP_PING_PATH.
    [HttpGet("ready")]
    public async Task<IActionResult> Ready()
    {
        // Warm up the distributed cache connection
        var cacheStatus = await _cache.GetStringAsync("warmup-check");
        if (cacheStatus == null)
        {
            await _cache.SetStringAsync("warmup-check", "initialized",
                new DistributedCacheEntryOptions
                {
                    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
                });
        }

        // Pre-load frequently accessed data
        var productCount = await _products.GetCountAsync();
        if (productCount == 0)
        {
            return StatusCode(503, "Data not yet loaded");
        }

        return Ok(new { status = "ready", products = productCount });
    }
}

Task 7: Azure Pipelines YAML for slot swap deployment

Create a complete Azure Pipelines workflow that deploys to staging, validates, and swaps.

Create azure-pipelines-slot-swap.yml:

trigger:
  branches:
    include:
      - main
  paths:
    include:
      - src/WebApp/**

pool:
  vmImage: 'ubuntu-latest'

variables:
  azureSubscription: 'contoso-production-connection'
  resourceGroup: 'rg-contoso-webapp-prod'
  appName: 'app-contoso-web'
  dotnetVersion: '8.0.x'

stages:
  - stage: Build
    displayName: 'Build application'
    jobs:
      - job: Build
        steps:
          - task: UseDotNet@2
            inputs:
              packageType: 'sdk'
              version: $(dotnetVersion)

          - script: |
              dotnet restore src/WebApp/WebApp.csproj
              dotnet build src/WebApp/WebApp.csproj --configuration Release --no-restore
              dotnet test tests/WebApp.Tests/WebApp.Tests.csproj --configuration Release
              dotnet publish src/WebApp/WebApp.csproj --configuration Release --output $(Build.ArtifactStagingDirectory)/webapp
            displayName: 'Build, test, and publish'

          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(Build.ArtifactStagingDirectory)/webapp'
              ArtifactName: 'webapp'

  - stage: DeployStaging
    displayName: 'Deploy to staging slot'
    dependsOn: Build
    jobs:
      - deployment: DeployStaging
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureWebApp@1
                  displayName: 'Deploy to staging slot'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    appType: 'webAppLinux'
                    appName: $(appName)
                    deployToSlotOrASE: true
                    resourceGroupName: $(resourceGroup)
                    slotName: 'staging'
                    package: '$(Pipeline.Workspace)/webapp'

                - task: AzureCLI@2
                  displayName: 'Wait for warm-up'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      echo "Waiting for staging slot to warm up..."
                      STAGING_URL="https://$(appName)-staging.azurewebsites.net/health/ready"
                      MAX_ATTEMPTS=20
                      ATTEMPT=0

                      while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
                        HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$STAGING_URL")
                        if [ "$HTTP_STATUS" == "200" ]; then
                          echo "Staging is warm and ready (attempt $ATTEMPT)"
                          exit 0
                        fi
                        echo "Attempt $ATTEMPT: status=$HTTP_STATUS, waiting..."
                        ATTEMPT=$((ATTEMPT + 1))
                        sleep 15
                      done
                      echo "##vso[task.logissue type=error]Staging warm-up failed after $MAX_ATTEMPTS attempts"
                      exit 1

  - stage: ValidateStaging
    displayName: 'Validate staging deployment'
    dependsOn: DeployStaging
    jobs:
      - job: SmokeTests
        steps:
          - task: AzureCLI@2
            displayName: 'Run smoke tests against staging'
            inputs:
              azureSubscription: $(azureSubscription)
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                STAGING_URL="https://$(appName)-staging.azurewebsites.net"

                # Test health endpoint
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$STAGING_URL/health")
                [ "$STATUS" == "200" ] || { echo "Health check failed"; exit 1; }

                # Test API endpoint
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$STAGING_URL/api/products")
                [ "$STATUS" == "200" ] || { echo "Products API failed"; exit 1; }

                # Test response time
                RESPONSE_TIME=$(curl -s -o /dev/null -w "%{time_total}" "$STAGING_URL/api/products")
                echo "Response time: ${RESPONSE_TIME}s"

                echo "All smoke tests passed"

  - stage: SwapToProduction
    displayName: 'Swap staging to production'
    dependsOn: ValidateStaging
    jobs:
      - deployment: SwapSlots
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureAppServiceManage@0
                  displayName: 'Swap staging to production'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    action: 'Swap Slots'
                    webAppName: $(appName)
                    resourceGroupName: $(resourceGroup)
                    sourceSlot: 'staging'

                - task: AzureCLI@2
                  displayName: 'Post-swap production validation'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      PROD_URL="https://$(appName).azurewebsites.net/health"
                      for i in {1..5}; do
                        STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$PROD_URL")
                        if [ "$STATUS" == "200" ]; then
                          echo "Production validation passed"
                          exit 0
                        fi
                        sleep 10
                      done
                      echo "##vso[task.logissue type=error]Production validation failed"
                      exit 1

  - stage: Rollback
    displayName: 'Rollback on failure'
    dependsOn: SwapToProduction
    condition: failed()
    jobs:
      - deployment: RollbackSwap
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: AzureAppServiceManage@0
                  displayName: 'Swap back (rollback)'
                  inputs:
                    azureSubscription: $(azureSubscription)
                    action: 'Swap Slots'
                    webAppName: $(appName)
                    resourceGroupName: $(resourceGroup)
                    sourceSlot: 'staging'

Break-and-fix exercises

Exercise 1: Auto-swap is not triggered

Symptom: Code is deployed to the staging slot, but the auto-swap to production never happens.

Investigate:

# Check if auto-swap is configured
az webapp config show \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--query "autoSwapSlotName"

# Check slot activity log for swap errors
az monitor activity-log list \
--resource-group $RESOURCE_GROUP \
--query "[?contains(operationName.value, 'slotsswap')].{time:eventTimestamp, status:status.value}" \
--output table

Root cause: The App Service plan is on the Free or Basic tier, which does not support auto-swap.

Fix:

# Upgrade to Standard tier or higher
az appservice plan update \
--name asp-contoso-webapp \
--resource-group $RESOURCE_GROUP \
--sku S1

Exercise 2: Rolling update stuck at 25%

Symptom: The VMSS rolling update processes 25% of the instances and then stops.

Investigate:

# Check rolling upgrade status
az vmss rolling-upgrade get-latest \
--name $VMSS_NAME \
--resource-group $RESOURCE_GROUP

# Check instance provisioning states
az vmss list-instances \
--name $VMSS_NAME \
--resource-group $RESOURCE_GROUP \
--query "[].{id:instanceId, state:provisioningState}" \
--output table

Root cause: The first batch of upgraded instances is failing health checks. The maxUnhealthyUpgradedInstancePercent threshold was reached, blocking further upgrades.

Fix:

# Fix the application issue, then restart the rolling upgrade
az vmss rolling-upgrade start \
--name $VMSS_NAME \
--resource-group $RESOURCE_GROUP
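
The stall in this exercise can be reasoned about with a simplified model of the threshold check. The sketch below treats the rule as "halt when the share of unhealthy instances among those already upgraded exceeds the threshold"; it is an approximation for intuition, and the function name is hypothetical.

```shell
#!/bin/sh
# Simplified model of the maxUnhealthyUpgradedInstancePercent check.
upgrade_should_halt() {
  upgraded=$1             # instances upgraded so far
  unhealthy_upgraded=$2   # upgraded instances that report unhealthy
  max_percent=$3          # configured threshold in percent
  # Compare unhealthy/upgraded with the threshold using integer math only.
  if [ $(( unhealthy_upgraded * 100 )) -gt $(( upgraded * max_percent )) ]; then
    echo "halt"
  else
    echo "continue"
  fi
}

upgrade_should_halt 1 1 25   # prints halt
upgrade_should_halt 4 1 25   # prints continue
```

In this model, a single unhealthy instance in a first batch of one is 100% of the upgraded set, which is why the upgrade stops right after the first 25% of instances.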

Exercise 3: Slot swap causes a cold start

Symptom: After the swap from staging to production, the first batch of requests sees response times of 10 to 15 seconds.

Investigate:

# Check if warm-up settings are configured
az webapp config appsettings list \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--query "[?name=='WEBSITE_SWAP_WARMUP_PING_PATH']"

Root cause: No warm-up path is configured. Azure performs the swap without ensuring the application is fully initialized.

Fix:

az webapp config appsettings set \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging \
--settings \
"WEBSITE_SWAP_WARMUP_PING_PATH=/health/ready" \
"WEBSITE_SWAP_WARMUP_PING_STATUSES=200"

Knowledge check

1. Contoso runs 4 instances of its web app in a VMSS and wants to perform a rolling deployment that updates one instance at a time. Which configuration ensures that at most 25% of the instances are unavailable during the update?

2. Which App Service plan tiers support deployment slots? (Select the MINIMUM tier required)

3. During a slot swap, which of the following settings move WITH the application code to the target slot by default?

4. Contoso's staging slot serves requests successfully, but after the swap to production, the first 100 requests fail with HTTP 503. What should be configured to prevent this?

Cleanup

# Delete deployment slots
az webapp deployment slot delete \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot staging

az webapp deployment slot delete \
--name $APP_NAME \
--resource-group $RESOURCE_GROUP \
--slot preprod

# Delete the resource group and all resources
az group delete --name rg-contoso-webapp-prod --yes --no-wait