Challenge 47: Telemetry collection and insights

Exam skills covered

  • Configure telemetry collection using Application Insights, VM Insights, Container Insights, Azure Monitor for Storage, and Azure Monitor for Networks

Scenario

Contoso Ltd runs a microservices architecture with components on Azure VMs (a legacy order service), Azure Kubernetes Service (payment and inventory services), and Azure App Service (the web frontend). Each team monitors differently: the VM team checks servers over RDP sessions, the AKS team relies on basic logs via kubectl, and the App Service team uses no monitoring at all. The CTO wants standardized observability across all compute platforms, with distributed tracing to follow requests end to end.

Prerequisites

  • Azure subscription with Contributor access
  • A web app in Azure App Service
  • An Azure VM (Linux or Windows)
  • An AKS cluster with at least one workload deployed
  • Azure CLI installed
  • Log Analytics workspace (Task 1 creates one if you do not have it yet)

Tasks

Task 1: Configure Application Insights for a web app

# Create a Log Analytics workspace
az monitor log-analytics workspace create \
--name law-contoso-observability \
--resource-group rg-contoso-prod \
--location eastus

LAW_ID=$(az monitor log-analytics workspace show \
--name law-contoso-observability \
--resource-group rg-contoso-prod \
--query id -o tsv)

# Create Application Insights (workspace-based)
az monitor app-insights component create \
--app ai-contoso-webapp \
--resource-group rg-contoso-prod \
--location eastus \
--workspace $LAW_ID \
--application-type web

# Get the connection string
AI_CONNECTION_STRING=$(az monitor app-insights component show \
--app ai-contoso-webapp \
--resource-group rg-contoso-prod \
--query connectionString -o tsv)

# Enable auto-instrumentation on App Service (no code changes needed)
az webapp config appsettings set \
--name app-contoso-web \
--resource-group rg-contoso-prod \
--settings "APPLICATIONINSIGHTS_CONNECTION_STRING=$AI_CONNECTION_STRING" \
"ApplicationInsightsAgent_EXTENSION_VERSION=~3" \
"XDT_MicrosoftApplicationInsights_Mode=Recommended"

# Restart the app to enable auto-instrumentation
az webapp restart --name app-contoso-web --resource-group rg-contoso-prod
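
The value stored in APPLICATIONINSIGHTS_CONNECTION_STRING is a semicolon-separated list of key=value pairs. As a minimal sketch, the Node.js snippet below splits such a string and checks that an InstrumentationKey is present, which can help catch a truncated or empty value before the app is restarted. The function name parseConnectionString and the sample values are assumptions made for illustration, not part of any SDK.

```javascript
// Hypothetical helper: split a connection string into its key=value parts.
function parseConnectionString(connectionString) {
  const parts = {};
  for (const segment of connectionString.split(';')) {
    if (!segment.trim()) continue; // tolerate a trailing semicolon
    const idx = segment.indexOf('=');
    if (idx === -1) throw new Error(`Malformed segment: "${segment}"`);
    parts[segment.slice(0, idx).trim()] = segment.slice(idx + 1).trim();
  }
  if (!parts.InstrumentationKey) {
    throw new Error('Connection string has no InstrumentationKey');
  }
  return parts;
}

// Made-up example values, not a real resource.
const sampleConnectionString =
  'InstrumentationKey=00000000-0000-0000-0000-000000000000;' +
  'IngestionEndpoint=https://eastus-0.in.applicationinsights.azure.com/';
const parsedConnection = parseConnectionString(sampleConnectionString);
console.log(parsedConnection.IngestionEndpoint);
```

An empty or partial string throws instead of silently producing an app that sends no telemetry.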

SDK-based instrumentation (for more control) in a .NET application:

// Program.cs
// Requires the Microsoft.ApplicationInsights.AspNetCore NuGet package
using Microsoft.ApplicationInsights.AspNetCore.Extensions;

var builder = WebApplication.CreateBuilder(args);

// Add Application Insights telemetry
builder.Services.AddApplicationInsightsTelemetry(new ApplicationInsightsServiceOptions
{
    ConnectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"],
    EnableAdaptiveSampling = true,
    EnableDependencyTrackingTelemetryModule = true,
    EnableRequestTrackingTelemetryModule = true
});

var app = builder.Build();

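Task 2: Enable VM Insights

# Install the Azure Monitor Agent on an existing Linux VM
# (on Windows, use --name AzureMonitorWindowsAgent instead).
# The agent takes no workspace setting; the data collection rule
# created below decides where the data goes.
az vm extension set \
--name AzureMonitorLinuxAgent \
--publisher Microsoft.Azure.Monitor \
--vm-name vm-contoso-orders \
--resource-group rg-contoso-prod \
--enable-auto-upgrade true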

# Create a data collection rule for VM Insights
az monitor data-collection rule create \
--name dcr-vm-insights \
--resource-group rg-contoso-prod \
--location eastus \
--data-flows '[{
"streams": ["Microsoft-InsightsMetrics", "Microsoft-ServiceMap"],
"destinations": ["law-contoso-observability"]
}]' \
--destinations "{\"logAnalytics\": [{\"workspaceResourceId\": \"$LAW_ID\", \"name\": \"law-contoso-observability\"}]}" \
--data-sources "{\"performanceCounters\": [{\"streams\": [\"Microsoft-InsightsMetrics\"], \"samplingFrequencyInSeconds\": 60, \"counterSpecifiers\": [\"\\\\Processor(_Total)\\\\% Processor Time\", \"\\\\Memory\\\\Available Bytes\", \"\\\\LogicalDisk(_Total)\\\\% Free Space\"]}]}"
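
In the rule above, each entry in --data-flows refers to a destination by name, and that name has to match a name declared under --destinations. A typo in either place is easy to miss inside escaped JSON. The sketch below shows one way to check the two structures against each other before running the command. The function findUnknownDestinations is hypothetical, and the workspace resource ID is a placeholder.

```javascript
// Hypothetical check: every destination named in a data flow must be declared.
function findUnknownDestinations(dataFlows, destinations) {
  // Collect the names declared under every destination type (logAnalytics, ...).
  const declared = new Set(
    Object.values(destinations).flat().map((d) => d.name)
  );
  const unknown = [];
  for (const flow of dataFlows) {
    for (const name of flow.destinations) {
      if (!declared.has(name)) unknown.push(name);
    }
  }
  return unknown;
}

const dcrDataFlows = [{
  streams: ['Microsoft-InsightsMetrics', 'Microsoft-ServiceMap'],
  destinations: ['law-contoso-observability'],
}];
const dcrDestinations = {
  logAnalytics: [{
    workspaceResourceId: '<workspace-resource-id>', // placeholder
    name: 'law-contoso-observability',
  }],
};
console.log(findUnknownDestinations(dcrDataFlows, dcrDestinations)); // prints []
```

An empty result means every data flow points at a declared destination.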

# Associate the data collection rule with the VM
DCR_ID=$(az monitor data-collection rule show \
--name dcr-vm-insights \
--resource-group rg-contoso-prod \
--query id -o tsv)

az monitor data-collection rule association create \
--name "vm-contoso-orders-association" \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-contoso-prod/providers/Microsoft.Compute/virtualMachines/vm-contoso-orders" \
--rule-id $DCR_ID

# Enable VM Insights via the portal shortcut
# Azure Portal > VM > Monitoring > Insights > Enable
# This automatically installs the agent and creates the DCR

VM Insights provides:

  • Performance tab: CPU, memory, disk IOPS, network
  • Map tab: process dependencies and network connections (requires the Dependency Agent on the VM)
  • Connection monitoring between VMs and external services

Task 3: Enable Container Insights for AKS

# Enable monitoring add-on on existing AKS cluster
az aks enable-addons \
--name aks-contoso-prod \
--resource-group rg-contoso-prod \
--addons monitoring \
--workspace-resource-id $LAW_ID

# Verify the monitoring agent is running
az aks show \
--name aks-contoso-prod \
--resource-group rg-contoso-prod \
--query "addonProfiles.omsagent.enabled"

# Enable Prometheus metrics collection (managed Prometheus)
az aks update \
--name aks-contoso-prod \
--resource-group rg-contoso-prod \
--enable-azure-monitor-metrics

# Configure Container Insights to collect specific log types
# Create a ConfigMap for agent configuration
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: v1
  log-data-collection-settings: |
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system","gatekeeper-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.env_var]
        enabled = false
  prometheus-data-collection-settings: |
    [prometheus_data_collection_settings.cluster]
      interval = "1m"
      monitor_kubernetes_pods = true
EOF

Task 4: Configure custom metrics and events

// Send custom events and metrics via the Application Insights SDK (Node.js example)
// telemetry.js
// `order`, `processingDurationMs`, `callDurationMs`, and `response` come from
// the surrounding application code.
const appInsights = require('applicationinsights');
appInsights.setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)
  .setAutoCollectDependencies(true)
  .setAutoCollectExceptions(true)
  .start();

const client = appInsights.defaultClient;

// Track custom event
client.trackEvent({
  name: "OrderPlaced",
  properties: {
    customerId: order.customerId,
    region: order.region,
    paymentMethod: order.paymentMethod
  },
  measurements: {
    orderValue: order.total,
    itemCount: order.items.length
  }
});

// Track custom metric
client.trackMetric({
  name: "OrderProcessingTime",
  value: processingDurationMs,
  properties: {
    serviceVersion: process.env.APP_VERSION
  }
});

// Track dependency (external service call)
client.trackDependency({
  target: "payment-gateway",
  name: "ChargeCard",
  data: "POST /api/charge",
  duration: callDurationMs,
  resultCode: response.status,
  success: response.status === 200,
  dependencyTypeName: "HTTP"
});
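
The metric call above expects a duration that the application has to measure itself. One possible way to do that, sketched below, is a small wrapper that times an operation and reports the result through trackMetric. The wrapper name trackTimed, the injectable clock, and the stand-in client are assumptions for illustration. In the real app the client would be appInsights.defaultClient.

```javascript
// Hypothetical wrapper: time a synchronous operation and report the
// duration through any client that exposes trackMetric({ name, value }).
function trackTimed(client, metricName, operation, now = Date.now) {
  const start = now();
  try {
    return operation();
  } finally {
    // Runs whether the operation returned or threw.
    client.trackMetric({ name: metricName, value: now() - start });
  }
}

// Stand-in client that records metrics in memory instead of sending them.
const recordedMetrics = [];
const fakeTelemetryClient = { trackMetric: (metric) => recordedMetrics.push(metric) };

const orderTotal = trackTimed(fakeTelemetryClient, 'OrderProcessingTime', () => 20 + 5);
console.log(orderTotal, recordedMetrics[0].name); // prints 25 OrderProcessingTime
```

Reporting from the finally block means a failed operation still produces a duration, which keeps slow failures visible.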

Task 5: Configure availability tests

# Create a URL ping test
az monitor app-insights web-test create \
--name "contoso-web-availability" \
--resource-group rg-contoso-prod \
--location "East US" \
--defined-web-test-name "Homepage Health Check" \
--locations Id="us-fl-mia-edge" \
--locations Id="emea-nl-ams-azr" \
--locations Id="apac-sg-sin-azr" \
--kind "ping" \
--frequency 300 \
--timeout 120 \
--web-test "<WebTest Name=\"Homepage\" Id=\"test-001\" Enabled=\"True\" Timeout=\"120\" xmlns=\"http://microsoft.com/schemas/VisualStudio/TeamTest/2010\"><Items><Request Method=\"GET\" Version=\"1.1\" Url=\"https://app-contoso-web.azurewebsites.net/health\" ThinkTime=\"0\" Timeout=\"120\" ParseDependentRequests=\"False\" FollowRedirects=\"True\" RecordResult=\"True\" Cache=\"False\" ResponseTimeGoal=\"0\" Encoding=\"utf-8\" ExpectedHttpStatusCode=\"200\" /></Items></WebTest>"

# Create an alert for availability test failures
az monitor metrics alert create \
--name "alert-availability-failed" \
--resource-group rg-contoso-prod \
--scopes "/subscriptions/<sub-id>/resourceGroups/rg-contoso-prod/providers/microsoft.insights/components/ai-contoso-webapp" \
--condition "avg availabilityResults/availabilityPercentage < 99" \
--window-size 5m \
--evaluation-frequency 1m \
--action "/subscriptions/<sub-id>/resourceGroups/rg-contoso-prod/providers/microsoft.insights/actionGroups/ag-ops-team" \
--description "Availability dropped below 99%"
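
With three test locations and a 300-second frequency, a 5-minute window holds roughly three results, so a single failed probe pulls the average far below 99%. The simplified model below illustrates that arithmetic. It is not how Azure Monitor computes the metric internally, and the function names are hypothetical.

```javascript
// Hypothetical helper: percentage of successful availability results.
function availabilityPercentage(results) {
  if (results.length === 0) return null; // no data in the window
  const passed = results.filter((r) => r.success).length;
  return (passed / results.length) * 100;
}

// Mirrors the alert condition "avg availabilityPercentage < 99".
function shouldAlert(results, threshold = 99) {
  const pct = availabilityPercentage(results);
  return pct !== null && pct < threshold;
}

const windowResults = [
  { location: 'us-fl-mia-edge', success: true },
  { location: 'emea-nl-ams-azr', success: true },
  { location: 'apac-sg-sin-azr', success: false },
];
console.log(shouldAlert(windowResults)); // prints true
```

If one regional blip should not page the operations team, a lower threshold or a longer window is worth considering.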

Task 6: Configure sampling to manage costs

# Configure adaptive sampling in Application Insights
# For auto-instrumented App Service, set via app settings:
az webapp config appsettings set \
--name app-contoso-web \
--resource-group rg-contoso-prod \
--settings "MicrosoftAppInsights_AdaptiveSamplingTelemetryProcessor_MaxTelemetryItemsPerSecond=5"

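SDK-based sampling configuration:

// Program.cs - Configure sampling
// At the top of the file:
using Microsoft.ApplicationInsights.AspNetCore.Extensions;
using Microsoft.ApplicationInsights.Extensibility;

// Turn off the default adaptive sampling so the processor configured
// below is the only sampling processor in the chain.
builder.Services.AddApplicationInsightsTelemetry(new ApplicationInsightsServiceOptions
{
    EnableAdaptiveSampling = false
});

builder.Services.Configure<TelemetryConfiguration>(config =>
{
    // Named chainBuilder so it does not clash with the outer `builder` variable
    var chainBuilder = config.DefaultTelemetrySink.TelemetryProcessorChainBuilder;

    // Adaptive sampling: target 5 items per second,
    // excluding events and exceptions from sampling
    chainBuilder.UseAdaptiveSampling(maxTelemetryItemsPerSecond: 5,
        excludedTypes: "Event;Exception");

    // Alternative, fixed-rate sampling: keep 25% of telemetry
    // chainBuilder.UseSampling(25.0);

    chainBuilder.Build();
});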
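
To see why fixed-rate sampling does not break up traces, it helps to look at the decision itself: the keep-or-drop choice is derived from the operation ID, so every item belonging to one operation gets the same answer. The sketch below illustrates the idea with a simple string hash. The hash is an assumption chosen for brevity and is not the one the SDK uses, and the function names are hypothetical.

```javascript
// Illustrative string hash (djb2 variant); NOT the hash the SDK uses.
function hashToPercent(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return (hash / 0x100000000) * 100; // map to the range [0, 100)
}

// Same operation ID always yields the same decision.
function keepItem(operationId, samplingPercentage) {
  return hashToPercent(operationId) < samplingPercentage;
}

// Each kept item stands in for 100 / samplingPercentage original items.
function itemCountFor(samplingPercentage) {
  return 100 / samplingPercentage;
}

const sampleOperationId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(keepItem(sampleOperationId, 25) === keepItem(sampleOperationId, 25)); // prints true
console.log(itemCountFor(25)); // prints 4
```

The item count is what lets queries scale sampled results back up to approximate totals.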

Daily cap (server-side, applied to all data regardless of SDK settings; unlike sampling, it drops all further telemetry for the day once the limit is reached):

# Set a 5 GB daily cap to control costs
az monitor app-insights component billing update \
--app ai-contoso-webapp \
--resource-group rg-contoso-prod \
--cap 5
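
A daily cap is a blunt safety net, so it is usually better to size sampling so that normal traffic stays under it. As a rough sketch, and assuming ingested volume scales linearly with the sampling percentage (which ignores telemetry types excluded from sampling), the rate needed to fit under a cap can be estimated as follows. The function name and the 50 GB figure are illustrative.

```javascript
// Hypothetical estimate: sampling percentage needed to stay under a daily cap,
// assuming volume scales linearly with the sampling percentage.
function samplingPercentageForCap(dailyVolumeGb, capGb) {
  if (dailyVolumeGb <= 0 || capGb <= 0) {
    throw new Error('Volumes must be positive');
  }
  return Math.min(100, (capGb * 100) / dailyVolumeGb);
}

console.log(samplingPercentageForCap(50, 5)); // prints 10
console.log(samplingPercentageForCap(3, 5));  // prints 100
```

With 50 GB of unsampled telemetry per day, a 5 GB cap corresponds to keeping about 10% of items.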

Task 7: Implement distributed tracing correlation

Ensure trace correlation across services:

# Application Insights automatically correlates requests using W3C Trace Context
# Verify correlation is working by checking the Application Map:
# Azure Portal > Application Insights > Application Map

# For custom HTTP calls, ensure headers are propagated:
# traceparent: 00-<trace-id>-<span-id>-<trace-flags>
# tracestate: (optional vendor-specific state)
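
When checking whether a service forwards the header correctly, it can help to validate the traceparent value by hand. The sketch below parses the version 00 layout shown above (2, 32, 16, and 2 lowercase hex characters) and reads the sampled bit from the trace flags. The function name parseTraceparent is hypothetical, and the sample header is the example value from the W3C Trace Context specification.

```javascript
// Version 00 layout: version-traceId-spanId-flags, all lowercase hex.
const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: returns the parsed fields, or null if the header is invalid.
function parseTraceparent(header) {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // All-zero IDs and version "ff" are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return null;
  }
  return {
    version,
    traceId,
    spanId,
    sampled: (parseInt(flags, 16) & 1) === 1, // lowest bit of trace-flags
  };
}

const exampleHeader = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
console.log(parseTraceparent(exampleHeader).traceId);
// prints 4bf92f3577b34da6a3ce929d0e0e4736
```

If two services log different trace IDs for the same request, the header was dropped or regenerated somewhere between them.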

Configure correlation in microservices:

# For AKS services, deploy with environment variables for App Insights
# (apps/v1 requires the selector and matching template labels;
# the "app" label value here is illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
      - name: payment-service
        image: contoso.azurecr.io/payment-service:latest
        env:
        - name: APPLICATIONINSIGHTS_CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: app-insights-secret
              key: connection-string
        - name: OTEL_SERVICE_NAME
          value: "payment-service"

Verify end-to-end tracing:

// KQL: Find a request and trace it across services
requests
| where timestamp > ago(1h)
| where name == "POST /api/orders"
| project operation_Id, timestamp, duration, resultCode
| take 1
| join kind=inner (
    dependencies
    | project operation_Id, target, name, duration, success
) on operation_Id
| project-away operation_Id1

Break-and-fix exercises

Break scenario 1: Container Insights shows no data for a new namespace

A new microservice is deployed to a new Kubernetes namespace, but Container Insights shows neither logs nor metrics for it.

Cause: The Container Insights ConfigMap excludes certain namespaces from log collection, or the new namespace was added to the exclusion list.

Diagnosis:

kubectl get configmap container-azm-ms-agentconfig -n kube-system -o yaml
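
Once the ConfigMap has been dumped, the lines to inspect are the exclude_namespaces entries under each log stream. As an illustration of what to look for, the sketch below reads the log-data-collection-settings text line by line and returns the exclusion list for one stream. It is a simplified reader rather than a full TOML parser. The function name excludedNamespaces and the namespace payments are hypothetical.

```javascript
// Hypothetical helper: list the namespaces excluded for one log stream
// ("stdout" or "stderr") in the log-data-collection-settings text.
function excludedNamespaces(settingsText, stream) {
  const header = `[log_collection_settings.${stream}]`;
  let inSection = false;
  for (const rawLine of settingsText.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('[')) {
      inSection = line === header; // entering or leaving the wanted section
      continue;
    }
    if (inSection && line.startsWith('exclude_namespaces')) {
      const list = line.slice(line.indexOf('=') + 1).trim();
      return JSON.parse(list); // the list uses JSON-compatible syntax here
    }
  }
  return [];
}

const agentSettings = `
[log_collection_settings]
  [log_collection_settings.stdout]
    enabled = true
    exclude_namespaces = ["kube-system","gatekeeper-system"]
  [log_collection_settings.stderr]
    enabled = true
    exclude_namespaces = ["kube-system"]
`;
console.log(excludedNamespaces(agentSettings, 'stdout').includes('payments')); // prints false
```

A result of true for the new namespace would confirm that the exclusion list is the cause.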

Fix: Update the ConfigMap so the new namespace is no longer excluded:

kubectl edit configmap container-azm-ms-agentconfig -n kube-system
# Remove the namespace from the exclude_namespaces list
# Restart the agent pods (the DaemonSet is named ama-logs in current
# versions of the add-on and omsagent in older ones)
kubectl rollout restart daemonset ama-logs -n kube-system

Break scenario 2: Distributed tracing shows gaps between services

The Application Map shows all services, but trace correlation fails between the frontend and the payment service.

Cause: The payment service uses a custom HTTP client that does not propagate the W3C trace context headers.

Fix: Make sure the HTTP client library propagates the traceparent and tracestate headers. In Node.js with Application Insights:

// The Application Insights SDK auto-patches common HTTP libraries
// If using a custom client, manually propagate:
const { context, propagation } = require('@opentelemetry/api');

function makeDownstreamCall(url, payload) {
  const headers = {};
  propagation.inject(context.active(), headers);

  return fetch(url, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
}
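
For a client that cannot use the OpenTelemetry API at all, the essential rule is small: keep the incoming trace ID and trace flags, and replace the span ID with one that identifies the outgoing call. The sketch below shows that rule in isolation. The names childTraceparent and randomSpanId are hypothetical, and Math.random is used only to keep the example dependency-free.

```javascript
// Hypothetical span ID generator: 16 lowercase hex characters, never all zeros.
// Math.random keeps the sketch dependency-free; it is not a secure source.
function randomSpanId() {
  let id = '';
  for (let i = 0; i < 16; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return /^0+$/.test(id) ? randomSpanId() : id;
}

// Build the header for an outgoing call from the incoming traceparent:
// same version, trace ID, and flags, with a new span ID.
function childTraceparent(incoming, newSpanId = randomSpanId()) {
  const parts = incoming.trim().split('-');
  if (parts.length !== 4) throw new Error('Unexpected traceparent format');
  const [version, traceId, , flags] = parts;
  return `${version}-${traceId}-${newSpanId}-${flags}`;
}

const incomingHeader = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
console.log(childTraceparent(incomingHeader, 'b7ad6b7169203331'));
// prints 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
```

Because the trace ID is unchanged, the downstream request lands in the same end-to-end transaction as the frontend request.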

Knowledge check

1. Contoso has a .NET web app on App Service. They want Application Insights telemetry without modifying the application code. What should they configure?

2. Contoso runs services on VMs, AKS, and App Service. Which monitoring solution provides process-level dependency mapping, showing which processes on a VM communicate with which external services?

3. Application Insights is generating 50 GB of telemetry per day, resulting in high costs. Which approach reduces costs while preserving visibility into errors and exceptions?

4. A request to Contoso's web app calls three backend microservices. In Application Insights, the end-to-end transaction view shows only the initial request, without the downstream calls. What is the most likely cause?

Cleanup

# Remove Application Insights
az monitor app-insights component delete \
--app ai-contoso-webapp \
--resource-group rg-contoso-prod

# Disable VM Insights agent
az vm extension delete \
--name AzureMonitorLinuxAgent \
--vm-name vm-contoso-orders \
--resource-group rg-contoso-prod

# Disable Container Insights
az aks disable-addons \
--name aks-contoso-prod \
--resource-group rg-contoso-prod \
--addons monitoring

# Delete data collection rules
az monitor data-collection rule delete \
--name dcr-vm-insights \
--resource-group rg-contoso-prod

# Delete Log Analytics workspace
az monitor log-analytics workspace delete \
--name law-contoso-observability \
--resource-group rg-contoso-prod \
--yes