Challenge 28: Azure Advisor & Service Health
45-60 minutes | Estimated cost: ~$0.00 (Advisor and Service Health are free) | Exam Weight: 5-10%
Scenario
Contoso Ltd. wants proactive monitoring with actionable recommendations across security, performance, cost, reliability, and operational excellence. The CTO has requested that the team regularly review Azure Advisor recommendations, track improvement over time, and set up alerts for service health events (outages, planned maintenance, and health advisories) so they are never caught off guard.
Exam skills covered
- Review and interpret Azure Advisor recommendations
- Configure Advisor alerts for new recommendations
- Suppress or postpone Advisor recommendations
- Configure Service Health alerts (service issues, planned maintenance, health advisories)
- Check Resource Health for specific resources
- Create action groups for notifications
- Understand Advisor score and improvement tracking
Sysadmin ↔ Azure reference
| On-Prem / Traditional | Azure Equivalent |
|---|---|
| Security audit / penetration test results | Advisor Security recommendations |
| Capacity planning reviews | Advisor Performance recommendations |
| Cost optimization meetings | Advisor Cost recommendations |
| Vendor maintenance notifications | Service Health (planned maintenance) |
| Status page (status.cloud.com) | Azure Service Health / Status |
| Hardware health monitoring (IPMI/iLO) | Resource Health |
| Best practices checklist (CIS benchmarks) | Advisor Operational Excellence |
| Remediation tracking spreadsheet | Advisor Score |
Setup
# Variables
RG="rg-az104-challenge28"
LOCATION="eastus"
# Create resource group (for action groups and alert rules)
az group create --name $RG --location $LOCATION
This challenge primarily uses Azure Advisor and Service Health, which work against the resources already in your subscription, so you do not need to deploy VMs or other services for this lab.
Tasks
Task 1: review Azure Advisor recommendations
# List all advisor recommendations for the subscription
az advisor recommendation list -o table
# Filter by category: cost
az advisor recommendation list \
--category Cost -o table
# Filter by category: security
az advisor recommendation list \
--category Security -o table
# Filter by category: performance
az advisor recommendation list \
--category Performance -o table
# Filter by category: reliability (the CLI uses the older name HighAvailability)
az advisor recommendation list \
--category HighAvailability -o table
# Filter by category: operational excellence
az advisor recommendation list \
--category OperationalExcellence -o table
# Get detailed information about a specific recommendation
az advisor recommendation list --category Cost --query "[0]" -o json
Portal Steps:
- Navigate to Azure Advisor in the portal
- Review the dashboard showing recommendations by category
- Click into each category to see detailed recommendations
- Each recommendation shows: Impact (High/Medium/Low), affected resources, and remediation steps
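To see how recommendations split across the Impact levels listed above, a small helper can tally them. This is a minimal sketch: `count_by_impact` is a hypothetical function name, and it assumes each recommendation exposes an `impact` field in the CLI output.

```shell
# Summarize a stream of impact values (one per line) as "Impact=count" pairs.
count_by_impact() {
  LC_ALL=C sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Offline demo with sample values (no Azure access needed).
printf 'High\nMedium\nHigh\nLow\n' | count_by_impact

# Hypothetical usage against a live subscription (requires `az login`):
#   az advisor recommendation list --query "[].impact" -o tsv | count_by_impact
```

Running the tally before and after a remediation pass gives a quick view of whether the High-impact count is going down.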
Task 2: understand Advisor Score
Portal Steps:
- Navigate to Advisor > Advisor Score
- View the overall score (0-100%) and per-category scores
- Each category contributes to the overall score:
- Reliability
- Security
- Performance
- Cost
- Operational Excellence
# Check advisor configuration (what resource groups are included)
az advisor configuration list -o table
# Configure advisor to exclude specific resource groups (if needed)
az advisor configuration update \
--exclude \
--resource-group "rg-dev-sandbox"
Advisor Score aggregates the five category scores into a single percentage. Each category score reflects the share of assessed resources that already follow Advisor's best practices, so a score of 100% means every assessed resource follows them. Use it to:
- Track improvement over time
- Compare across subscriptions
- Set organizational targets (e.g., maintain above 80%)
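An organizational target such as "maintain above 80%" can be checked with a few lines of shell. This is a sketch under stated assumptions: `score_meets_target` is a hypothetical helper, and the score value is typed in by hand from the Advisor Score page rather than fetched from Azure.

```shell
# Return success (exit 0) when SCORE is at or above TARGET; both are percentages.
score_meets_target() {
  local score="$1" target="$2"
  awk -v s="$score" -v t="$target" 'BEGIN { exit !(s + 0 >= t + 0) }'
}

# Illustrative value only: read the real score from Advisor > Advisor Score.
CURRENT_SCORE=76.5
if score_meets_target "$CURRENT_SCORE" 80; then
  echo "Advisor Score target met"
else
  echo "Advisor Score below target"
fi
```

Because the helper only sets an exit status, it can also gate a scheduled report or a pipeline step.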
Task 3: suppress or postpone recommendations
# List current recommendations
az advisor recommendation list --category Cost -o table
# Postpone a recommendation for 30 days (omit --days to dismiss it indefinitely)
# Get the recommendation ID first
RECOMMENDATION_ID=$(az advisor recommendation list \
--category Cost \
--query "[0].id" -o tsv)
# Postpone only if a recommendation was found
if [ -n "$RECOMMENDATION_ID" ]; then
az advisor recommendation disable \
--ids "$RECOMMENDATION_ID" \
--days 30
fi
Portal Steps:
- Navigate to Advisor > Select a recommendation
- Click Dismiss or Postpone
- Choose duration: 1 day, 1 week, 1 month, or forever
- Optionally add a reason (e.g., "Accepted risk for dev environment")
Suppress recommendations when:
- The recommendation does not apply to your scenario (e.g., cost savings for intentionally oversized dev VMs)
- You have compensating controls in place
- The risk is acknowledged and accepted
- The recommendation is a false positive for your workload
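The CLI example in this task does not record why a recommendation was suppressed, so it can help to keep a local audit trail of the reasons listed above. The following is a minimal sketch: `log_suppression` and the `SUPPRESSION_LOG` path are hypothetical names introduced for illustration.

```shell
# Hypothetical log location; override by exporting SUPPRESSION_LOG.
SUPPRESSION_LOG="${SUPPRESSION_LOG:-./advisor-suppressions.csv}"

# Append one audit line: UTC date, recommendation ID, days, quoted reason.
log_suppression() {
  local rec_id="$1" days="$2" reason="$3"
  printf '%s,%s,%s,"%s"\n' "$(date -u +%Y-%m-%d)" "$rec_id" "$days" "$reason" \
    >> "$SUPPRESSION_LOG"
}

# Hypothetical usage alongside the CLI command (requires `az login`):
#   az advisor recommendation disable --ids "$RECOMMENDATION_ID" --days 30 \
#     && log_suppression "$RECOMMENDATION_ID" 30 "Accepted risk for dev environment"
```

A plain CSV keeps the record reviewable without extra tooling; the reason text should not contain double quotes in this simple form.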
Task 4: configure Advisor alerts
# First, create an action group for notifications
az monitor action-group create \
--resource-group $RG \
--name ag-advisor-notifications \
--short-name AdvisorAG \
--action email ops-team opsTeam@contoso.com
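# Optional sanity check: confirm the Advisor API responds for this subscription (output is discarded)
az advisor recommendation list --category Cost > /dev/null 2>&1
# Create an activity log alert for new Cost recommendations.
# The rule is scoped to the subscription so it covers resources outside $RG,
# and all clauses go into a single --condition expression joined with "and".
az monitor activity-log alert create \
--resource-group $RG \
--name "alert-advisor-cost" \
--description "Alert when new Cost Advisor recommendations appear" \
--scope "/subscriptions/$(az account show --query id -o tsv)" \
--action-group ag-advisor-notifications \
--condition category=Recommendation and \
operationName=Microsoft.Advisor/recommendations/available/action and \
properties.recommendationCategory=Cost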
Portal Steps:
- Navigate to Advisor > Alerts
- Click New alert
- Configure:
- Category: Cost (or All)
- Impact: High, Medium (select as needed)
- Action group: Select or create
- Click Create alert rule
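The Category choice in the portal steps above corresponds to a filter in the alert condition. As a rough sketch, the condition string for any category can be generated by a helper; `advisor_alert_condition` is a hypothetical name, and the `properties.recommendationCategory` field is an assumption that should be verified against a real Advisor entry in the activity log.

```shell
# Build the --condition expression for an Advisor activity log alert.
# ASSUMPTION: properties.recommendationCategory is the field carrying the category.
advisor_alert_condition() {
  local category="$1"
  printf 'category=Recommendation and operationName=Microsoft.Advisor/recommendations/available/action and properties.recommendationCategory=%s\n' "$category"
}

# Offline demo: print the expression for one category.
advisor_alert_condition Security

# Hypothetical usage (requires `az login`); the substitution is left unquoted
# on purpose so each word becomes its own argument:
#   az monitor activity-log alert create -g "$RG" -n "alert-advisor-security" \
#     --scope "/subscriptions/$(az account show --query id -o tsv)" \
#     --action-group ag-advisor-notifications \
#     --condition $(advisor_alert_condition Security)
```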
Task 5: configure Service Health alerts
# Service Health events are logged at subscription level, so each rule is scoped
# to the subscription and uses one --condition expression joined with "and".
# Create alert for service issues (outages)
az monitor activity-log alert create \
--resource-group $RG \
--name "alert-service-issues" \
--description "Alert for Azure service issues affecting our resources" \
--scope "/subscriptions/$(az account show --query id -o tsv)" \
--action-group ag-advisor-notifications \
--condition category=ServiceHealth and properties.incidentType=Incident
# Create alert for planned maintenance
az monitor activity-log alert create \
--resource-group $RG \
--name "alert-planned-maintenance" \
--description "Alert for planned maintenance events" \
--scope "/subscriptions/$(az account show --query id -o tsv)" \
--action-group ag-advisor-notifications \
--condition category=ServiceHealth and properties.incidentType=Maintenance
# Create alert for health advisories that require action
az monitor activity-log alert create \
--resource-group $RG \
--name "alert-health-advisories" \
--description "Alert for action-required service health events" \
--scope "/subscriptions/$(az account show --query id -o tsv)" \
--action-group ag-advisor-notifications \
--condition category=ServiceHealth and properties.incidentType=ActionRequired
# List all activity log alerts
az monitor activity-log alert list \
--resource-group $RG -o table
Portal Steps:
- Navigate to Service Health > Health alerts
- Click Create service health alert
- Configure:
- Subscription: Select your subscription
- Services: Select specific services (or All)
- Regions: Select your regions (e.g., East US)
- Event types: Service issue, Planned maintenance, Health advisory, Security advisory
- Select action group
- Name the alert rule and click Create
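The event types offered in the portal map onto `properties.incidentType` values in the activity log. The lookup below is a sketch of that mapping as I understand it; `incident_types_for` is a hypothetical helper, and the values for health and security advisories in particular are assumptions to confirm against a real Service Health event.

```shell
# Map a portal event type to the incidentType value(s) used in alert conditions.
# ASSUMPTION: health advisories surface as Informational or ActionRequired.
incident_types_for() {
  case "$1" in
    "Service issue")       echo "Incident" ;;
    "Planned maintenance") echo "Maintenance" ;;
    "Health advisory")     echo "Informational ActionRequired" ;;
    "Security advisory")   echo "Security" ;;
    *) echo "unknown event type: $1" >&2; return 1 ;;
  esac
}

# Offline demo: print the mapping for each portal event type.
for event in "Service issue" "Planned maintenance" "Health advisory" "Security advisory"; do
  printf '%s -> %s\n' "$event" "$(incident_types_for "$event")"
done
```

Where an event type maps to more than one value, one alert rule per value keeps each condition a simple equality check.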
Task 6: check Resource Health
# Resource Health is primarily a portal feature; it can also be queried through the REST API.
# As a quick proxy from the CLI, list the power state of each VM in the resource group
# (power state is not the Resource Health signal itself).
az vm list \
--resource-group $RG \
--show-details \
--query "[].{Name:name, Status:powerState}" -o table 2>/dev/null
# List Resource Health events via the activity log
az monitor activity-log list \
--resource-group $RG \
--max-events 20 \
--query "[?category.value=='ResourceHealth'].{Time:eventTimestamp, Resource:resourceId, Status:status.value}" -o table 2>/dev/null
Portal Steps:
- Navigate to Service Health > Resource Health
- Filter by subscription, resource type, and resource group
- Check the health status of each resource:
- Available: Resource is healthy
- Unavailable: Azure detected an issue affecting the resource
- Degraded: Performance issues detected
- Unknown: No health signal received
- Navigate to a specific resource > Resource Health blade
- View historical health events and root cause analysis
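The same availability status shown in the portal can be read through the Resource Health REST API. The sketch below only builds the request URL; `resource_health_url` is a hypothetical helper, the resource ID is a placeholder, and the `api-version` value is an assumption to check against the current REST reference.

```shell
# ASSUMPTION: this api-version is accepted; override RH_API_VERSION if not.
RH_API_VERSION="${RH_API_VERSION:-2022-10-01}"

# Build the "current availability status" URL for a full Azure resource ID.
resource_health_url() {
  local resource_id="$1"
  printf 'https://management.azure.com%s/providers/Microsoft.ResourceHealth/availabilityStatuses/current?api-version=%s\n' \
    "$resource_id" "$RH_API_VERSION"
}

# Offline demo with a placeholder resource ID.
resource_health_url "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-az104-challenge28/providers/Microsoft.Compute/virtualMachines/vm-example"

# Hypothetical usage (requires `az login` and a real resource ID in VM_ID):
#   az rest --method get --url "$(resource_health_url "$VM_ID")" \
#     --query "properties.availabilityState" -o tsv
```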
Task 7: create action groups for notifications
# Create a comprehensive action group with multiple notification channels
az monitor action-group create \
--resource-group $RG \
--name ag-critical-alerts \
--short-name CritAlert \
--action email cto-email cto@contoso.com \
--action email ops-email ops@contoso.com \
--action sms ops-sms 1 5551234567
# Create an action group with webhook (for integration with ITSM tools)
az monitor action-group create \
--resource-group $RG \
--name ag-webhook-itsm \
--short-name ITSM \
--action webhook servicenow-hook "https://contoso.service-now.com/api/webhook"
# List action groups
az monitor action-group list --resource-group $RG -o table
# Capture the action group's resource ID (useful when referencing it from another resource group)
AG_ID=$(az monitor action-group show \
--resource-group $RG \
--name ag-critical-alerts \
--query "id" -o tsv)
# Test the action group (sends real test notifications, so it is left commented out)
# az monitor action-group test-notifications create \
# --resource-group $RG \
# --action-group ag-critical-alerts \
# --alert-type servicehealth \
# --add-action email ops-email ops@contoso.com
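Action group short names are limited to 12 characters, which is easy to exceed when naming groups descriptively. A small pre-flight check can catch that before the create call fails; `valid_short_name` is a hypothetical helper used here for illustration.

```shell
# Succeed when the short name is non-empty and at most 12 characters long.
valid_short_name() {
  local name="$1"
  [ -n "$name" ] && [ "${#name}" -le 12 ]
}

# Offline demo: check the short names used in this challenge plus one that is too long.
for short in AdvisorAG CritAlert ITSM AdvisorNotifications; do
  if valid_short_name "$short"; then
    echo "$short: ok"
  else
    echo "$short: too long or empty"
  fi
done
```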
Task 8: review Service Health dashboard
Portal Steps:
- Navigate to Service Health in the portal
- Review the four sections:
- Service issues: Current outages affecting your resources
- Planned maintenance: Upcoming maintenance events
- Health advisories: Recommendations and action items
- Security advisories: Security-related notifications
- Click on any active event to see:
- Affected services and regions
- Timeline of updates
- Root cause (after resolution)
- Recommended actions
- Check Health history for past events
Task 9: implement Advisor recommendations
# Example: implement a common advisor recommendation
# (Right-size or shut down underutilized VMs)
# List VMs with recommendations
az advisor recommendation list \
--category Cost \
--query "[?contains(shortDescription.problem, 'virtual machine')]" -o table
# Example: resize a VM based on an Advisor recommendation
# az vm resize --resource-group $RG --name vm-oversize --size Standard_B1s
# Example: enable purge protection on a Key Vault (security recommendation;
# soft-delete is already on by default for new vaults)
# az keyvault update --name my-kv --enable-purge-protection true
# After implementing, regenerate recommendations to verify (results can take a while to update)
az advisor recommendation list --category Cost --refresh -o table
Portal Steps:
- Navigate to Advisor > Select a recommendation
- Click View recommendation details
- Review the affected resources
- Click Remediate (for recommendations with quick-fix)
- Or follow the manual steps provided
Success criteria
- ⬜ Advisor recommendations reviewed across all five categories
- ⬜ Advisor Score viewed and understood
- ⬜ At least one recommendation suppressed/postponed with reason
- ⬜ Advisor alert configured for new recommendations
- ⬜ Service Health alerts configured for: service issues, planned maintenance, and health advisories
- ⬜ Resource Health checked for specific resources
- ⬜ Action groups created with email (and optionally SMS/webhook) notifications
- ⬜ Service Health dashboard explored (issues, maintenance, advisories)
- ⬜ At least one Advisor recommendation implemented or acknowledged
Break & fix scenarios
Scenario A: alert not firing
# Check if action group is correctly configured
az monitor action-group show \
--resource-group $RG \
--name ag-advisor-notifications
# Check if alert rule is enabled
az monitor activity-log alert list \
--resource-group $RG \
--query "[].{Name:name, Enabled:enabled}" -o table
# Common causes:
# 1. action group has invalid email/phone
# 2. alert rule is disabled
# 3. condition scope is too narrow (wrong region/service)
# 4. email is going to spam/junk folder
# Fix: enable the alert rule
az monitor activity-log alert update \
--resource-group $RG \
--name "alert-service-issues" \
--enabled true
Scenario B: too many notifications (alert fatigue)
# Problem: getting too many low-impact advisor notifications
# Fix 1: suppress low-priority recommendations
az advisor recommendation disable \
--ids "<recommendation-id>" \
--days 90
# Fix 2: create separate action groups for different severities
# High impact -> email + SMS + webhook
# Medium/Low -> email only
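# Fix 3: use an alert processing rule to suppress notifications during quiet periods (here, weekends)
# Depending on the CLI version, this command may require the alertsmanagement extension.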
az monitor alert-processing-rule create \
--resource-group $RG \
--name "suppress-weekends" \
--rule-type RemoveAllActionGroups \
--scopes "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/$RG" \
--schedule-recurrence-type Weekly \
--schedule-recurrence "Saturday" "Sunday"
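Fix 2 above (separate action groups per severity) comes down to a routing decision. The sketch below expresses it as a lookup that reuses the two action group names created earlier in this challenge; `action_group_for_impact` is a hypothetical helper.

```shell
# Pick an action group by Advisor impact level:
# High goes to the multi-channel group, Medium/Low to the email-only group.
action_group_for_impact() {
  case "$1" in
    High)       echo "ag-critical-alerts" ;;
    Medium|Low) echo "ag-advisor-notifications" ;;
    *) echo "unknown impact: $1" >&2; return 1 ;;
  esac
}

# Offline demo: show the routing for each impact level.
for impact in High Medium Low; do
  printf '%s -> %s\n' "$impact" "$(action_group_for_impact "$impact")"
done
```

The result can be passed to `--action-group` when creating one alert rule per impact level, so low-impact items never trigger SMS or webhooks.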
Scenario C: Resource Health shows Unavailable
# A VM shows "Unavailable" in Resource Health
# Possible causes:
# 1. platform-initiated: Azure host issue (auto-recovery)
# 2. user-initiated: VM deallocated or stopped
# 3. unknown: no health signal
# Check VM status
az vm get-instance-view -g $RG -n vm-affected \
--query "instanceView.statuses[].{Code:code, Status:displayStatus}" -o table 2>/dev/null
# Check the activity log for recent changes
az monitor activity-log list \
--resource-group $RG \
--max-events 10 \
--query "[].{Time:eventTimestamp, Operation:operationName.value, Status:status.value}" \
-o table 2>/dev/null
Knowledge check
1. What are the five Azure Advisor categories?
| Category | Focus Area | Example Recommendations |
|---|---|---|
| Reliability | High availability, disaster recovery | Enable VM backups, configure replication |
| Security | Vulnerabilities and threats | Enable MFA, fix NSG rules, enable encryption |
| Performance | Speed and responsiveness | Right-size VMs, add caching, optimize queries |
| Cost | Reduce spending | Shut down idle VMs, use reserved instances, delete orphaned resources |
| Operational Excellence | Best practices and efficiency | Enable diagnostics, tag resources, use automation |
2. What are the Service Health event types?
| Event Type | Description | Action Required |
|---|---|---|
| Service issues | Active outages affecting your resources | Monitor, failover if possible |
| Planned maintenance | Scheduled maintenance events | Plan for downtime, prepare failover |
| Health advisories | Changes requiring action (deprecations, etc.) | Update configurations before deadline |
| Security advisories | Security-related notifications | Apply patches, update configurations |
Service Health only shows events that affect YOUR resources (not all Azure issues globally).
3. What is the difference between Service Health, Resource Health, and Azure Status?
| Feature | Scope | Personalized | Use Case |
|---|---|---|---|
| Azure Status (status.azure.com) | Global, all customers | No | Check if Azure-wide outage |
| Service Health | Your subscription | Yes | See issues affecting your services/regions |
| Resource Health | Single resource | Yes | Diagnose why a specific resource is unhealthy |
Use Resource Health for individual resource troubleshooting, Service Health for subscription-wide awareness, and Azure Status for global incident information.
4. What types of actions can an action group perform?
| Action Type | Description |
|---|---|
| Email | Send email notification |
| SMS | Send text message |
| Voice | Automated phone call |
| Push notification | Azure mobile app |
| Webhook | HTTP POST to a URL |
| Logic App | Trigger an Azure Logic App |
| Azure Function | Invoke a function |
| ITSM | Create ticket in ServiceNow, etc. |
| Automation Runbook | Execute a runbook |
| Event Hub | Stream to Event Hub |
| Secure Webhook | Webhook with AAD auth |
Rate limits apply: Email (100/hour), SMS (1/5 min), Voice (1/5 min).
Cleanup
# Delete alert rules and action groups
az monitor activity-log alert delete -g $RG --name "alert-service-issues" 2>/dev/null
az monitor activity-log alert delete -g $RG --name "alert-planned-maintenance" 2>/dev/null
az monitor activity-log alert delete -g $RG --name "alert-health-advisories" 2>/dev/null
az monitor activity-log alert delete -g $RG --name "alert-advisor-cost" 2>/dev/null
# Delete the resource group
az group delete --name $RG --yes --no-wait
echo "Resources are being deleted in the background."