Challenge 14: Azure Monitor & alerts

Estimated Time and Cost

60 minutes | Estimated cost: ~$0.10 | Exam Weight: 10–15%

Scenario

Contoso needs observability across their Azure environment. The CTO's mandate is clear: "If you can't monitor it, you can't manage it." Your job is to set up Azure Monitor, create alerts, and prove the team can detect and respond to issues before customers do.

Exam skills covered

Interpret metrics in Azure Monitor
Configure log settings in Azure Monitor
Query and analyze logs in Azure Monitor (KQL)
Set up alert rules, action groups, and alert processing rules
Configure monitoring for VMs, storage, and networks using Azure Monitor Insights
Use Azure Network Watcher and Connection Monitor

Sysadmin ↔ Azure reference

On-Prem / Traditional	Azure Equivalent
Nagios / Zabbix	Azure Monitor
syslog / Event Viewer	Log Analytics workspace
Grafana dashboards	Azure Monitor Workbooks
Email/PagerDuty alerts	Action Groups
Wireshark	Network Watcher packet capture

Setup

# Variables
RG="rg-az104-challenge14"
LOCATION="eastus"

# Create resource group
az group create --name $RG --location $LOCATION

Tasks

Task 1: create a Log Analytics workspace

az monitor log-analytics workspace create \
  --resource-group $RG \
  --workspace-name law-contoso \
  --location $LOCATION

Task 2: deploy a VM and enable VM Insights

Deploy a VM, then enable Azure Monitor VM Insights to collect performance and dependency data.

az vm create \
  --resource-group $RG \
  --name vm-monitored \
  --image Ubuntu2204 \
  --size Standard_B2s \
  --admin-username azureuser \
  --generate-ssh-keys

Enable VM Insights via the Azure Portal: VM → Insights → Enable.

Tip

VM Insights automatically installs the Azure Monitor Agent and configures a data collection rule (DCR).
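If you would rather script the agent installation than click through the portal, the following is a minimal sketch. It covers only the agent half of what the portal does: VM Insights also needs a data collection rule associated with the VM, which the portal sets up for you. The helper name install_ama is hypothetical, and RG is assumed to be set as in Setup.

```shell
# Sketch: install the Azure Monitor Agent extension on a Linux VM.
# Assumes RG is set as in Setup; install_ama is a hypothetical helper name.
install_ama() {
  vm_name="$1"
  az vm extension set \
    --resource-group "$RG" \
    --vm-name "$vm_name" \
    --name AzureMonitorLinuxAgent \
    --publisher Microsoft.Azure.Monitor \
    --enable-auto-upgrade true
}

# Usage (run after the VM exists):
#   install_ama vm-monitored
```

The portal path remains the simplest route for this lab because it creates the DCR and its association in one step.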

Task 3: explore Azure Monitor metrics

Navigate to Azure Monitor → Metrics (or the VM's Metrics blade) and explore:

Percentage CPU | Current CPU utilization
Available Memory Bytes | Memory pressure
Disk Read/Write Operations/Sec | Disk I/O

Try pinning a chart to a dashboard.

Task 4: configure diagnostic settings

Send the VM's platform metrics to Log Analytics:

WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group $RG \
  --workspace-name law-contoso \
  --query id -o tsv)

VM_ID=$(az vm show \
  --resource-group $RG \
  --name vm-monitored \
  --query id -o tsv)

# Enable diagnostic settings for the VM:
az monitor diagnostic-settings create \
  --name diag-to-law \
  --resource $VM_ID \
  --workspace $WORKSPACE_ID \
  --metrics '[{"category":"AllMetrics","enabled":true}]'

Note

It can take 15–30 minutes for log data to appear in Log Analytics after enabling diagnostic settings. This is normal.

Task 5: write KQL queries

Open Log Analytics → Logs and run these queries:

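// Top 10 computers by average CPU usage
// Note: Perf has rows only if performance counters are being collected into it (for example, through a data collection rule)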
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by Computer
| top 10 by AvgCPU desc

// Find error events in the last 24 hours
Syslog
| where SeverityLevel == "err"
| where TimeGenerated > ago(24h)
| project TimeGenerated, Computer, SyslogMessage, Facility
| order by TimeGenerated desc
| take 50

// Heartbeat: which VMs are reporting?
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend Status = iff(LastHeartbeat < ago(5m), "Offline", "Online")
| order by Status asc

More useful KQL patterns

// Count events by severity over time (for charting)
Syslog
| where TimeGenerated > ago(7d)
| summarize Count = count() by bin(TimeGenerated, 1h), SeverityLevel
| render timechart

// Memory usage trend
Perf
| where ObjectName == "Memory" and CounterName == "% Used Memory"
| summarize AvgMem = avg(CounterValue) by bin(TimeGenerated, 15m), Computer
| render timechart
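The same queries can also be run from the shell, which is handy for quick checks while you wait for data to arrive. This is a sketch under a few assumptions: RG is set as in Setup, the workspace is named law-contoso, and run_kql is a hypothetical helper name. Depending on your CLI version, the query command may prompt you to install the log-analytics extension.

```shell
# Sketch: run a KQL query against the lab workspace from the shell.
# Assumes RG is set as in Setup and the workspace is named law-contoso.
# run_kql is a hypothetical helper name.
run_kql() {
  query="$1"
  # The query command expects the workspace GUID (customerId), not the resource ID.
  workspace_guid=$(az monitor log-analytics workspace show \
    --resource-group "$RG" \
    --workspace-name law-contoso \
    --query customerId -o tsv)
  az monitor log-analytics query \
    --workspace "$workspace_guid" \
    --analytics-query "$query" \
    --timespan PT1H \
    --output table
}

# Usage:
#   run_kql 'Heartbeat | summarize LastHeartbeat = max(TimeGenerated) by Computer'
```

Single-quote the query so the shell leaves the pipes and double quotes inside the KQL untouched.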

Task 6: create an action group

Create an action group that sends email notifications:

az monitor action-group create \
  --resource-group $RG \
  --name ag-ops-team \
  --short-name OpsTeam \
  --action email ops-email yourname@contoso.com

Task 7: create a metric alert

Create an alert that fires when the average CPU over a 5-minute window exceeds 80%, evaluated every minute:

VM_ID=$(az vm show --resource-group $RG --name vm-monitored --query id -o tsv)

az monitor metrics alert create \
  --resource-group $RG \
  --name alert-high-cpu \
  --scopes $VM_ID \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action $(az monitor action-group show -g $RG -n ag-ops-team --query id -o tsv) \
  --severity 2 \
  --description "CPU usage exceeded 80% for 5 minutes"
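To see what the "avg Percentage CPU > 80" condition means in practice, here is a small offline sketch that averages a window of samples and compares the result with the threshold. It does not call Azure; window_avg_exceeds is a hypothetical helper and the sample values are made-up one-minute CPU readings.

```shell
# Sketch: mimic how an "avg Percentage CPU > 80" condition is evaluated over one window.
# window_avg_exceeds is a hypothetical helper; the samples are made-up one-minute CPU readings.
window_avg_exceeds() {
  threshold="$1"
  shift
  # Average the remaining arguments and compare the result with the threshold.
  printf '%s\n' "$@" | awk -v t="$threshold" '
    { sum += $1; n++ }
    END {
      avg = sum / n
      printf "avg=%.1f %s\n", avg, (avg > t ? "FIRE" : "ok")
    }'
}

# One spike to 95% does not fire: the five-sample average is 52.0.
window_avg_exceeds 80 40 95 45 40 40
# Sustained load does fire: the five-sample average is 88.0.
window_avg_exceeds 80 85 90 88 92 85
```

The takeaway is that the rule reacts to the window average, so a brief spike inside an otherwise quiet window will not trigger it.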

Task 8: create a log alert

Create an alert based on a KQL query, for example to detect a specific error pattern in logs:

Hint

Use the Azure Portal: Monitor → Alerts → Create → Log alert rule

Scope: Your Log Analytics workspace
Condition: Custom log search with KQL
Query: Syslog | where SeverityLevel == "err" | where TimeGenerated > ago(5m)
Threshold: Greater than 0
Action Group: ag-ops-team
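A CLI counterpart of these portal steps might look like the sketch below. Treat it as an assumption-heavy starting point: the flags are the ones I understand the scheduled-query command group to accept, so confirm them with az monitor scheduled-query create --help before relying on it. The helper name create_log_alert, the rule name alert-syslog-errors, and the placeholder name ErrRows are all hypothetical. The time filter from the portal query is left out because the window size already limits the query to the last 5 minutes.

```shell
# Sketch: CLI counterpart of the portal steps for a log alert rule.
# Assumes RG is set as in Setup; create_log_alert, alert-syslog-errors,
# and the ErrRows placeholder are hypothetical names.
create_log_alert() {
  workspace_id=$(az monitor log-analytics workspace show \
    --resource-group "$RG" --workspace-name law-contoso --query id -o tsv)
  action_group_id=$(az monitor action-group show \
    --resource-group "$RG" --name ag-ops-team --query id -o tsv)
  # Fire when the query returns more than 0 rows in a 5-minute window.
  az monitor scheduled-query create \
    --resource-group "$RG" \
    --name alert-syslog-errors \
    --scopes "$workspace_id" \
    --condition "count 'ErrRows' > 0" \
    --condition-query ErrRows='Syslog | where SeverityLevel == "err"' \
    --evaluation-frequency 5m \
    --window-size 5m \
    --severity 2 \
    --action-groups "$action_group_id"
}

# Usage:
#   create_log_alert
```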

Task 9: enable Storage Insights

Create a storage account (if you don't have one)
Navigate to Azure Monitor → Storage accounts (or Storage account → Insights)
Explore: transaction metrics, latency, availability, capacity trends
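If you need a storage account for this task, the sketch below creates one from the CLI. Storage account names must be 3 to 24 characters, lowercase letters and digits only, and globally unique, so the first helper normalizes whatever suffix you pass in. Both helper names and the stcontoso prefix are assumptions for illustration; RG and LOCATION are assumed to be set as in Setup.

```shell
# Sketch: build a valid storage account name, then create the account.
# make_storage_name and create_lab_storage are hypothetical helpers;
# the "stcontoso" prefix is an assumption.
make_storage_name() {
  suffix="$1"
  # Lowercase, strip anything that is not a-z or 0-9, then cap at 24 characters.
  printf 'stcontoso%s' "$suffix" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cd 'a-z0-9' \
    | cut -c1-24
}

create_lab_storage() {
  az storage account create \
    --name "$(make_storage_name "$1")" \
    --resource-group "$RG" \
    --location "$LOCATION" \
    --sku Standard_LRS
}

# Usage: pass any suffix likely to be unique, such as the current epoch seconds.
#   create_lab_storage "$(date +%s)"
```

Because the account lives in the lab resource group, the Cleanup step removes it along with everything else.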

Task 10: use Network Watcher

Explore these Network Watcher tools:

Topology | Visualize your VNet and connected resources
IP Flow Verify | Test if traffic is allowed or denied between two endpoints
Connection Troubleshoot | Check connectivity from a VM to a destination

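# IP flow verify example
# 10.0.0.4 is assumed to be the VM's private IP; adjust it to match your NIC.
# The remote value is quoted so the shell does not try to expand the *.
az network watcher test-ip-flow \
  --direction Inbound \
  --local 10.0.0.4:80 \
  --protocol TCP \
  --remote "0.0.0.0:*" \
  --vm vm-monitored \
  --resource-group $RG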
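The example hardcodes 10.0.0.4 as the local address, which may not match your VM. A sketch that looks up the private IP first is shown below. The helper name check_inbound_http is hypothetical, and the remote address 203.0.113.10:50000 is a placeholder from a documentation-only IP range, not a real client.

```shell
# Sketch: look up the VM's private IP and feed it to IP flow verify.
# Assumes RG is set as in Setup; check_inbound_http is a hypothetical helper
# and 203.0.113.10:50000 is a placeholder remote endpoint.
check_inbound_http() {
  vm_name="$1"
  private_ip=$(az vm list-ip-addresses \
    --resource-group "$RG" \
    --name "$vm_name" \
    --query "[0].virtualMachine.network.privateIpAddresses[0]" -o tsv)
  az network watcher test-ip-flow \
    --resource-group "$RG" \
    --vm "$vm_name" \
    --direction Inbound \
    --protocol TCP \
    --local "${private_ip}:80" \
    --remote "203.0.113.10:50000"
}

# Usage:
#   check_inbound_http vm-monitored
```

The result reports whether the flow is allowed or denied and which NSG rule made the decision, which is usually the detail you need when debugging.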

Break & fix

Break it

Alert without action | Create an alert rule but don't attach an action group. Trigger the condition. Notice: the alert fires in the portal but no notification is sent. Why?
Empty log results | Query logs immediately after enabling diagnostic settings. Results are empty. Is this broken?

Fix it

Attach the action group to the alert rule
Understand that log ingestion has a delay (15–30 min); this is expected behavior, not a bug
Use the Heartbeat table to verify the agent is connected
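The first fix can be applied from the CLI without recreating the rule. The following is a minimal sketch, assuming RG is set as in Setup and the action group from Task 6 exists; attach_action_group is a hypothetical helper name.

```shell
# Sketch: attach the ops action group to an existing metric alert rule.
# Assumes RG is set as in Setup; attach_action_group is a hypothetical helper name.
attach_action_group() {
  alert_name="$1"
  action_group_id=$(az monitor action-group show \
    --resource-group "$RG" --name ag-ops-team --query id -o tsv)
  az monitor metrics alert update \
    --resource-group "$RG" \
    --name "$alert_name" \
    --add-action "$action_group_id"
}

# Usage:
#   attach_action_group alert-high-cpu
```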

Knowledge check

What is the difference between Metrics and Logs in Azure Monitor?
- Metrics = numeric time-series data, near real-time, stored in a time-series database
- Logs = structured/unstructured text, stored in Log Analytics, queried with KQL
What are the essential KQL operators for the exam?
- where: filter rows
- summarize: aggregate (count, avg, sum, max)
- ago(): time relative to now (e.g., ago(1h), ago(7d))
- project: select columns
- render: visualize results
What are the alert severity levels?
- 0 = Critical, 1 = Error, 2 = Warning, 3 = Informational, 4 = Verbose
What are alert processing rules?
- Rules that modify the behavior of fired alerts (e.g., suppress notifications during maintenance windows, add action groups to all alerts in a scope)

Cleanup

az group delete --name $RG --yes --no-wait

Success criteria

⬜Log Analytics workspace created
⬜VM Insights enabled and collecting data
⬜Azure Monitor metrics explored and understood
⬜Diagnostic settings configured
⬜KQL queries executed successfully
⬜Action group created with email notification
⬜Metric alert created (CPU > 80%)
⬜Log alert created for error patterns
⬜Storage Insights explored
⬜Network Watcher tools used (topology, IP flow verify, connection troubleshoot)
⬜Break & Fix scenarios completed
⬜Resources cleaned up
