
Challenge 24: Hybrid connectivity troubleshooting

Estimated time and cost

60–90 minutes | ~$0.19/h (VPN Gateway) + ~$1.20/h (ER Circuit) | Exam weight: 20–25%

Scenario

Contoso's network operations team has received three escalation tickets this morning:

  1. Ticket 1 -- S2S VPN flapping: The site-to-site VPN tunnel between headquarters (on-premises) and Azure keeps disconnecting every 5-10 minutes. Users report intermittent connectivity to Azure-hosted applications.

  2. Ticket 2 -- P2S authentication failures: Remote workers using the point-to-site VPN client are receiving authentication errors. Some users get "certificate validation failed" while others see "tunnel type mismatch" errors.

  3. Ticket 3 -- ExpressRoute not provisioned: A newly ordered ExpressRoute circuit shows "Provider Provisioning State: NotProvisioned" even though the service provider claims they have completed their side of the configuration.

The team must use Azure diagnostic tools to systematically identify root causes and resolve each issue.

Exam skills measured

| Skill | Description |
|---|---|
| Diagnose and resolve virtual network gateway connectivity issues | Troubleshoot S2S VPN connections, gateway health, and IKE failures |
| Diagnose and resolve client-side and authentication issues (P2S) | Troubleshoot certificate problems, tunnel types, and address pool issues |
| Diagnose and resolve ExpressRoute connection issues | Verify circuit state, peering configuration, ARP tables, and route tables |

Prerequisites

This challenge assumes the following resources exist (from previous challenges or a lab setup):

  • Resource group with VPN Gateway (VpnGw1 or higher)
  • Active S2S VPN connection
  • P2S VPN configuration with certificate authentication
  • ExpressRoute circuit (can use a mock/test circuit)

Task 1: Troubleshoot S2S VPN -- check connection status

Start by examining the VPN connection object to determine the current state and gather metrics.

Azure CLI

RG="rg-hybrid-challenge24"
GW_NAME="vpngw-contoso"
CONNECTION_NAME="conn-onprem-hq"

# Check VPN connection status and traffic counters
az network vpn-connection show \
--name $CONNECTION_NAME \
--resource-group $RG \
--query "{
connectionStatus: connectionStatus,
ingressBytes: ingressBytesTransferred,
egressBytes: egressBytesTransferred,
connectionType: connectionType,
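provisioningState: provisioningState,
ipsecPolicies: ipsecPolicies
}"

# Read the pre-shared key with its dedicated command (the connection object may not include it)
# and compare it with the key configured on the on-premises device
az network vpn-connection shared-key show \
--connection-name $CONNECTION_NAME \
--resource-group $RG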

# List all connections on the gateway
az network vpn-connection list \
--resource-group $RG \
--vnet-gateway $GW_NAME \
--output table

Azure PowerShell

$RG = "rg-hybrid-challenge24"
$GwName = "vpngw-contoso"
$ConnName = "conn-onprem-hq"

# Get connection details
$conn = Get-AzVirtualNetworkGatewayConnection `
-ResourceGroupName $RG `
-Name $ConnName

# Display key diagnostic properties
$conn | Select-Object `
Name,
ConnectionStatus,
IngressBytesTransferred,
EgressBytesTransferred,
ConnectionProtocol,
ProvisioningState

# Check if the connection has custom IPsec policies
$conn.IpsecPolicies

Connection status values

| Status | Meaning |
|---|---|
| Connected | Tunnel is up and passing traffic |
| Connecting | IKE negotiation in progress |
| NotConnected | Tunnel is down, no negotiation active |
| Unknown | Gateway cannot determine state (often during updates) |
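To confirm Ticket 1's "flapping" pattern rather than a single outage, you can sample connectionStatus over time and count how often it changes. The following is a minimal Bash sketch that assumes RG and CONNECTION_NAME are set as in Task 1; the function names, the sample count, and the 30-second interval are hypothetical choices, not part of any Azure tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for spotting a flapping S2S connection.

# poll_connection_status SAMPLES INTERVAL
# Prints the connection's connectionStatus once per INTERVAL seconds, SAMPLES times.
poll_connection_status() {
  local samples="${1:-20}" interval="${2:-30}" i
  for ((i = 0; i < samples; i++)); do
    az network vpn-connection show \
      --name "$CONNECTION_NAME" \
      --resource-group "$RG" \
      --query connectionStatus \
      --output tsv
    sleep "$interval"
  done
}

# count_transitions
# Reads one status per line from stdin and prints how many times the value changed.
count_transitions() {
  local prev="" cur="" changes=0
  while IFS= read -r cur || [ -n "$cur" ]; do
    if [ -z "$cur" ]; then
      continue
    fi
    if [ -n "$prev" ] && [ "$cur" != "$prev" ]; then
      changes=$((changes + 1))
    fi
    prev="$cur"
  done
  echo "$changes"
}

# Example (about 10 minutes); several transitions in that window point to flapping:
# poll_connection_status 20 30 | tee status.log | count_transitions
```

Keeping the polling and the counting separate lets you save the raw samples (status.log) for the ticket and re-analyze them later.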

Task 2: Analyze VPN diagnostics with Network Watcher

Use Network Watcher VPN troubleshooting to run automated diagnostics that analyze IKE logs, packet drops, and gateway health.

Azure CLI

STORAGE_ACCOUNT="stdiagcontoso"
CONTAINER_NAME="vpn-diagnostics"
STORAGE_PATH="https://${STORAGE_ACCOUNT}.blob.core.windows.net/${CONTAINER_NAME}"

# Start VPN troubleshooting on the connection
az network watcher troubleshooting start \
--resource $CONNECTION_NAME \
--resource-group $RG \
--resource-type vpnConnection \
--storage-account $STORAGE_ACCOUNT \
--storage-path $STORAGE_PATH

# Alternatively, troubleshoot the gateway itself
az network watcher troubleshooting start \
--resource $GW_NAME \
--resource-group $RG \
--resource-type vnetGateway \
--storage-account $STORAGE_ACCOUNT \
--storage-path $STORAGE_PATH

# Check results of the last troubleshooting operation
az network watcher troubleshooting show \
--resource $GW_NAME \
--resource-group $RG \
--resource-type vnetGateway

Azure PowerShell

$StorageAccount = Get-AzStorageAccount -ResourceGroupName $RG -Name "stdiagcontoso"

# Start gateway troubleshooting
$gw = Get-AzVirtualNetworkGateway -ResourceGroupName $RG -Name $GwName

Start-AzNetworkWatcherResourceTroubleshooting `
-NetworkWatcher (Get-AzNetworkWatcher -ResourceGroupName "NetworkWatcherRG" -Name "NetworkWatcher_eastus") `
-TargetResourceId $gw.Id `
-StorageId $StorageAccount.Id `
-StoragePath "https://stdiagcontoso.blob.core.windows.net/vpn-diagnostics"
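Network Watcher writes its detailed logs to the blob container named in --storage-path, so that container should exist before you start. A minimal sketch, assuming the variables defined in this task, data-plane access to the storage account through your signed-in identity (--auth-mode login), and that the troubleshooting result reports its overall health in a top-level code property (for example Healthy or UnHealthy); both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: create the diagnostics container if needed, then troubleshoot
# the connection and print only the overall result code.

ensure_diag_container() {
  # Expected to succeed (reporting created: false) when the container already exists.
  az storage container create \
    --name "$CONTAINER_NAME" \
    --account-name "$STORAGE_ACCOUNT" \
    --auth-mode login \
    --output none
}

run_vpn_troubleshooting() {
  ensure_diag_container || return 1
  az network watcher troubleshooting start \
    --resource "$CONNECTION_NAME" \
    --resource-group "$RG" \
    --resource-type vpnConnection \
    --storage-account "$STORAGE_ACCOUNT" \
    --storage-path "$STORAGE_PATH" \
    --query code \
    --output tsv
}

# Example:
# [ "$(run_vpn_troubleshooting)" = "Healthy" ] || echo "Review the logs under $STORAGE_PATH"
```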
Figure: Challenge 24 - Network Topology

Task 3: Troubleshoot P2S VPN -- check client configuration

Azure PowerShell

$gw = Get-AzVirtualNetworkGateway -ResourceGroupName $RG -Name $GwName

# Inspect P2S configuration
$gw.VpnClientConfiguration | Select-Object `
VpnClientProtocols,
VpnClientAddressPool,
VpnAuthenticationTypes

# List root certificates
$gw.VpnClientConfiguration.VpnClientRootCertificates |
Select-Object Name, ProvisioningState

Common P2S issues and resolutions

Issue 1: Certificate validation failed

The client certificate was not issued by a root CA that is uploaded to the gateway.

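# List uploaded root certificates (name and provisioning state)
az network vnet-gateway show \
--name $GW_NAME \
--resource-group $RG \
--query "vpnClientConfiguration.vpnClientRootCertificates[].{name:name, state:provisioningState}" \
--output table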

# Upload a missing root certificate (base64-encoded .cer without header/footer)
az network vnet-gateway root-cert create \
--gateway-name $GW_NAME \
--resource-group $RG \
--name "ContosoRootCA" \
--public-cert-data "MIIDuzCCAqO..."
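The --public-cert-data value is the Base64 certificate body on a single line. A minimal sketch, assuming the root certificate was exported as "Base-64 encoded X.509" (a PEM-style file with BEGIN/END lines); the function name and the file name ContosoRootCA.cer are hypothetical.

```bash
#!/usr/bin/env bash
# cert_data_from_pem FILE  (hypothetical helper)
# Drops the BEGIN/END CERTIFICATE lines and joins the Base64 body into one line.
# For a DER (binary) export, convert it to PEM first, e.g.:
#   openssl x509 -inform der -in ContosoRootCA.cer -outform pem -out ContosoRootCA.pem
cert_data_from_pem() {
  grep -v -- '-----' "$1" | tr -d '\r\n'
}

# Example:
# az network vnet-gateway root-cert create \
#   --gateway-name "$GW_NAME" --resource-group "$RG" --name "ContosoRootCA" \
#   --public-cert-data "$(cert_data_from_pem ContosoRootCA.cer)"
```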

Issue 2: Tunnel type mismatch

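The client is configured for IKEv2 but the gateway only supports SSTP, or vice versa. A gateway can combine IKEv2 with either OpenVPN or SSTP, but OpenVPN and SSTP cannot be enabled together, so pick the combination that matches your clients.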

# Update gateway to support both IKEv2 and OpenVPN
az network vnet-gateway update \
--name $GW_NAME \
--resource-group $RG \
--client-protocol IkeV2 OpenVPN
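Azure accepts IKEv2 combined with either OpenVPN or SSTP, but not OpenVPN and SSTP together. A minimal Bash sketch that checks a proposed --client-protocol list against that rule before running the update; the function is hypothetical, matches protocol names case-sensitively, and does not check SKU-specific limits.

```bash
#!/usr/bin/env bash
# validate_client_protocols PROTOCOL...  (hypothetical helper)
# Returns 0 if the list is a combination a VPN gateway accepts, 1 otherwise.
validate_client_protocols() {
  local has_openvpn=0 has_sstp=0 p
  if [ "$#" -eq 0 ]; then
    echo "no protocols given" >&2
    return 1
  fi
  for p in "$@"; do
    case "$p" in
      IkeV2) ;;
      OpenVPN) has_openvpn=1 ;;
      SSTP) has_sstp=1 ;;
      *) echo "unknown protocol: $p" >&2; return 1 ;;
    esac
  done
  if [ "$has_openvpn" -eq 1 ] && [ "$has_sstp" -eq 1 ]; then
    echo "OpenVPN and SSTP cannot be enabled together" >&2
    return 1
  fi
}

# Example:
# validate_client_protocols IkeV2 OpenVPN && \
#   az network vnet-gateway update --name "$GW_NAME" --resource-group "$RG" --client-protocol IkeV2 OpenVPN
```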

Issue 3: Address pool exhaustion

All P2S client IPs are allocated. No new clients can connect.

# Check current address pool size
az network vnet-gateway show \
--name $GW_NAME \
--resource-group $RG \
--query "vpnClientConfiguration.vpnClientAddressPool.addressPrefixes"

# Expand the address pool
az network vnet-gateway update \
--name $GW_NAME \
--resource-group $RG \
--address-prefixes "172.16.0.0/16"
Address pool sizing

A /24 prefix provides approximately 251 usable client addresses. For larger deployments, use /16 or multiple prefixes. The address pool must not overlap with any VNet address space or on-premises ranges.
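The no-overlap rule above is easy to check before changing the pool. A minimal pure-Bash sketch for IPv4 CIDR blocks; the function names and the example ranges are hypothetical, and you would substitute your actual VNet and on-premises prefixes.

```bash
#!/usr/bin/env bash
# Hypothetical IPv4 CIDR helpers.

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo "$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))"
}

# cidr_range A.B.C.D/LEN -> "first last" address of the block, as integers
cidr_range() {
  local ip="${1%/*}" len="${1#*/}" base mask first
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  first=$(( base & mask ))
  echo "$first $(( first + (1 << (32 - len)) - 1 ))"
}

# cidrs_overlap CIDR1 CIDR2 -> exit status 0 if the blocks share at least one address
cidrs_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<< "$(cidr_range "$1")"
  read -r b1 b2 <<< "$(cidr_range "$2")"
  [ "$a1" -le "$b2" ] && [ "$b1" -le "$a2" ]
}

# Example with hypothetical VNet and on-premises ranges:
# for r in 10.0.0.0/16 192.168.0.0/16; do
#   cidrs_overlap 172.16.0.0/16 "$r" && echo "P2S pool overlaps $r"
# done
```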


Task 4: Troubleshoot ExpressRoute circuit and peering

Examine the ExpressRoute circuit provisioning state, peering configuration, and verify layer 2/3 connectivity.

Azure CLI

ER_NAME="er-contoso-equinix"

# Check circuit provisioning state
az network express-route show \
--name $ER_NAME \
--resource-group $RG \
--query "{
circuitProvisioningState: circuitProvisioningState,
serviceProviderProvisioningState: serviceProviderProvisioningState,
serviceProviderProperties: serviceProviderProperties,
sku: sku,
bandwidthInMbps: serviceProviderProperties.bandwidthInMbps
}"

# Check peering configuration
az network express-route peering show \
--circuit-name $ER_NAME \
--resource-group $RG \
--name "AzurePrivatePeering" \
--query "{
peeringType: peeringType,
state: state,
azureASN: azureASN,
peerASN: peerASN,
primaryPeerAddressPrefix: primaryPeerAddressPrefix,
secondaryPeerAddressPrefix: secondaryPeerAddressPrefix,
vlanId: vlanId
}"

# Get circuit statistics (bytes in/out)
az network express-route get-stats \
--name $ER_NAME \
--resource-group $RG

# Get ARP table to verify layer 2 connectivity
az network express-route list-arp-tables \
--name $ER_NAME \
--resource-group $RG \
--peering-name "AzurePrivatePeering" \
--path "primary"

# Get route table to verify BGP route exchange
az network express-route list-route-tables \
--name $ER_NAME \
--resource-group $RG \
--peering-name "AzurePrivatePeering" \
--path "primary"

Azure PowerShell

$ErName = "er-contoso-equinix"

# Get circuit details
$circuit = Get-AzExpressRouteCircuit -ResourceGroupName $RG -Name $ErName

# Check provisioning states
$circuit | Select-Object `
CircuitProvisioningState,
ServiceProviderProvisioningState,
@{N='Bandwidth';E={$_.ServiceProviderProperties.BandwidthInMbps}}

# Get peering details
$peering = Get-AzExpressRouteCircuitPeeringConfig `
-ExpressRouteCircuit $circuit `
-Name "AzurePrivatePeering"

$peering | Select-Object `
PeeringType,
State,
AzureASN,
PeerASN,
PrimaryPeerAddressPrefix,
SecondaryPeerAddressPrefix,
VlanId

# Get ARP table
Get-AzExpressRouteCircuitARPTable `
-ResourceGroupName $RG `
-ExpressRouteCircuitName $ErName `
-PeeringType "AzurePrivatePeering" `
-DevicePath "Primary"

# Get route table
Get-AzExpressRouteCircuitRouteTable `
-ResourceGroupName $RG `
-ExpressRouteCircuitName $ErName `
-PeeringType "AzurePrivatePeering" `
-DevicePath "Primary"

ExpressRoute state matrix

| Circuit Provisioning State | Service Provider State | Meaning |
|---|---|---|
| Enabled | NotProvisioned | Circuit created in Azure; waiting for provider |
| Enabled | Provisioning | Provider is configuring their side |
| Enabled | Provisioned | Provider done; ready for peering config |
| Deprovisioning | Deprovisioning | Circuit being deleted |
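The matrix above can be turned into a quick lookup for the two fields returned by az network express-route show. Ticket 3 sits in the first row; one plausible cause when the provider says they are finished is that they provisioned against a different service key, so comparing the circuit's serviceKey with the one the provider used is a reasonable first check. A minimal sketch; the function name and message wording are hypothetical.

```bash
#!/usr/bin/env bash
# er_next_action CIRCUIT_STATE PROVIDER_STATE  (hypothetical helper)
# Prints the meaning/next step for a pair of ExpressRoute provisioning states.
er_next_action() {
  case "$1/$2" in
    Enabled/NotProvisioned)
      echo "Waiting for provider: confirm they used this circuit's service key" ;;
    Enabled/Provisioning)
      echo "Provider is configuring: wait and re-check" ;;
    Enabled/Provisioned)
      echo "Provider done: configure or verify peering" ;;
    */Deprovisioning)
      echo "Circuit is being deprovisioned" ;;
    *)
      echo "Unrecognized combination: $1/$2"
      return 1 ;;
  esac
}

# Example (assumes ER_NAME and RG from Task 4):
# read -r c p <<< "$(az network express-route show --name "$ER_NAME" --resource-group "$RG" \
#   --query "[circuitProvisioningState, serviceProviderProvisioningState]" --output tsv)"
# er_next_action "$c" "$p"
```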

Task 5: Use VPN gateway reset as last resort

When a gateway becomes unresponsive or tunnels are stuck in a bad state, resetting the gateway restarts the active instance and forces IKE renegotiation.

Azure CLI

# Reset the VPN gateway (affects all connections on this gateway)
az network vnet-gateway reset \
--name $GW_NAME \
--resource-group $RG

# Wait for the gateway to come back online
az network vnet-gateway wait \
--name $GW_NAME \
--resource-group $RG \
--created

# Verify gateway status after reset
az network vnet-gateway show \
--name $GW_NAME \
--resource-group $RG \
--query "{provisioningState:provisioningState, gatewayType:gatewayType, vpnType:vpnType}"

Azure PowerShell

# Reset the gateway
Reset-AzVirtualNetworkGateway `
-VirtualNetworkGateway (Get-AzVirtualNetworkGateway -ResourceGroupName $RG -Name $GwName)

# Check gateway health after reset
Get-AzVirtualNetworkGateway -ResourceGroupName $RG -Name $GwName |
Select-Object Name, ProvisioningState, GatewayType, VpnType
Gateway reset impact

Resetting a gateway:

  • Disrupts ALL connections on that gateway (S2S, P2S, and VNet-to-VNet)
  • Takes 5-15 minutes to complete
  • Does not change the gateway configuration -- only restarts the active instance
  • For active-active gateways, you can reset each instance separately using the --gateway-vip parameter
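For the active-active case in the last bullet, resetting instances one at a time keeps one instance available while the other restarts. A minimal sketch, assuming GW_NAME and RG from this task and that you already know the public IP (VIP) of each instance; the function name and the example VIPs are hypothetical. Because the reset command normally blocks until Azure reports the operation complete (unless --no-wait is passed), the loop runs the resets sequentially.

```bash
#!/usr/bin/env bash
# reset_instances_sequentially VIP...  (hypothetical helper)
# Resets each active-active gateway instance in turn; stops at the first failure.
reset_instances_sequentially() {
  local vip
  for vip in "$@"; do
    echo "Resetting gateway instance $vip"
    az network vnet-gateway reset \
      --name "$GW_NAME" \
      --resource-group "$RG" \
      --gateway-vip "$vip" || return 1
  done
}

# Example with hypothetical VIPs:
# reset_instances_sequentially 20.10.0.4 20.10.0.5
```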

Task 6: Advanced troubleshooting with packet capture

For persistent issues, capture packets on the VPN gateway to analyze the IKE handshake and data plane traffic.

Azure CLI

# Start packet capture on the gateway (captures IKE and ESP traffic)
az network vnet-gateway packet-capture start \
--name $GW_NAME \
--resource-group $RG

# After reproducing the issue, stop the capture and write it to storage
# The SAS URL points to the blob container where the capture file is stored
az network vnet-gateway packet-capture stop \
--name $GW_NAME \
--resource-group $RG \
--sas-url "https://stdiagcontoso.blob.core.windows.net/captures?sv=2023-01-01&st=..."

Azure PowerShell

# Start capture
Start-AzVirtualNetworkGatewayPacketCapture `
-ResourceGroupName $RG `
-Name $GwName

# Stop capture; the capture file is written to the container referenced by the SAS URL
Stop-AzVirtualNetworkGatewayPacketCapture `
-ResourceGroupName $RG `
-Name $GwName `
-SasUrl "https://stdiagcontoso.blob.core.windows.net/captures?sv=2023-01-01&st=..."
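The SAS URL is the container URL followed by "?" and a SAS token that allows writes. A minimal sketch, assuming the stdiagcontoso account and a captures container already exist, that your signed-in identity may issue a user delegation SAS (--auth-mode login --as-user), and that read/write permissions are sufficient for the capture upload; the function name is hypothetical and you supply the expiry timestamp.

```bash
#!/usr/bin/env bash
# capture_sas_url ACCOUNT CONTAINER EXPIRY  (hypothetical helper)
# EXPIRY is a UTC timestamp such as 2030-01-31T18:00Z.
capture_sas_url() {
  local account="$1" container="$2" expiry="$3" token
  token=$(az storage container generate-sas \
    --account-name "$account" \
    --name "$container" \
    --permissions rw \
    --expiry "$expiry" \
    --auth-mode login \
    --as-user \
    --output tsv) || return 1
  echo "https://${account}.blob.core.windows.net/${container}?${token}"
}

# Example:
# SAS_URL=$(capture_sas_url stdiagcontoso captures 2030-01-31T18:00Z)
# az network vnet-gateway packet-capture stop --name "$GW_NAME" --resource-group "$RG" --sas-url "$SAS_URL"
```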

Break & fix

Scenario 1: VPN connection flapping (DPD timeout)

Symptom: The S2S tunnel disconnects every 5-10 minutes, reconnects automatically, then drops again. Bytes transferred counters reset each time.

Root cause: Dead Peer Detection (DPD) timeout is set too aggressively on the on-premises device. Azure uses a DPD timeout of 45 seconds by default. If the on-premises device has a lower timeout (e.g., 10 seconds) and there are brief latency spikes, it tears down the tunnel.
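The mismatch described here can be checked mechanically once you have both values. A minimal sketch: the Azure value is the connection's DPD timeout (45 seconds by default, configurable from 9 up to 3600 seconds), and the on-premises value is whatever your device's configuration reports; the function name and its output labels are hypothetical.

```bash
#!/usr/bin/env bash
# check_dpd AZURE_SECONDS ONPREM_SECONDS  (hypothetical helper)
# Prints "ok", or a label describing why the pair is likely to cause flapping.
check_dpd() {
  local azure="$1" onprem="$2"
  if [ "$azure" -lt 9 ] || [ "$azure" -gt 3600 ]; then
    echo "azure-out-of-range"
    return 1
  fi
  if [ "$onprem" -lt "$azure" ]; then
    echo "onprem-shorter-than-azure"
    return 1
  fi
  echo "ok"
}

# Example from this scenario: check_dpd 45 10 prints "onprem-shorter-than-azure"
```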

Diagnosis:

# Check the connection for custom IPsec/IKE policies
az network vpn-connection show \
--name $CONNECTION_NAME \
--resource-group $RG \
--query "ipsecPolicies"

# Look for DPD-related failures in troubleshooting output
az network watcher troubleshooting start \
--resource $CONNECTION_NAME \
--resource-group $RG \
--resource-type vpnConnection \
--storage-account $STORAGE_ACCOUNT \
--storage-path $STORAGE_PATH

Fix: The DPD timeout is a setting on the connection, not part of the IPsec/IKE policy. Keep the Azure side at its 45-second default (the minimum is 9 seconds) and raise the on-premises device's DPD timeout so both sides match. If the troubleshooting output also reports IKE proposal mismatches, pin a custom IPsec/IKE policy that matches the on-premises device exactly:

az network vpn-connection ipsec-policy add \
--connection-name $CONNECTION_NAME \
--resource-group $RG \
--ike-encryption AES256 \
--ike-integrity SHA256 \
--dh-group DHGroup14 \
--ipsec-encryption AES256 \
--ipsec-integrity SHA256 \
--pfs-group PFS14 \
--sa-lifetime 28800 \
--sa-max-size 102400000

Scenario 2: P2S address pool full

Symptom: New P2S VPN clients receive an error "no available IP addresses" or fail to connect while existing clients remain connected.

Root cause: The P2S address pool was configured with a /28 (14 usable IPs) and all addresses are allocated to existing sessions.
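Sizing the replacement pool is simple arithmetic: a prefix of length n contains 2^(32-n) addresses, so a /28 holds 16 and a /24 holds 256 before anything is set aside. A minimal sketch with hypothetical function names; it counts raw addresses only and does not model any addresses the gateway may reserve, so leave headroom above your peak concurrent client count.

```bash
#!/usr/bin/env bash
# pool_addresses LEN or /LEN -> raw number of IPv4 addresses in the prefix (hypothetical helper)
pool_addresses() {
  local len="${1#/}"
  echo "$(( 1 << (32 - len) ))"
}

# prefix_for_clients N -> longest prefix (smallest block) with at least N raw addresses
prefix_for_clients() {
  local needed="$1" len=32
  while [ "$len" -gt 0 ] && [ "$(( 1 << (32 - len) ))" -lt "$needed" ]; do
    len=$((len - 1))
  done
  echo "/$len"
}

# Example: 400 remote workers plus 25% headroom = 500 addresses -> /23
# prefix_for_clients $(( 400 * 125 / 100 ))
```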

Diagnosis:

# Check current pool size
az network vnet-gateway show \
--name $GW_NAME \
--resource-group $RG \
--query "vpnClientConfiguration.vpnClientAddressPool"

# Count connected clients (approximate)
az network vnet-gateway vpn-client show-health \
--name $GW_NAME \
--resource-group $RG 2>/dev/null || echo "Use Azure Portal > VPN Gateway > Point-to-site configuration > Connected clients"

Fix: Expand the address pool to accommodate more clients:

az network vnet-gateway update \
--name $GW_NAME \
--resource-group $RG \
--address-prefixes "172.16.0.0/16"
note

Changing the address pool requires existing P2S clients to reconnect. Plan this change during a maintenance window.


Scenario 3: ExpressRoute ARP failure (wrong VLAN)

Symptom: ExpressRoute peering state shows "Enabled" but the ARP table returns empty results. No routes are learned.

Root cause: The VLAN ID configured in the Azure peering does not match the VLAN ID configured by the service provider on their edge router.

Diagnosis:

# Check the VLAN ID in peering configuration
az network express-route peering show \
--circuit-name $ER_NAME \
--resource-group $RG \
--name "AzurePrivatePeering" \
--query "vlanId"

# Verify ARP table is empty (no layer 2 adjacency)
az network express-route list-arp-tables \
--name $ER_NAME \
--resource-group $RG \
--peering-name "AzurePrivatePeering" \
--path "primary"

Fix: Coordinate with the service provider to confirm the correct VLAN ID, then update the peering:

# Update peering with correct VLAN ID (example: provider confirms VLAN 200)
az network express-route peering update \
--circuit-name $ER_NAME \
--resource-group $RG \
--name "AzurePrivatePeering" \
--vlan-id 200

After updating, verify ARP resolves within 1-2 minutes and BGP routes begin appearing in the route table.
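To confirm the fix on both the primary and secondary paths in one step, here is a minimal sketch. It assumes ER_NAME and RG from Task 4, that --path selects the primary or secondary device, and that list-arp-tables returns an object whose value array holds the ARP entries (verify this against your CLI version's output); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# check_arp_paths  (hypothetical helper)
# Prints the ARP entry count per path; exit status 0 only if both paths have entries,
# 1 if either path is empty, 2 if the az call itself fails.
check_arp_paths() {
  local path count empty=0
  for path in primary secondary; do
    count=$(az network express-route list-arp-tables \
      --name "$ER_NAME" \
      --resource-group "$RG" \
      --peering-name "AzurePrivatePeering" \
      --path "$path" \
      --query "length(value)" \
      --output tsv) || return 2
    echo "$path: ${count:-0} ARP entries"
    if [ "${count:-0}" -eq 0 ]; then
      empty=$((empty + 1))
    fi
  done
  [ "$empty" -eq 0 ]
}

# Example: retry for about two minutes after the VLAN change
# for i in 1 2 3 4; do check_arp_paths && break; sleep 30; done
```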


Troubleshooting decision tree

VPN tunnel down?
  • NotConnected: check on-premises device reachability (UDP 500/4500)
  • Connecting: IKE negotiation failing
      • Check shared key match
      • Check IKE/IPsec policy alignment
      • Run Network Watcher troubleshooting
  • Connected, no traffic: check routing (UDR, BGP, NSG)

P2S VPN failing?
  • Certificate error: verify root cert uploaded, client cert not revoked
  • Tunnel type error: match client protocol to gateway config (IKEv2/OpenVPN/SSTP)
  • No IPs available: expand address pool

ExpressRoute not working?
  • Provider state = NotProvisioned: contact provider
  • Peering state = Disabled: check peering configuration
  • ARP table empty: VLAN mismatch or L2 issue with provider
  • Routes missing: BGP ASN mismatch or prefix filtering
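The VPN branch of the decision tree above maps directly onto connectionStatus values, so it can be encoded as a small lookup for runbooks or alerts. A minimal sketch; the function name and message wording are hypothetical, and the Unknown case follows the connection status table from Task 1.

```bash
#!/usr/bin/env bash
# vpn_next_step STATUS  (hypothetical helper)
# Maps an S2S connectionStatus value to the first branch of the decision tree.
vpn_next_step() {
  case "$1" in
    NotConnected)
      echo "Check on-premises device reachability (UDP 500/4500)" ;;
    Connecting)
      echo "IKE negotiation failing: check shared key, IKE/IPsec policy alignment, run Network Watcher troubleshooting" ;;
    Connected)
      echo "If no traffic flows, check routing (UDR, BGP, NSG)" ;;
    Unknown)
      echo "Gateway cannot determine state (often during updates): re-check shortly" ;;
    *)
      echo "Unrecognized status: $1"
      return 1 ;;
  esac
}

# Example (assumes CONNECTION_NAME and RG from Task 1):
# vpn_next_step "$(az network vpn-connection show --name "$CONNECTION_NAME" \
#   --resource-group "$RG" --query connectionStatus --output tsv)"
```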

Cleanup

Azure CLI

# Delete resources if they were created for this challenge
az group delete --name $RG --yes --no-wait

Azure PowerShell

Remove-AzResourceGroup -Name "rg-hybrid-challenge24" -Force -AsJob

Knowledge check

1. A VPN connection shows connectionStatus 'Connecting' for over 10 minutes. The on-premises device log shows 'no proposal chosen'. What is the most likely cause?

2. Which command starts automated VPN troubleshooting that analyzes IKE logs and produces a diagnostic report?

3. An ExpressRoute circuit shows circuitProvisioningState 'Enabled' and serviceProviderProvisioningState 'NotProvisioned'. What does this indicate?

4. P2S VPN clients fail to connect with 'certificate validation failed'. The root CA certificate was recently renewed. What action resolves this?

5. After resetting a VPN gateway, what is the expected impact?

6. An ExpressRoute peering shows state 'Enabled' but the ARP table returns empty results on both primary and secondary paths. What is the most likely layer 2 issue?