Challenge 20: design data storage for cost and performance
60-90 min | Estimated cost: $2-5 | Exam Weight: 20-25%
Introduction
DataForge Analytics is a fast-growing AI startup that has outgrown its initial storage architecture. Today they manage 100TB of data across Azure, with projections reaching 500TB within 12 months. Their data falls into three distinct usage patterns: ML training datasets accessed hourly by GPU clusters (hot data), user-uploaded files accessed daily through their SaaS platform (warm data), and compliance archives that must be retained for 7 years but are rarely accessed (cold data).
The CFO has raised an urgent concern: the current monthly storage bill is $15,000 and growing linearly with data volume. The target is to reduce costs below $10,000/month without sacrificing read performance on hot data that feeds the ML pipeline. The ML team reports that any latency increase on training data reads directly impacts model training time and GPU utilization efficiency.
Your task is to design a tiered storage strategy that balances cost optimization with performance requirements, leveraging Azure Storage access tiers, reserved capacity pricing, lifecycle management policies, and caching layers where appropriate.
Exam skills covered
- Recommend a data storage solution to balance features, performance, and costs
Design tasks
Part 1: analyze current storage and define tier strategy
- Create a resource group for this challenge and deploy a Standard general-purpose v2 storage account.
- Document the current pricing for each access tier (Hot, Cool, Cold, Archive) including per-GB storage costs, read/write operation costs, and data retrieval costs in your chosen region.
- Design a tiering strategy that maps each data category to the appropriate access tier:
  - ML training datasets (10TB, accessed hourly) - evaluate Hot tier vs Premium block blob storage
  - User uploads (30TB, accessed 1-5 times daily) - evaluate Cool vs Hot tier
  - Compliance archives (60TB, accessed less than once per year) - evaluate Cold vs Archive tier
- Calculate the projected monthly cost for your proposed tier allocation versus keeping everything in Hot tier.
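The cost comparison in the last task can be sketched in shell before you fill in real numbers. The per-GB prices below are placeholder assumptions, not quotes for any region, and `monthly_capacity_cost` is a hypothetical helper. It covers capacity charges only, so transaction, retrieval, and early-deletion charges still have to be added from the pricing page.

```shell
# Monthly capacity cost (USD): size in TB x price per GB-month.
# Uses 1 TB = 1024 GB. Transaction and retrieval charges are NOT included.
monthly_capacity_cost() {
  local size_tb="$1" price_per_gb="$2"
  LC_ALL=C awk -v tb="$size_tb" -v p="$price_per_gb" \
    'BEGIN { printf "%.2f\n", tb * 1024 * p }'
}

# Placeholder prices (assumptions): replace with your region's current rates.
HOT_PRICE=0.018
COOL_PRICE=0.010
ARCHIVE_PRICE=0.001

all_hot=$(monthly_capacity_cost 100 "$HOT_PRICE")
ml=$(monthly_capacity_cost 10 "$HOT_PRICE")            # ML training data stays Hot
uploads=$(monthly_capacity_cost 30 "$COOL_PRICE")      # user uploads in Cool
archives=$(monthly_capacity_cost 60 "$ARCHIVE_PRICE")  # compliance data in Archive
tiered=$(LC_ALL=C awk -v a="$ml" -v b="$uploads" -v c="$archives" \
  'BEGIN { printf "%.2f\n", a + b + c }')

echo "All-Hot capacity cost: \$${all_hot}/month"
echo "Tiered capacity cost:  \$${tiered}/month"
```

Swapping the Archive price for your region's Cold price shows how much of the saving depends on the compliance archives tolerating offline storage.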
Part 2: implement lifecycle management policies
- Create a lifecycle management policy that automatically transitions blobs between tiers based on last access time:
  - Move blobs not accessed for 30 days from Hot to Cool
  - Move blobs not accessed for 90 days from Cool to Cold
  - Move blobs not accessed for 180 days from Cold to Archive
- Enable last access time tracking on the storage account to support access-time-based policies.
- Create a second policy rule that deletes temporary processing blobs (prefix: temp/) after 7 days.
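The transition thresholds in the first task can be checked with a small pure-shell function before you write the policy JSON. This is a simplified sketch: `target_tier` is a hypothetical helper that reads the "greater than" conditions literally and ignores the fact that the service evaluates policies on its own schedule, so real transitions can lag behind the threshold.

```shell
# Tier that a blob would be targeted for after N days without access.
# The comparisons are strict, so a blob idle for exactly 30 days stays Hot.
target_tier() {
  local days="$1"
  if   [ "$days" -gt 180 ]; then echo "Archive"
  elif [ "$days" -gt 90 ];  then echo "Cold"
  elif [ "$days" -gt 30 ];  then echo "Cool"
  else echo "Hot"
  fi
}

for d in 7 30 31 120 365; do
  printf '%3d days idle -> %s\n' "$d" "$(target_tier "$d")"
done
```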
Part 3: evaluate reserved capacity and caching
- Calculate the savings from purchasing 100TB of Azure Storage reserved capacity (1-year commitment) versus pay-as-you-go pricing for the stable baseline storage.
- Design a caching strategy for the ML training data using Azure Cache for Redis or Azure HPC Cache. Document:
  - Which caching solution is appropriate for large dataset reads
  - Expected cache hit ratio for repeatedly-accessed training datasets
  - Cost of the caching layer versus the performance benefit
- Create a decision matrix comparing Standard vs Premium storage account performance tiers for the ML workload, considering IOPS, throughput, and latency requirements.
Part 4: design for growth
- Document how your design scales from 100TB to 500TB while maintaining the $10K/month budget constraint.
- Design a monitoring solution using Azure Monitor metrics to track:
  - Storage capacity growth per container
  - Access patterns per tier (to validate lifecycle policy effectiveness)
  - Cost alerts when monthly spend approaches budget threshold
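For the growth task at the top of this part, it helps to turn the fixed budget into a ceiling on the blended price per GB. The sketch below assumes the whole $10,000 is available for storage and that 1 TB = 1024 GB; `max_blended_price_per_gb` is a hypothetical helper, and the ceiling has to cover capacity, transactions, and retrieval together.

```shell
# Maximum blended monthly price per GB (all charges included) that keeps
# a given data volume inside a fixed monthly budget.
max_blended_price_per_gb() {
  local budget_usd="$1" size_tb="$2"
  LC_ALL=C awk -v b="$budget_usd" -v tb="$size_tb" \
    'BEGIN { printf "%.4f\n", b / (tb * 1024) }'
}

for tb in 100 250 500; do
  echo "${tb} TB -> \$$(max_blended_price_per_gb 10000 "$tb") per GB-month"
done
```

Compare the 500 TB ceiling with the blended rate of your Part 1 allocation. If your blended rate is higher, a larger share of the data has to sit in cooler tiers as the volume grows.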
Success criteria
- ⬜ Lifecycle management policy deployed with at least 3 tier transition rules based on last access time
- ⬜ Cost analysis document shows projected savings of 30% or more compared to all-Hot storage
- ⬜ Decision matrix compares Standard vs Premium tiers with IOPS, throughput, latency, and cost columns
- ⬜ Reserved capacity calculation demonstrates break-even point for 1-year commitment
- ⬜ Caching strategy documented with solution selection rationale and cost-benefit analysis
- ⬜ Growth plan shows cost remains under $10K/month at 500TB scale
Hints
Hint 1: Understanding Access Tier Pricing
Azure Blob Storage access tiers trade storage cost against access cost. Hot tier has the highest per-GB storage cost of the standard tiers but the lowest read/write operation costs. Archive tier has the lowest storage cost (roughly 1/20th of Hot) but high retrieval costs, and an archived blob must be rehydrated before it can be read, which can take up to 15 hours at standard priority. The Cold tier (introduced after Cool) is an online tier priced between Cool and Archive, with lower retrieval costs than Archive.
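The trade-off can be made concrete by finding the monthly read volume at which a cooler tier stops being cheaper. The sketch below compares Hot and Cool using placeholder prices (assumptions, not regional quotes) and a hypothetical helper, `break_even_read_gb`. It ignores per-operation charges, which also differ between tiers.

```shell
# Monthly read volume (GB) at which Cool and Hot cost the same:
#   size_gb * (hot_price - cool_price) / cool_retrieval_price_per_gb
break_even_read_gb() {
  local size_gb="$1" hot_price="$2" cool_price="$3" retrieval_price="$4"
  LC_ALL=C awk -v s="$size_gb" -v h="$hot_price" -v c="$cool_price" -v r="$retrieval_price" \
    'BEGIN { printf "%.0f\n", s * (h - c) / r }'
}

# 50 TB (51200 GB) with placeholder prices:
# Hot 0.018 and Cool 0.010 per GB-month, Cool retrieval 0.01 per GB read.
break_even_read_gb 51200 0.018 0.010 0.01
```

With these placeholder figures, Cool stays cheaper as long as less than about 40 TB is read back each month. The same calculation with Archive prices also needs the rehydration delay weighed in, which no price captures.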
Hint 2: Lifecycle Management Policy Structure
Lifecycle management policies use JSON rules with baseBlob actions. Enable enableAutoTierToHotFromCool if you want Azure to automatically move blobs back to Hot when accessed. Use daysAfterLastAccessTimeGreaterThan (requires access tracking enabled) rather than daysAfterModificationGreaterThan for access-pattern-based tiering.
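As a minimal sketch of that structure, the snippet below writes a one-rule policy file that tiers blobs to Cool after 30 days without access and lets Azure move them back to Hot when they are read again. The file name, rule name, and `tiering-lab/` prefix are illustrative assumptions.

```shell
# Write a one-rule lifecycle policy to a local file.
cat > policy-auto-tier.json <<'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "cool-after-30-days-idle",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "enableAutoTierToHotFromCool": true,
            "tierToCool": {
              "daysAfterLastAccessTimeGreaterThan": 30
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["tiering-lab/"]
        }
      }
    }
  ]
}
EOF

# To apply it (requires a storage account and an Azure login):
# az storage account management-policy create \
#   --account-name "$STORAGE_ACCOUNT" \
#   --resource-group rg-az305-challenge20 \
#   --policy @policy-auto-tier.json
```

A storage account holds a single management policy, so applying this file replaces any rules already in place rather than adding to them.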
Hint 3: Reserved Capacity Considerations
Azure Storage reserved capacity provides up to 38% discount for 1-year and up to 56% for 3-year commitments on block blob storage capacity. A reservation is purchased in units of 100 TiB or 1 PiB for a specific region, access tier, and redundancy option, so capacity reserved for one tier does not offset usage in another. It does not cover transaction costs, data transfer, or operations - only the per-GB capacity charge.
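One way to express the break-even point is the share of a reserved unit that must actually be in use before the reservation beats pay-as-you-go. The figures in the example call are hypothetical, and `reservation_break_even` is an illustrative helper, not part of any Azure tooling.

```shell
# Percentage of a reserved unit that must be in use for the reservation
# to cost less than pay-as-you-go:
#   100 * reserved_monthly_cost / (payg_price_per_gb * unit_size_in_gb)
reservation_break_even() {
  local reserved_monthly="$1" payg_per_gb="$2" unit_tib="${3:-100}"
  LC_ALL=C awk -v r="$reserved_monthly" -v p="$payg_per_gb" -v u="$unit_tib" \
    'BEGIN { printf "%.1f\n", 100 * r / (p * u * 1024) }'
}

# Hypothetical: $1,500/month for one 100 TiB unit vs $0.018/GB pay-as-you-go.
echo "Break-even utilization: $(reservation_break_even 1500 0.018)%"
```

If lifecycle policies are expected to move data out of the reserved tier during the term, compare the projected utilization for each month against this percentage rather than using today's volume.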
Hint 4: Caching for Large Datasets
For ML training workloads reading large datasets (multi-TB), Azure HPC Cache is designed for high-throughput file-based workloads and can cache data from Azure Blob Storage. Azure Cache for Redis is better suited for smaller, key-value lookups. Consider whether the ML framework supports file-based reads (HPC Cache) or object-based reads (Redis).
Hint 5: Premium Block Blob Storage
Premium block blob storage accounts use SSDs and are optimized for workloads requiring consistent low latency and high transaction rates. They do not support access tiers, so blobs cannot be tiered down to Cool, Cold, or Archive, and they cost significantly more per GB. They are best when you need consistent single-digit-millisecond latency, not just high throughput.
Learning resources
- Azure Blob Storage access tiers
- Optimize costs with Azure Storage reserved capacity
- Azure Blob Storage lifecycle management
- Plan and manage costs for Azure Blob Storage
- Premium block blob storage accounts
- Azure HPC Cache overview
Knowledge check
1. A company stores 50TB of log data that is written once and read approximately twice per month for compliance audits. Which access tier minimizes total cost (storage + operations)?
Cool tier. While Archive has the lowest per-GB storage cost, the twice-monthly read pattern would incur significant retrieval costs and a rehydration delay of up to 15 hours before each audit. Cold tier might also work, but Cool provides a good balance between storage cost savings (roughly 50% less than Hot) and reasonable operation costs for occasional reads. The key insight is that Archive is only cost-effective when data is accessed less than once or twice per year.
2. When does Azure Storage reserved capacity NOT provide cost savings?
When storage volume is highly variable or shrinking. Reserved capacity requires a commitment to a fixed amount of storage (100TB or 1PB increments). If actual usage falls below the reserved amount, you pay for unused capacity. It also does not cover transaction costs, egress, or operations - only the per-GB capacity charge. If your workload is transaction-heavy but storage-light, reserved capacity provides minimal benefit.
3. A lifecycle management policy moves blobs to Archive after 180 days. A user needs to read an archived blob immediately. What happens?
The read fails until the blob is rehydrated. Archived blobs are offline and cannot be read directly. The user must first rehydrate the blob by changing its tier to Hot, Cool, or Cold (standard priority takes up to 15 hours; high priority may complete in under 1 hour for blobs under 10GB). Alternatively, they can copy the blob to a new blob in an online tier. This is a critical design consideration - if any compliance data might need urgent access, Archive tier may not be appropriate without a documented rehydration process.
4. What is the primary difference between a Standard general-purpose v2 storage account and a Premium block blob storage account for read-heavy workloads?
Latency consistency and IOPS. Premium block blob storage uses SSDs and provides consistent single-digit millisecond latency and higher IOPS. Standard accounts use HDDs with variable latency (typically 5-10ms but can spike). Premium has a single per-GB price (no access tiers) that is several times higher than Standard Hot tier, although its per-transaction prices are lower; check current regional pricing for the exact ratio. The design decision depends on whether the workload requires consistent low latency (Premium) or can tolerate variable latency in exchange for tier-based cost optimization (Standard).
Validation lab
This lab demonstrates that Azure Storage access tiers are not just pricing categories -- they produce fundamentally different behavior. Archive tier is genuinely offline (reads fail), last-access-time tracking enables intelligent automation, and lifecycle policies operate without application changes. You will observe these behaviors directly.
Step 1: create a storage account with last-access-time tracking
az group create \
--name rg-az305-challenge20 \
--location eastus
# Storage account names must be globally unique (3-24 lowercase letters and digits)
STORAGE_ACCOUNT="stch20${RANDOM}"
az storage account create \
--name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
--sku Standard_LRS \
--kind StorageV2 \
--access-tier Hot
Enable last-access-time tracking (required for access-time-based lifecycle policies):
az storage account blob-service-properties update \
--account-name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
--enable-last-access-tracking true
# Fetch an account key for the data-plane commands in the following steps
STORAGE_KEY=$(az storage account keys list \
--account-name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
--query "[0].value" -o tsv)
Create a container for the experiment:
az storage container create \
--name tiering-lab \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY"
Step 2: upload the same file to hot, cool, and archive tiers
Create a sample file:
echo "This is sample ML training data for tier comparison testing." > sample-data.txt
Upload to Hot tier:
az storage blob upload \
--container-name tiering-lab \
--name "hot/sample-data.txt" \
--file sample-data.txt \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--tier Hot
Upload to Cool tier:
az storage blob upload \
--container-name tiering-lab \
--name "cool/sample-data.txt" \
--file sample-data.txt \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--tier Cool
Upload to Archive tier:
az storage blob upload \
--container-name tiering-lab \
--name "archive/sample-data.txt" \
--file sample-data.txt \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--tier Archive
Step 3: compare blob properties across tiers
echo "=== Hot Tier Blob ==="
az storage blob show \
--container-name tiering-lab \
--name "hot/sample-data.txt" \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
-o table
echo "=== Cool Tier Blob ==="
az storage blob show \
--container-name tiering-lab \
--name "cool/sample-data.txt" \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
-o table
echo "=== Archive Tier Blob ==="
az storage blob show \
--container-name tiering-lab \
--name "archive/sample-data.txt" \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
-o table
All three blobs have the same content and size, but the tier metadata differs. The storage cost per GB varies dramatically: Hot is roughly 20x more expensive per GB than Archive. However, the trade-off is access behavior -- as you will see in the next step, Archive tier is not just "cheaper storage" but fundamentally offline storage.
Step 4: attempt to download from archive tier -- observe failure
Download from Hot tier (succeeds instantly):
az storage blob download \
--container-name tiering-lab \
--name "hot/sample-data.txt" \
--file downloaded-hot.txt \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY"
cat downloaded-hot.txt
Now attempt to download from Archive tier (FAILS):
az storage blob download \
--container-name tiering-lab \
--name "archive/sample-data.txt" \
--file downloaded-archive.txt \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" 2>&1 || true
The download fails with an error indicating the blob is in an offline tier. Archive blobs cannot be read directly -- they must be rehydrated first.
This is the single most important behavioral difference in Azure Storage tiering: Archive is OFFLINE storage. It is not "slow storage" -- it is inaccessible storage that requires an explicit rehydration operation taking hours. On the AZ-305 exam, if a scenario requires "immediate access to compliance data during an audit," Archive tier is the WRONG answer regardless of its cost savings. The exam tests whether you understand that Archive introduces hours of latency, not just higher per-operation costs.
Step 5: initiate rehydration and check status
Start rehydration with High priority:
az storage blob set-tier \
--container-name tiering-lab \
--name "archive/sample-data.txt" \
--tier Hot \
--rehydrate-priority High \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY"
Check the rehydration status:
az storage blob show \
--container-name tiering-lab \
--name "archive/sample-data.txt" \
--account-name "$STORAGE_ACCOUNT" \
--account-key "$STORAGE_KEY" \
--query "{tier:properties.blobTier, rehydrationStatus:properties.rehydrationStatus, archiveStatus:properties.archiveStatus}" \
-o table
The blob reports a rehydrate-pending-to-hot status, and its tier remains Archive until rehydration completes. High priority rehydration can complete in under 1 hour for blobs under 10GB; standard priority takes up to 15 hours.
Rehydration is not instant even with High priority. Designing for Archive tier means designing for eventual access -- you need a process to handle the delay. Common patterns include: (1) keeping a metadata index in Hot tier so you know WHAT is archived without reading it, (2) maintaining a "last 30 days" copy in Cool tier for recent compliance queries, and (3) triggering rehydration proactively when an audit is announced rather than when data is requested.
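Rehydration can also be done by copying the archived blob to a new blob in an online tier, which leaves the archived original untouched. The function below is a sketch under the lab's assumptions: it reuses `STORAGE_ACCOUNT` and `STORAGE_KEY` from Step 1, and the `rehydrate_by_copy` name and destination path are hypothetical. The example call is commented out because the lab blob is already being rehydrated in place by the previous command.

```shell
# Copy an archived blob to a new blob in the Hot tier of the same container.
# The archived source stays in place; the destination becomes readable once
# the copy, which includes rehydration, has completed.
rehydrate_by_copy() {
  local container="$1" source_blob="$2" dest_blob="$3"
  az storage blob copy start \
    --source-container "$container" \
    --source-blob "$source_blob" \
    --destination-container "$container" \
    --destination-blob "$dest_blob" \
    --tier Hot \
    --rehydrate-priority Standard \
    --account-name "$STORAGE_ACCOUNT" \
    --account-key "$STORAGE_KEY"
}

# Example call (hypothetical destination name):
# rehydrate_by_copy tiering-lab "archive/sample-data.txt" "rehydrated/sample-data.txt"
```

Copying keeps the low-cost archived copy for the rest of the retention period, at the price of paying Hot storage for the duplicate while the audit is in progress.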
Step 6: apply a lifecycle management policy based on last access time
az storage account management-policy create \
--account-name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
--policy '{
  "rules": [
    {
      "enabled": true,
      "name": "auto-tier-by-access",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {
              "daysAfterLastAccessTimeGreaterThan": 30
            },
            "tierToCold": {
              "daysAfterLastAccessTimeGreaterThan": 90
            },
            "tierToArchive": {
              "daysAfterLastAccessTimeGreaterThan": 180
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["tiering-lab/"]
        }
      }
    },
    {
      "enabled": true,
      "name": "cleanup-temp-blobs",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "delete": {
              "daysAfterLastAccessTimeGreaterThan": 7
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["tiering-lab/temp/"]
        }
      }
    }
  ]
}'
Step 7: verify the policy was applied
az storage account management-policy show \
--account-name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
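--query "policy.rules[].{name:name, enabled:enabled, coolAfterDays:definition.actions.baseBlob.tierToCool.daysAfterLastAccessTimeGreaterThan, coldAfterDays:definition.actions.baseBlob.tierToCold.daysAfterLastAccessTimeGreaterThan, archiveAfterDays:definition.actions.baseBlob.tierToArchive.daysAfterLastAccessTimeGreaterThan, deleteAfterDays:definition.actions.baseBlob.delete.daysAfterLastAccessTimeGreaterThan}" \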
-o table
echo "Policy rules applied:"
az storage account management-policy show \
--account-name "$STORAGE_ACCOUNT" \
--resource-group rg-az305-challenge20 \
--query "policy.rules[].name" -o tsv
Lifecycle policies execute automatically without any application changes. The application continues to upload and read blobs normally -- Azure Storage moves them between tiers in the background based on access patterns. This is "infrastructure-level cost optimization" that requires zero code changes. On the exam, lifecycle management is the correct answer whenever the question mentions "automatically reduce storage costs over time" or "data that becomes less frequently accessed."
This lab proved three storage architecture principles: (1) Archive tier is genuinely offline storage -- reads fail until rehydration completes, making it unsuitable for data that might need immediate access. (2) Last-access-time tracking enables intelligent, access-pattern-based tiering rather than crude age-based rules. (3) Lifecycle management policies automate cost optimization at the infrastructure level with zero application code changes -- blobs move between tiers automatically based on observed behavior.
Cleanup
rm -f sample-data.txt downloaded-hot.txt downloaded-archive.txt
az group delete \
--name rg-az305-challenge20 \
--yes --no-wait