Challenge 20: Design Data Storage for Cost and Performance

Estimated Time and Cost

60-90 min | Estimated cost: $2-5 | Exam Weight: 20-25%

Introduction

DataForge Analytics is a fast-growing AI startup that has outgrown its initial storage architecture. Today they manage 100TB of data across Azure, with projections reaching 500TB within 12 months. Their data falls into three distinct usage patterns: ML training datasets accessed hourly by GPU clusters (hot data), user-uploaded files accessed daily through their SaaS platform (warm data), and compliance archives that must be retained for 7 years but are rarely accessed (cold data).

The CFO has raised an urgent concern: the current monthly storage bill is $15,000 and growing linearly with data volume. The target is to reduce costs below $10,000/month without sacrificing read performance on hot data that feeds the ML pipeline. The ML team reports that any latency increase on training data reads directly impacts model training time and GPU utilization efficiency.

Your task is to design a tiered storage strategy that balances cost optimization with performance requirements, leveraging Azure Storage access tiers, reserved capacity pricing, lifecycle management policies, and caching layers where appropriate.

Exam skills covered

Recommend a data storage solution to balance features, performance, and costs

Design tasks

Part 1: analyze current storage and define tier strategy

Create a resource group for this challenge and deploy a Standard general-purpose v2 storage account.
Document the current pricing for each access tier (Hot, Cool, Cold, Archive) including per-GB storage costs, read/write operation costs, and data retrieval costs in your chosen region.
Design a tiering strategy that maps each data category to the appropriate access tier:
- ML training datasets (10TB, accessed hourly) - evaluate Hot tier vs Premium block blob storage
- User uploads (30TB, accessed 1-5 times daily) - evaluate Cool vs Hot tier
- Compliance archives (60TB, accessed less than once per year) - evaluate Cold vs Archive tier
Calculate the projected monthly cost for your proposed tier allocation versus keeping everything in Hot tier.
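The comparison in the last task can be sketched as a small shell calculation. This is a rough sketch under stated assumptions: the per-GB prices below are illustrative placeholders rather than quoted Azure rates, the tier allocation is only one candidate (uploads in Cool, archives in Archive), and only capacity charges are counted. Operation, retrieval, and early deletion charges are deliberately left out, so replace the numbers with the pricing you documented for your region.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# ASSUMED per-GB monthly prices (placeholders, not quoted Azure rates).
HOT_PRICE=0.018
COOL_PRICE=0.010
ARCHIVE_PRICE=0.001

# monthly_cost <terabytes> <price_per_gb>
# Capacity charge only; this sketch treats 1 TB as 1024 GB.
monthly_cost() {
  awk -v tb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", tb * 1024 * price }'
}

# Baseline: all 100 TB kept in Hot.
all_hot=$(monthly_cost 100 "$HOT_PRICE")

# Candidate allocation from the scenario's three data categories.
ml=$(monthly_cost 10 "$HOT_PRICE")            # ML training datasets
uploads=$(monthly_cost 30 "$COOL_PRICE")      # user uploads
archives=$(monthly_cost 60 "$ARCHIVE_PRICE")  # compliance archives

tiered=$(awk -v a="$ml" -v b="$uploads" -v c="$archives" \
  'BEGIN { printf "%.2f\n", a + b + c }')
savings_pct=$(awk -v h="$all_hot" -v t="$tiered" \
  'BEGIN { printf "%.1f\n", (h - t) / h * 100 }')

echo "All-Hot capacity cost: \$${all_hot}/month"
echo "Tiered capacity cost:  \$${tiered}/month"
echo "Capacity savings:      ${savings_pct}%"
```

Because user uploads are read several times a day, rerun the comparison with the operation and retrieval charges added before settling on Cool for that category.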

Part 2: implement lifecycle management policies

Create a lifecycle management policy that automatically transitions blobs between tiers based on last access time:
- Move blobs not accessed for 30 days from Hot to Cool
- Move blobs not accessed for 90 days from Cool to Cold
- Move blobs not accessed for 180 days from Cold to Archive
Enable last access time tracking on the storage account to support access-time-based policies.
Create a second policy rule that deletes temporary processing blobs (prefix: temp/) after 7 days.

Part 3: evaluate reserved capacity and caching

Calculate the savings from purchasing 100TB of Azure Storage reserved capacity (1-year commitment) versus pay-as-you-go pricing for the stable baseline storage.
Design a caching strategy for the ML training data using Azure Cache for Redis or Azure HPC Cache. Document:
- Which caching solution is appropriate for large dataset reads
- Expected cache hit ratio for repeatedly-accessed training datasets
- Cost of the caching layer versus the performance benefit
Create a decision matrix comparing Standard vs Premium storage account performance tiers for the ML workload, considering IOPS, throughput, and latency requirements.

Part 4: design for growth

Document how your design scales from 100TB to 500TB while maintaining the $10K/month budget constraint.
Design a monitoring solution using Azure Monitor metrics to track:
- Storage capacity growth per container
- Access patterns per tier (to validate lifecycle policy effectiveness)
- Cost alerts when monthly spend approaches budget threshold
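One way to reason about the growth constraint is to convert the budget into a maximum blended price per GB at each scale. The sketch below assumes the $10,000 budget has to cover everything (capacity, operations, caching) and treats 1 TB as 1024 GB; the function name is hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# max_blended_price <monthly_budget_usd> <terabytes>
# Prints the highest all-in price per GB per month that still fits the budget.
max_blended_price() {
  awk -v budget="$1" -v tb="$2" 'BEGIN { printf "%.5f\n", budget / (tb * 1024) }'
}

at_100tb=$(max_blended_price 10000 100)
at_500tb=$(max_blended_price 10000 500)

echo "Budget ceiling at 100 TB: \$${at_100tb} per GB per month"
echo "Budget ceiling at 500 TB: \$${at_500tb} per GB per month"
```

The ceiling shrinks fivefold as the data grows fivefold, so the growth plan has to show that most of the new data lands in the cheaper tiers rather than in Hot.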

Success criteria

⬜ Lifecycle management policy deployed with at least 3 tier transition rules based on last access time
⬜ Cost analysis document shows projected savings of 30% or more compared to all-Hot storage
⬜ Decision matrix compares Standard vs Premium tiers with IOPS, throughput, latency, and cost columns
⬜ Reserved capacity calculation demonstrates break-even point for 1-year commitment
⬜ Caching strategy documented with solution selection rationale and cost-benefit analysis
⬜ Growth plan shows cost remains under $10K/month at 500TB scale

Hints

Hint 1: Understanding Access Tier Pricing

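Azure Blob Storage access tiers trade storage cost against access cost. Hot tier has the highest per-GB storage cost but the lowest read/write operation costs. Archive tier has the lowest storage cost (roughly 1/20th of Hot) but high retrieval costs, and its blobs stay offline until rehydrated, which can take up to 15 hours at standard priority. The Cold tier (introduced after Cool) is an online tier priced between Cool and Archive, with lower retrieval costs than Archive. Cool, Cold, and Archive also carry minimum retention periods (30, 90, and 180 days respectively); moving or deleting a blob sooner incurs an early deletion charge.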

Hint 2: Lifecycle Management Policy Structure

Lifecycle management policies use JSON rules with baseBlob actions. Enable enableAutoTierToHotFromCool if you want Azure to automatically move blobs back to Hot when accessed. Use daysAfterLastAccessTimeGreaterThan (requires access tracking enabled) rather than daysAfterModificationGreaterThan for access-pattern-based tiering.
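As a minimal sketch of that structure, the snippet below writes a one-rule policy file that combines `tierToCool` on last access time with `enableAutoTierToHotFromCool`. The file name, rule name, and the reuse of the lab's `tiering-lab/` prefix are assumptions made for illustration.

```shell
#!/usr/bin/env bash

# Write a single-rule policy file (file name and rule name are hypothetical).
cat > policy-autotier.json <<'EOF'
{
  "rules": [
    {
      "enabled": true,
      "name": "cool-after-30-days-with-auto-return",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "enableAutoTierToHotFromCool": true,
            "tierToCool": {
              "daysAfterLastAccessTimeGreaterThan": 30
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["tiering-lab/"]
        }
      }
    }
  ]
}
EOF

# To apply it, uncomment the command below. An account holds a single policy,
# so this REPLACES any existing rules; merge them into the file first if needed.
# az storage account management-policy create \
#   --account-name "$STORAGE_ACCOUNT" \
#   --resource-group rg-az305-challenge20 \
#   --policy @policy-autotier.json
```

Keeping the policy in a file rather than an inline string makes it easier to review and version alongside the rest of the design documentation.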

Hint 3: Reserved Capacity Considerations

Azure Storage reserved capacity provides up to 38% discount for 1-year and up to 56% for 3-year commitments on block blob storage capacity. A reservation is purchased for a specific region, access tier, and redundancy option, so match it to the tier that holds your stable baseline data. It covers only the per-GB capacity charge, not transaction costs, data transfer, or other operations.
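The break-even point for a reservation follows from simple arithmetic: a reservation of R TB at discount d costs the same as paying full price for R × (1 − d) TB, so it only pays off while actual usage stays above that level. The sketch below assumes the full 38% discount quoted above (the real figure may be lower) and an illustrative placeholder price of $0.018 per GB; the function names are hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# break_even_tb <reserved_tb> <discount_pct>
# Usage (in TB) below which pay-as-you-go would have been cheaper.
break_even_tb() {
  awk -v r="$1" -v d="$2" 'BEGIN { printf "%.1f\n", r * (1 - d / 100) }'
}

# monthly_saving <reserved_tb> <price_per_gb> <discount_pct>
# Saving per month when the reservation is fully used (1 TB = 1024 GB here).
monthly_saving() {
  awk -v r="$1" -v p="$2" -v d="$3" \
    'BEGIN { printf "%.2f\n", r * 1024 * p * d / 100 }'
}

threshold=$(break_even_tb 100 38)
saving=$(monthly_saving 100 0.018 38)   # 0.018 is an ASSUMED price

echo "Break-even usage: ${threshold} TB of the 100 TB reserved"
echo "Saving at full utilization: \$${saving}/month"
```

Under these assumptions the reservation breaks even at about 62 TB of sustained usage, which is why it suits the stable baseline rather than data that might be deleted or moved to another tier.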

Hint 4: Caching for Large Datasets

For ML training workloads reading large datasets (multi-TB), Azure HPC Cache is designed for high-throughput file-based workloads and can cache data from Azure Blob Storage, presenting it to clients as an NFS mount. Azure Cache for Redis is an in-memory key-value store better suited to small, frequently read values than to multi-TB datasets. Consider whether the ML framework reads training data through a file system path (HPC Cache) or through key-value lookups (Redis).

Hint 5: Premium Block Blob Storage

Premium block blob storage accounts use SSDs and are optimized for workloads requiring consistent low latency and high transaction rates. They do not support access tiers, so lifecycle policies cannot move their blobs to Cool, Cold, or Archive, and they cost significantly more per GB. They are best when you need consistently low (single-digit millisecond) latency, not just high throughput.

Learning resources

Knowledge check

1. A company stores 50TB of log data that is written once and read approximately twice per month for compliance audits. Which access tier minimizes total cost (storage + operations)?

Cool tier. While Archive has the lowest per-GB storage cost, the twice-monthly read pattern would incur significant retrieval costs and rehydration delays of up to 15 hours before each audit. Cold tier might also work, but Cool provides a good balance between storage cost savings (roughly 50% less than Hot) and reasonable operation costs for occasional reads. The key insight is that Archive is only cost-effective when data is accessed less than once or twice per year.

2. When does Azure Storage reserved capacity NOT provide cost savings?

When storage volume is highly variable or shrinking. Reserved capacity requires a commitment to a fixed amount of storage, purchased in units of 100 TiB or 1 PiB for a specific region, access tier, and redundancy option. If actual usage falls below the reserved amount, you pay for unused capacity. It also does not cover transaction costs, egress, or operations, only the per-GB capacity charge. If your workload is transaction-heavy but storage-light, reserved capacity provides minimal benefit.

3. A lifecycle management policy moves blobs to Archive after 180 days. A user needs to read an archived blob immediately. What happens?

The read fails until the blob is rehydrated. Archived blobs are offline and cannot be read directly. The user must first rehydrate the blob by changing its tier to Hot, Cool, or Cold (standard priority takes up to 15 hours; high priority may complete in under 1 hour for blobs under 10GB). Alternatively, they can copy the blob to a new blob in an online tier. This is a critical design consideration - if any compliance data might need urgent access, Archive tier may not be appropriate without a documented rehydration process.

4. What is the primary difference between a Standard general-purpose v2 storage account and a Premium block blob storage account for read-heavy workloads?

Latency consistency and IOPS. Premium block blob storage uses SSDs and provides consistent single-digit millisecond latency and higher IOPS. Standard accounts use HDDs with variable latency (typically 5-10ms but can spike). Premium is priced per GB (no access tiers) and costs several times more per GB than Standard Hot tier, although its transaction prices are lower. The design decision depends on whether the workload requires consistent low latency (Premium) or can tolerate variable latency in exchange for tier-based cost optimization (Standard).

Validation lab

This lab demonstrates that Azure Storage access tiers are not just pricing categories -- they produce fundamentally different behavior. Archive tier is genuinely offline (reads fail), last-access-time tracking enables intelligent automation, and lifecycle policies operate without application changes. You will observe these behaviors directly.

Step 1: create a storage account with last-access-time tracking

az group create \
  --name rg-az305-challenge20 \
  --location eastus

STORAGE_ACCOUNT="stch20${RANDOM}"

az storage account create \
  --name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --sku Standard_LRS \
  --kind StorageV2 \
  --access-tier Hot

Enable last-access-time tracking (required for access-time-based lifecycle policies):

az storage account blob-service-properties update \
  --account-name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --enable-last-access-tracking true

STORAGE_KEY=$(az storage account keys list \
  --account-name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --query "[0].value" -o tsv)

Create a container for the experiment:

az storage container create \
  --name tiering-lab \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY"

Step 2: upload the same file to hot, cool, and archive tiers

Create a sample file:

echo "This is sample ML training data for tier comparison testing." > sample-data.txt

Upload to Hot tier:

az storage blob upload \
  --container-name tiering-lab \
  --name "hot/sample-data.txt" \
  --file sample-data.txt \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --tier Hot

Upload to Cool tier:

az storage blob upload \
  --container-name tiering-lab \
  --name "cool/sample-data.txt" \
  --file sample-data.txt \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --tier Cool

Upload to Archive tier:

az storage blob upload \
  --container-name tiering-lab \
  --name "archive/sample-data.txt" \
  --file sample-data.txt \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --tier Archive

Step 3: compare blob properties across tiers

echo "=== Hot Tier Blob ==="
az storage blob show \
  --container-name tiering-lab \
  --name "hot/sample-data.txt" \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
  -o table

echo "=== Cool Tier Blob ==="
az storage blob show \
  --container-name tiering-lab \
  --name "cool/sample-data.txt" \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
  -o table

echo "=== Archive Tier Blob ==="
az storage blob show \
  --container-name tiering-lab \
  --name "archive/sample-data.txt" \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --query "{tier:properties.blobTier, lastAccessed:properties.lastAccessedOn, contentLength:properties.contentLength}" \
  -o table

Architect Insight

All three blobs have the same content and size, but the tier metadata differs. The storage cost per GB varies dramatically: Hot is roughly 20x more expensive per GB than Archive. However, the trade-off is access behavior -- as you will see in the next step, Archive tier is not just "cheaper storage" but fundamentally offline storage.

Step 4: attempt to download from archive tier -- observe failure

Download from Hot tier (succeeds instantly):

az storage blob download \
  --container-name tiering-lab \
  --name "hot/sample-data.txt" \
  --file downloaded-hot.txt \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY"

cat downloaded-hot.txt

Now attempt to download from Archive tier (FAILS):

az storage blob download \
  --container-name tiering-lab \
  --name "archive/sample-data.txt" \
  --file downloaded-archive.txt \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" 2>&1 || true

The download fails with an HTTP 409 error (error code BlobArchived) stating that the operation is not permitted on an archived blob. Archive blobs cannot be read directly -- they must be rehydrated first.

Architect Insight

This is the single most important behavioral difference in Azure Storage tiering: Archive is OFFLINE storage. It is not "slow storage" -- it is inaccessible storage that requires an explicit rehydration operation taking hours. On the AZ-305 exam, if a scenario requires "immediate access to compliance data during an audit," Archive tier is the WRONG answer regardless of its cost savings. The exam tests whether you understand that Archive introduces hours of latency, not just higher per-operation costs.

Step 5: initiate rehydration and check status

Start rehydration with High priority:

az storage blob set-tier \
  --container-name tiering-lab \
  --name "archive/sample-data.txt" \
  --tier Hot \
  --rehydrate-priority High \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY"

Check the rehydration status:

az storage blob show \
  --container-name tiering-lab \
  --name "archive/sample-data.txt" \
  --account-name "$STORAGE_ACCOUNT" \
  --account-key "$STORAGE_KEY" \
  --query "{tier:properties.blobTier, rehydrationStatus:properties.rehydrationStatus, archiveStatus:properties.archiveStatus}" \
  -o table

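While rehydration is in progress, the tier column still reports Archive and archiveStatus shows rehydrate-pending-to-hot. Once rehydration completes, the tier changes to Hot and the status clears. High priority rehydration can complete in under 1 hour for blobs under 10GB; standard priority takes up to 15 hours.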
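Because rehydration takes an unpredictable amount of time, a script that needs the blob usually has to poll for completion. The function below is a minimal sketch that repeats the `az storage blob show` call from this step until the blob's tier is no longer Archive. The function name and the `POLL_SECONDS` and `MAX_POLLS` variables are hypothetical and introduced only for this example; `STORAGE_ACCOUNT` and `STORAGE_KEY` are assumed to be set as in Step 1.

```shell
#!/usr/bin/env bash

# wait_for_rehydration <container> <blob>
# Polls the blob's tier until it leaves Archive or the poll budget runs out.
wait_for_rehydration() {
  local container="$1" blob="$2"
  local interval="${POLL_SECONDS:-300}"   # seconds between polls (assumed default)
  local max="${MAX_POLLS:-24}"            # give up after this many polls (assumed default)
  local tier i

  for ((i = 1; i <= max; i++)); do
    tier=$(az storage blob show \
      --container-name "$container" \
      --name "$blob" \
      --account-name "$STORAGE_ACCOUNT" \
      --account-key "$STORAGE_KEY" \
      --query "properties.blobTier" -o tsv)

    # The tier stays "Archive" for the whole rehydration and flips when it is done.
    if [ -n "$tier" ] && [ "$tier" != "Archive" ]; then
      echo "Rehydrated to $tier after $i poll(s)"
      return 0
    fi
    sleep "$interval"
  done

  echo "Still archived after $max polls" >&2
  return 1
}

# Example call (commented out so the definition can be sourced safely):
# wait_for_rehydration tiering-lab "archive/sample-data.txt"
```

For production workloads, an event-driven approach is generally preferable to polling, but a bounded loop like this is enough to observe the delay in a lab.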

Architect Insight

Rehydration is not instant even with High priority. Designing for Archive tier means designing for eventual access -- you need a process to handle the delay. Common patterns include: (1) keeping a metadata index in Hot tier so you know WHAT is archived without reading it, (2) maintaining a "last 30 days" copy in Cool tier for recent compliance queries, and (3) triggering rehydration proactively when an audit is announced rather than when data is requested.

Step 6: apply a lifecycle management policy based on last access time

az storage account management-policy create \
  --account-name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --policy '{
    "rules": [
      {
        "enabled": true,
        "name": "auto-tier-by-access",
        "type": "Lifecycle",
        "definition": {
          "actions": {
            "baseBlob": {
              "tierToCool": {
                "daysAfterLastAccessTimeGreaterThan": 30
              },
              "tierToCold": {
                "daysAfterLastAccessTimeGreaterThan": 90
              },
              "tierToArchive": {
                "daysAfterLastAccessTimeGreaterThan": 180
              }
            }
          },
          "filters": {
            "blobTypes": ["blockBlob"],
            "prefixMatch": ["tiering-lab/"]
          }
        }
      },
      {
        "enabled": true,
        "name": "cleanup-temp-blobs",
        "type": "Lifecycle",
        "definition": {
          "actions": {
            "baseBlob": {
              "delete": {
                "daysAfterLastAccessTimeGreaterThan": 7
              }
            }
          },
          "filters": {
            "blobTypes": ["blockBlob"],
            "prefixMatch": ["tiering-lab/temp/"]
          }
        }
      }
    ]
  }'

Step 7: verify the policy was applied

az storage account management-policy show \
  --account-name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --query "policy.rules[].{name:name, enabled:enabled, tierToCool:definition.actions.baseBlob.tierToCool, tierToCold:definition.actions.baseBlob.tierToCold, tierToArchive:definition.actions.baseBlob.tierToArchive, delete:definition.actions.baseBlob.delete}" \
  -o table

echo "Policy rules applied:"
az storage account management-policy show \
  --account-name "$STORAGE_ACCOUNT" \
  --resource-group rg-az305-challenge20 \
  --query "policy.rules[].name" -o tsv

Architect Insight

Lifecycle policies execute automatically without any application changes. The application continues to upload and read blobs normally -- Azure Storage moves them between tiers in the background based on access patterns. This is "infrastructure-level cost optimization" that requires zero code changes. Note that a new or updated policy can take up to 24 hours to go into effect and start its first run, so no tier changes will be visible immediately after Step 6. On the exam, lifecycle management is the correct answer whenever the question mentions "automatically reduce storage costs over time" or "data that becomes less frequently accessed."

Design Validation

This lab proved three storage architecture principles: (1) Archive tier is genuinely offline storage -- reads fail until rehydration completes, making it unsuitable for data that might need immediate access. (2) Last-access-time tracking enables intelligent, access-pattern-based tiering rather than crude age-based rules. (3) Lifecycle management policies automate cost optimization at the infrastructure level with zero application code changes -- blobs move between tiers automatically based on observed behavior.

Cleanup

rm -f sample-data.txt downloaded-hot.txt downloaded-archive.txt

az group delete \
  --name rg-az305-challenge20 \
  --yes --no-wait

Next: Challenge 21: Design Data Durability and Protection
