3.4. Data Lifecycle Management
First Principle: Data costs money to store, and its value changes over time. Last week's transaction data is queried constantly; last year's is queried monthly; data from five years ago is kept only for compliance. Without lifecycle management, organizations pay hot-storage prices for cold data, like a company renting a downtown office to store documents nobody reads. Lifecycle policies automatically move data through storage tiers as it ages, matching cost to value.
Consider a company storing 5 years of raw logs in S3 Standard: it pays $120,000/year, whereas moving data older than 90 days to S3 Standard-IA and archiving data older than 1 year to S3 Glacier Deep Archive would cut that cost by 70%. Unlike compute costs, which scale with usage, storage costs grow silently until someone audits the bill.
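The tiering policy in that example maps directly onto an S3 lifecycle rule. The sketch below builds the `LifecycleConfiguration` structure that S3 accepts (a `Filter`, a `Status`, and a list of `Transitions` keyed by age in days) for objects under a prefix. The prefix `raw-logs/`, the helper name `build_log_lifecycle_rule`, and the bucket name in the commented-out call are hypothetical; the 90-day and 365-day thresholds come from the example above.

```python
import json


def build_log_lifecycle_rule(prefix: str,
                             ia_after_days: int = 90,
                             archive_after_days: int = 365) -> dict:
    """Build an S3 LifecycleConfiguration dict for objects under `prefix`.

    Hypothetical helper: objects move to S3 Standard-IA after `ia_after_days`
    and to S3 Glacier Deep Archive after `archive_after_days`.
    """
    # S3 requires objects to stay at least 30 days in their current class
    # before a lifecycle transition to Standard-IA.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions need at least 30 days")
    # The archive step must happen after the IA step, not before it.
    if archive_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = build_log_lifecycle_rule("raw-logs/")
print(json.dumps(config, indent=2))

# Applying the rule needs AWS credentials; the bucket name is hypothetical:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket", LifecycleConfiguration=config)
```

Building the configuration as plain data keeps it testable without AWS access; the same dict can be passed to boto3 or rendered into infrastructure-as-code.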
Ignoring lifecycle management erodes your cloud budget month after month. Consider a data lake that grows by 1 TB per month in S3 Standard ($0.023/GB-month). After three years it holds 36 TB and costs $828/month, yet 90% of it has not been queried in over a year. Moving that 90% (32.4 TB) to S3 Glacier Instant Retrieval ($0.004/GB-month) saves about $616/month. How much of your data lake budget goes to hot-tier pricing for data nobody queries?
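The arithmetic behind that data-lake example can be checked with a few lines. This sketch assumes 1 TB = 1,000 GB and uses only the two per-GB-month prices quoted in the text; it deliberately ignores Glacier Instant Retrieval's retrieval charges and minimum storage duration, so it models storage cost alone. The function name `monthly_storage_cost` is hypothetical.

```python
# Per-GB-month prices quoted in the text; 1 TB is assumed to be 1,000 GB.
STANDARD_PER_GB = 0.023      # S3 Standard
GLACIER_IR_PER_GB = 0.004    # S3 Glacier Instant Retrieval
GB_PER_TB = 1_000


def monthly_storage_cost(tb_per_month: float, months: int,
                         cold_fraction: float) -> tuple[float, float]:
    """Return (all-Standard cost, tiered cost) per month after `months` of growth.

    Storage charges only: retrieval and request fees are not modeled.
    """
    total_gb = tb_per_month * months * GB_PER_TB
    all_hot = total_gb * STANDARD_PER_GB
    cold_gb = total_gb * cold_fraction
    # Hot data stays in Standard; the cold fraction moves to Glacier IR.
    tiered = (total_gb - cold_gb) * STANDARD_PER_GB + cold_gb * GLACIER_IR_PER_GB
    return all_hot, tiered


all_hot, tiered = monthly_storage_cost(tb_per_month=1, months=36, cold_fraction=0.9)
savings = all_hot - tiered
print(f"all-hot ${all_hot:,.2f}/month, tiered ${tiered:,.2f}/month, "
      f"savings ${savings:,.2f}/month")
# all-hot $828.00/month, tiered $212.40/month, savings $615.60/month
```

The exact saving is $615.60/month: the 32.4 TB that moves saves $0.019 per GB-month each.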
The exam tests lifecycle management from two angles: cost optimization (moving data to cheaper tiers) and compliance (retention policies, legal holds, deletion requirements). Both require understanding S3 storage classes, DynamoDB TTL, and Redshift data management.
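For the DynamoDB side, TTL works by reading a Number attribute that holds an expiry time in Unix epoch seconds; once that time passes, DynamoDB deletes the item in the background, so actual deletion can lag the expiry time. The sketch below stamps items with such an attribute under an assumed 90-day retention period. The attribute name `expires_at`, the helper `with_ttl`, the retention value, and the table name in the commented-out call are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)  # hypothetical retention period


def with_ttl(item: dict, now: Optional[datetime] = None,
             attr: str = "expires_at") -> dict:
    """Return a copy of `item` with a TTL attribute in Unix epoch seconds."""
    now = now or datetime.now(timezone.utc)
    expires = now + RETENTION
    # DynamoDB TTL expects whole seconds since the epoch, stored as a Number.
    return {**item, attr: int(expires.timestamp())}


item = with_ttl({"pk": "order#1"}, now=datetime(2024, 1, 1, tzinfo=timezone.utc))
print(item)

# TTL must also be enabled on the table (requires AWS credentials):
# import boto3
# boto3.client("dynamodb").update_time_to_live(
#     TableName="example-orders",
#     TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"})
```

Computing the expiry at write time keeps the retention rule in application code, while DynamoDB handles the deletion itself.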