6.2.3. Lifecycle Management and Retention
First Principle: Data that persists beyond its useful life is both a cost burden and a security liability. Lifecycle management automates the transition of data through storage tiers and its eventual deletion, ensuring data exists only as long as it's needed.
S3 Lifecycle Policies:
- Transition objects between storage classes based on age, then expire them (e.g., Standard → Standard-IA → Glacier → delete)
- Apply to entire buckets, specific prefixes, or tag-filtered objects
- Critical for cost optimization AND security (reducing volume of data at risk)
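The rules above can be sketched as a lifecycle configuration document in the shape boto3's `put_bucket_lifecycle_configuration` expects. The bucket name, prefix, and day thresholds below are illustrative assumptions, not values from the text:

```python
# Hypothetical S3 Lifecycle configuration: age-based tiering plus
# eventual expiration, scoped to a prefix.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tiered-archival",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # apply only to objects under logs/
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # Standard -> IA
                {"Days": 90, "StorageClass": "GLACIER"},      # IA -> Glacier
            ],
            "Expiration": {"Days": 365},  # delete after one year
        }
    ]
}

# In a real deployment (requires credentials and an existing bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle_config)
```

Scoping the rule with a `Filter` (prefix or tags) rather than applying it bucket-wide keeps the tiering limited to data that actually ages out.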
Amazon EFS Lifecycle Policies:
- Automatically move infrequently accessed files to EFS Infrequent Access (IA) storage
- Configurable based on last access time (7, 14, 30, 60, 90 days)
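A minimal sketch of the EFS equivalent, shaped for boto3's `put_lifecycle_configuration`; the file system ID is a hypothetical placeholder:

```python
# EFS lifecycle policy: move files to Infrequent Access after 30 days
# without access, and back to Standard on first access.
lifecycle_policies = [
    {"TransitionToIA": "AFTER_30_DAYS"},
    {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
]

# In a real deployment:
# import boto3
# boto3.client("efs").put_lifecycle_configuration(
#     FileSystemId="fs-0123456789abcdef0",  # hypothetical ID
#     LifecyclePolicies=lifecycle_policies)
```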
Amazon FSx for Lustre Backup Policies:
- Automated daily backups with configurable retention
- Backup to S3 for long-term retention
- Critical for HPC and ML workloads using temporary file systems
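The backup settings above map to the Lustre configuration passed at file system creation. A sketch, with illustrative values (retention days and start time are assumptions):

```python
# Hypothetical FSx for Lustre configuration fragment enabling automatic
# daily backups, as passed in boto3's fsx create_file_system call.
# Note: automatic backups require a persistent deployment type.
lustre_config = {
    "DeploymentType": "PERSISTENT_2",
    "AutomaticBackupRetentionDays": 30,       # keep daily backups 30 days
    "DailyAutomaticBackupStartTime": "02:00", # UTC, HH:MM
}

# boto3.client("fsx").create_file_system(
#     FileSystemType="LUSTRE", ...,
#     LustreConfiguration=lustre_config)
```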
Retention Best Practices:
| Data Type | Retention Strategy | AWS Mechanism |
|---|---|---|
| CloudTrail logs | 7 years (common compliance requirement) | S3 lifecycle: transition to Glacier after 90 days, expire after 7 years |
| Application logs | 30-90 days | CloudWatch Logs retention setting |
| Database backups | 35 days + long-term snapshots | RDS automated backups + manual snapshot lifecycle |
| Sensitive PII | Minimum required by regulation | S3 lifecycle with Object Lock for compliance period |
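The application-log row of the table translates directly into a CloudWatch Logs retention setting. A sketch, with a hypothetical log group name:

```python
# CloudWatch Logs enforces retention per log group; 90 days matches the
# table's application-log strategy. The value must be one of the fixed
# retention periods CloudWatch Logs accepts (90 is among them).
retention_days = 90

# In a real deployment:
# import boto3
# boto3.client("logs").put_retention_policy(
#     logGroupName="/app/production",  # hypothetical log group
#     retentionInDays=retention_days)
```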
⚠️ Exam Trap: S3 Lifecycle transitions and Object Lock retention periods operate independently. An object can transition to Glacier while still being retained by Object Lock. The transition changes the storage class; the lock prevents deletion.
Scenario: A company stores customer data in S3 with a 3-year regulatory retention requirement. You configure Object Lock (Compliance mode, 3 years) for retention, an S3 Lifecycle transition to a Glacier storage class after 90 days for cost savings, and a lifecycle expiration at 3 years, which deletes each object once its Object Lock retention period has expired.
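The scenario's two independent controls can be sketched as two configuration documents, shaped for boto3's `put_object_lock_configuration` and `put_bucket_lifecycle_configuration` (bucket details are hypothetical; Object Lock must be enabled when the bucket is created):

```python
# Control 1: Object Lock retention -- prevents deletion for 3 years.
object_lock_config = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 3}},
}

# Control 2: Lifecycle -- changes storage class and schedules expiration.
# The expiration action cannot remove an object while its Object Lock
# retention is still in effect; the two mechanisms operate independently.
lifecycle_config = {
    "Rules": [
        {
            "ID": "regulatory-archive",
            "Status": "Enabled",
            "Filter": {},  # apply bucket-wide
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 3 * 365},
        }
    ]
}

# s3 = boto3.client("s3")
# s3.put_object_lock_configuration(
#     Bucket="customer-data", ObjectLockConfiguration=object_lock_config)
# s3.put_bucket_lifecycle_configuration(
#     Bucket="customer-data", LifecycleConfiguration=lifecycle_config)
```

This pairing illustrates the exam trap above: the lifecycle rule changes cost and schedules deletion, while the lock alone guarantees the data survives the full compliance period.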
Reflection Question: How do lifecycle policies and retention policies work together, and why are they both necessary for compliant, cost-effective data management?