3.2.1.2. Storage Cost Optimization (Lifecycle Policies, Data Tiering)
💡 First Principle: Maximizing storage cost efficiency requires matching each dataset's storage tier to its actual access pattern and retention requirements, paying the lowest price that still meets availability needs.
Scenario: A media streaming service stores video files in "Amazon S3". Recently uploaded videos are frequently accessed by users. After 60 days, their access frequency drops significantly, but they still need to be available within minutes. After 1 year, videos are rarely accessed but must be retained for 10 years for archival purposes.
Storage costs can accumulate rapidly, especially for large datasets or long retention periods.
- "S3 Storage Classes":
- "S3 Standard": General-purpose, frequently accessed data (hot).
- "S3 Intelligent-Tiering": Automatically moves objects between frequent, infrequent, and archive access tiers based on changing access patterns, without performance impact. Ideal for unpredictable workloads.
- "S3 Standard-Infrequent Access (IA)": Data accessed less frequently but requiring rapid access when needed (warm). Higher retrieval cost, lower storage cost.
- "S3 One Zone-IA": Same as "Standard-IA" but stored in a single "AZ" (not resilient to the loss of an "AZ"). Lowest cost for infrequent access if the data can tolerate loss in an "AZ" event.
- "S3 Glacier": Archival data, very low cost, delayed retrieval.
- "S3 Glacier Deep Archive": Lowest cost archival, retrieval in hours (coldest).
- "S3 Lifecycle Policies": Automate the transition of objects between these "S3 storage classes" or their expiration.
- Practical Relevance: Essential for managing data retention and cost by moving aging data to cheaper tiers and deleting outdated data.
- "EBS Volume Types": Selecting the most cost-effective "EBS" volume type (e.g., gp3 is often more cost-effective than gp2) and right-sizing the provisioned "IOPS"/throughput.
- "EFS Performance Modes": Choose between "General Purpose" and "Max I/O", or between "Bursting Throughput" and "Provisioned Throughput", based on actual needs to avoid overpaying for performance.
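The scenario above maps directly onto a lifecycle configuration: transition to "S3 Standard-IA" after 60 days, to "S3 Glacier Deep Archive" after 1 year, and expire after 10 years. A minimal sketch using boto3, where the bucket name, rule ID, and `videos/` key prefix are illustrative placeholders:

```python
# Lifecycle rules for the video-archive scenario:
#   0-60 days:    S3 Standard (default class on upload)
#   60-365 days:  S3 Standard-IA (still retrievable within minutes)
#   1-10 years:   S3 Glacier Deep Archive (rarely accessed)
#   after ~10 yr: delete
lifecycle_config = {
    "Rules": [
        {
            "ID": "video-archive-tiering",      # illustrative rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "videos/"},    # assumed key prefix
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 3650},       # ~10 years
        }
    ]
}

def apply_lifecycle(bucket: str) -> None:
    """Attach the lifecycle configuration to an existing bucket."""
    import boto3  # assumes AWS credentials are configured

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_config,
    )
```

The rules themselves are pure data, so they can be unit-tested or reviewed without touching AWS; only `apply_lifecycle` needs credentials.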
Visual: S3 Storage Cost Optimization Flow
⚠️ Common Pitfall: Using "S3 Lifecycle Policies" for data with unpredictable access patterns. If an object is moved to an infrequent access tier and then suddenly becomes popular again, the retrieval costs can negate the storage savings. "S3 Intelligent-Tiering" is the better choice for unpredictable access.
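For the unpredictable-access case, objects are simply uploaded with the `INTELLIGENT_TIERING` storage class; the optional archive tiers are opted into per bucket. A sketch under the same assumptions (bucket, configuration ID, and day thresholds are illustrative):

```python
# Intelligent-Tiering moves objects between frequent and infrequent tiers
# automatically; the archive tiers below are optional and must be enabled.
tiering_config = {
    "Id": "archive-after-inactivity",   # illustrative configuration name
    "Status": "Enabled",
    "Tierings": [
        {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},        # illustrative threshold
        {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},  # illustrative threshold
    ],
}

def enable_archive_tiers(bucket: str) -> None:
    """Opt the bucket into Intelligent-Tiering's optional archive tiers."""
    import boto3  # assumes AWS credentials are configured

    boto3.client("s3").put_bucket_intelligent_tiering_configuration(
        Bucket=bucket,
        Id=tiering_config["Id"],
        IntelligentTieringConfiguration=tiering_config,
    )

def upload_video(bucket: str, key: str, path: str) -> None:
    """Upload an object directly into the Intelligent-Tiering storage class."""
    import boto3

    boto3.client("s3").upload_file(
        path, bucket, key,
        ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
    )
```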
Key Trade-Offs:
- Storage Cost vs. Retrieval Fee: Infrequent access and archive tiers have very low storage costs but charge a per-GB fee for data retrieval, which can be expensive if access patterns are misjudged.
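This trade-off can be quantified with a simple break-even calculation. The prices below are illustrative ballpark figures, not current AWS rates; always check the pricing page for your region before deciding:

```python
# Break-even between S3 Standard and Standard-IA: IA saves on storage
# but charges per-GB for retrieval. Illustrative prices, not current rates.
STANDARD_PER_GB_MONTH = 0.023    # assumed Standard storage price
IA_PER_GB_MONTH = 0.0125         # assumed Standard-IA storage price
IA_RETRIEVAL_PER_GB = 0.01       # assumed Standard-IA retrieval fee

def monthly_cost(gb_stored: float, gb_retrieved: float) -> tuple[float, float]:
    """Return (standard_cost, ia_cost) per month for a given access volume."""
    standard = gb_stored * STANDARD_PER_GB_MONTH
    ia = gb_stored * IA_PER_GB_MONTH + gb_retrieved * IA_RETRIEVAL_PER_GB
    return standard, ia

# IA wins only while retrieval stays below the storage saving:
#   gb_retrieved < gb_stored * (0.023 - 0.0125) / 0.01  ≈ 1.05 * gb_stored
std_lo, ia_lo = monthly_cost(gb_stored=1000, gb_retrieved=500)    # IA cheaper
std_hi, ia_hi = monthly_cost(gb_stored=1000, gb_retrieved=2000)   # IA more expensive
```

With these assumed prices, reading back more than roughly the full stored volume each month wipes out the IA storage saving, which is exactly the misjudged-access-pattern pitfall described above.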
Reflection Question: How would you design a storage cost optimization strategy for a media streaming service using "Amazon S3" storage classes and "S3 Lifecycle Policies" to manage video files with varying access patterns (frequent, infrequent, rare) and retention requirements (60 days, 1 year, 10 years), minimizing overall storage costs while meeting availability needs?