2.4.3. S3 Performance: Transfer Acceleration, Multipart Uploads, and Lifecycle
💡 First Principle: S3 is not just object storage — it's a globally distributed system with multiple performance optimization levers. The right combination depends on whether your bottleneck is upload speed, download latency, storage cost, or data retrieval frequency.
Request Rate Performance: S3 scales horizontally based on key prefixes. Each prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second. For workloads exceeding these limits, spread requests across multiple key prefixes (for example, randomized prefixes) so S3 can distribute them across multiple partitions; there is no limit on the number of prefixes in a bucket.
❌ Anti-pattern: `logs/2025/01/01/server1.log`, `logs/2025/01/01/server2.log` (same prefix, so requests concentrate on one partition)
✅ Better: `a1b2c3-server1-2025-01-01.log`, `d4e5f6-server2-2025-01-01.log` (randomized prefixes, so requests distribute across partitions)
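The prefix-randomization pattern above can be sketched in a few lines of Python. This is a minimal illustration, not an AWS API: `randomized_key` is a hypothetical helper that derives a short, deterministic hash prefix from the filename.

```python
import hashlib

def randomized_key(filename: str, prefix_len: int = 6) -> str:
    """Prepend a short deterministic hash so related objects spread
    across S3 partitions instead of piling onto one hot prefix."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{filename}"

# Two logs from the same day now start with different prefixes.
print(randomized_key("server1-2025-01-01.log"))
print(randomized_key("server2-2025-01-01.log"))
```

Because the prefix is derived from the name rather than generated randomly, the same file always maps to the same key, which keeps lookups predictable.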
S3 Transfer Acceleration: Routes uploads through Amazon CloudFront's globally distributed edge locations over optimized network paths. Instead of uploading directly from your location to the S3 region, your data travels to the nearest edge location and then over AWS's internal backbone. Best for:
- Large files (>1GB)
- Uploads from geographically distant clients
- Cross-continent transfers
Transfer Acceleration uses a special endpoint: bucket-name.s3-accelerate.amazonaws.com
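To make the endpoint concrete, here is a small sketch that builds virtual-hosted-style URLs for both the standard and accelerate endpoints. `s3_url` is a hypothetical helper, and note that Transfer Acceleration must first be enabled on the bucket before the accelerate endpoint will accept requests.

```python
def s3_url(bucket: str, key: str, accelerate: bool = False) -> str:
    """Build a virtual-hosted-style URL; the accelerate endpoint routes
    the request to the nearest edge location rather than directly to
    the bucket's home region."""
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    return f"https://{bucket}.{host}/{key}"

print(s3_url("my-bucket", "backups/big-file.bin", accelerate=True))
# → https://my-bucket.s3-accelerate.amazonaws.com/backups/big-file.bin
```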
Multipart Upload: Required for objects >5GB (the single-PUT limit); recommended for objects >100MB. Benefits:
- Improved throughput (parallel part uploads)
- Ability to restart individual failed parts without restarting the entire upload
- Begin uploading before you know the final file size
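The part-size math behind multipart uploads can be sketched directly: S3 requires each part (except the last) to be at least 5 MiB and caps an upload at 10,000 parts, so large objects must use larger parts. `choose_part_size` is an illustrative helper, not an SDK function.

```python
MIN_PART = 5 * 1024 * 1024   # parts (except the last) must be at least 5 MiB
MAX_PARTS = 10_000           # S3 allows at most 10,000 parts per upload

def choose_part_size(object_size: int) -> int:
    """Smallest allowed part size that still fits within 10,000 parts."""
    needed = -(-object_size // MAX_PARTS)   # ceiling division
    return max(MIN_PART, needed)

size = 100 * 1024**3                        # a 100 GiB object
part = choose_part_size(size)
num_parts = -(-size // part)
print(part, num_parts)                      # part > 5 MiB, num_parts <= 10,000
```

In practice the AWS SDKs make this choice for you (e.g., via a transfer configuration's multipart threshold and chunk size), but the constraint is useful to know when tuning parallelism.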
⚠️ Incomplete multipart uploads accumulate cost. If a multipart upload is initiated but never completed (e.g., application crash), the parts remain in S3 and you're billed for them. Use an S3 Lifecycle policy to automatically abort incomplete multipart uploads after a defined period (e.g., 7 days).
S3 Lifecycle Policies automate transitions between storage classes based on object age or other criteria:
| Storage Class | Access Pattern | Min Storage Duration | Cost Relative to S3 Standard |
|---|---|---|---|
| S3 Standard | Frequent access | None | Baseline |
| S3 Standard-IA | Infrequent, but rapid access needed | 30 days | Cheaper storage, retrieval fee |
| S3 One Zone-IA | Infrequent, single AZ acceptable | 30 days | Cheapest IA option |
| S3 Glacier Instant Retrieval | Archive with ms retrieval | 90 days | Low storage, ms retrieval |
| S3 Glacier Flexible Retrieval | Archive, minutes–hours retrieval | 90 days | Very low storage |
| S3 Glacier Deep Archive | Long-term archive, 12h retrieval | 180 days | Lowest storage cost |
| S3 Intelligent-Tiering | Unknown or changing patterns | None (no retrieval fee) | Monitoring fee per object |
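A lifecycle rule that steps objects down through the classes in the table looks like this; the prefix and day counts are illustrative example values, not a recommendation (`GLACIER` is the storage-class name for Glacier Flexible Retrieval).

```python
# Illustrative transition rule for the storage classes tabled above.
tiering_rule = {
    "ID": "age-out-data",                    # example rule name
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},           # example prefix
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},       # Flexible Retrieval
        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
    ],
}
# Each transition should respect the minimum storage duration of the
# class the object is leaving (e.g., 30 days in Standard-IA here).
print([t["StorageClass"] for t in tiering_rule["Transitions"]])
```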
AWS DataSync is a managed service for large-scale data transfer between on-premises storage and S3 (or EFS, FSx). It is faster than the AWS CLI or custom scripts because it handles parallelization, checksum verification, and retries automatically. Use DataSync for migration projects or for continuous sync of on-premises data to AWS.
⚠️ Exam Trap: S3 Standard-IA and Glacier Flexible Retrieval have minimum storage duration charges. If you store an object for 15 days and delete it, you're still billed for 30 days (Standard-IA) or 90 days (Glacier). The exam may test whether you know when these classes are cost-effective.
Reflection Question: An application generates 10,000 log files per second. After 7 days, logs are queried rarely; after 90 days, they're almost never accessed; after 365 days, they must be retained for compliance but never accessed. Design the S3 Lifecycle policy.