Copyright (c) 2025 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.1.4.2. DynamoDB Partition Keys & Indexing (GSI, LSI)

First Principle: Proper DynamoDB Partition Key design is crucial for even data distribution, enabling high throughput. Indexes (GSI, LSI) provide flexible query patterns beyond the primary key.

The primary key in Amazon DynamoDB (consisting of a Partition Key and optional Sort Key) is fundamental to how data is distributed and accessed. For queries beyond the primary key, secondary indexes are essential.
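
To make the distribution claim concrete, here is a minimal simulation sketch. It assumes a fixed count of four partitions and uses SHA-256 as a stand-in hash; DynamoDB's internal hash function and partition count are managed by the service and are not modeled here. The helper names (partition_for, requests_per_partition) and the sample key values are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # illustrative assumption; DynamoDB manages partitions itself


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def requests_per_partition(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many requests land on each partition for a stream of keys."""
    counts = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


# High-cardinality key: 10,000 requests spread over 10,000 distinct user IDs.
diverse = requests_per_partition(f"user-{i}" for i in range(10_000))

# Low-cardinality key: the same 10,000 requests, but only two distinct values.
skewed = requests_per_partition(
    "status-ACTIVE" if i % 10 else "status-CLOSED" for i in range(10_000)
)

print("diverse keys:", dict(diverse))
print("skewed keys: ", dict(skewed))
```

With many distinct key values, the request counts come out roughly equal per partition. With only two distinct values, at most two partitions receive any traffic, no matter how many partitions exist.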

Key Concepts of DynamoDB Partition Keys & Indexing:
  • Partition Key (Hash Attribute):
    • Purpose: DynamoDB hashes the Partition Key value to determine the physical partition (storage location) for each item.
    • Importance: A good Partition Key distributes data evenly across partitions to prevent "hot spots" (partitions receiving disproportionately high traffic), which can lead to throttling and poor performance.
    • Example: For a "User" table, UserId is a good Partition Key if it has many distinct values and requests are spread fairly evenly across users.
  • Sort Key (Range Attribute):
    • Purpose: Defines the order of items that share the same Partition Key value, which allows efficient range queries within that partition.
    • Example: In an "Order" table, CustomerId as Partition Key and OrderId as Sort Key allows querying all orders for a customer sorted by order ID.
  • Secondary Indexes: Provide alternative access patterns to your table data, allowing you to query on attributes other than the primary key.
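    • Global Secondary Index (GSI): An index with its own Partition Key (and optional Sort Key), which can differ from the base table's. Queries on a GSI can span all partitions of the base table. Used for frequently accessed, non-primary key attributes. A GSI can be added after the table is created and, in provisioned mode, has its own throughput settings. Reads from a GSI are eventually consistent.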
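    • Local Secondary Index (LSI): Has the same Partition Key as the base table, but a different Sort Key. Queries are scoped to a single partition key value. An LSI must be defined when the table is created. Reads are eventually consistent by default, but strongly consistent reads can be requested.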
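
The key and index concepts above can be sketched as the parameters of a DynamoDB CreateTable request. This is an illustrative sketch built on the "Order" example (CustomerId as Partition Key, OrderId as Sort Key); the attributes OrderDate and SellerId, the index names, and the on-demand billing mode are assumptions made for the example. The code only builds and checks a dictionary and sends nothing to AWS.

```python
# Hypothetical CreateTable parameters; attribute and index names are assumptions.
create_table_params = {
    "TableName": "Order",
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
        {"AttributeName": "SellerId", "AttributeType": "S"},
    ],
    # Base table: Partition Key (HASH) plus Sort Key (RANGE).
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "OrderId", "KeyType": "RANGE"},
    ],
    # LSI: same Partition Key as the base table, different Sort Key.
    "LocalSecondaryIndexes": [
        {
            "IndexName": "CustomerId-OrderDate-index",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # GSI: its own Partition Key, independent of the base table's.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "SellerId-OrderDate-index",
            "KeySchema": [
                {"AttributeName": "SellerId", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def partition_key(key_schema):
    """Return the name of the HASH (Partition Key) attribute in a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


base_pk = partition_key(create_table_params["KeySchema"])
lsi_pk = partition_key(create_table_params["LocalSecondaryIndexes"][0]["KeySchema"])
gsi_pk = partition_key(create_table_params["GlobalSecondaryIndexes"][0]["KeySchema"])

print("LSI shares the base Partition Key:", lsi_pk == base_pk)
print("GSI uses a different Partition Key:", gsi_pk != base_pk)
```

Assuming boto3 is installed and AWS credentials are configured, a dictionary shaped like this could be passed to boto3.client("dynamodb").create_table(**create_table_params). Whether SellerId distributes traffic evenly depends on the workload, so treat the index choice as a placeholder rather than a recommendation.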

Scenario: You're designing a DynamoDB table to store e-commerce orders. You need to retrieve an order efficiently by its unique OrderId, query all orders for a given CustomerId (many orders per customer), and frequently look up orders by ProductSKU (many orders per SKU).

Reflection Question: How would you design the primary key (Partition Key and Sort Key) and implement secondary indexes (GSI and LSI) in DynamoDB to support these diverse query patterns and ensure efficient data distribution without creating hot spots?