3.2.1. Amazon OpenSearch, Neptune, DocumentDB, Keyspaces, and MemoryDB
💡 First Principle: Each specialized database exists because a common access pattern is poorly served by general-purpose databases. Understanding the access pattern each service is built for lets you match exam scenarios to the right service, even if you haven't used the service directly.
Amazon OpenSearch Service (successor to Amazon Elasticsearch Service): for full-text search, log analytics, and observability. It ingests structured and unstructured data, indexes it for fast retrieval, and supports complex search queries with relevance scoring and aggregations, plus visualization through OpenSearch Dashboards. The data engineering use case: centralized log analysis, where CloudWatch Logs, CloudTrail logs, and application logs flow into OpenSearch for interactive investigation.
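To make "complex search queries with relevance scoring and aggregations" concrete, here is a minimal sketch of a query DSL request body built as a plain Python dictionary. The field names (message, @timestamp, service.keyword) and the helper name build_log_search are assumptions for illustration, not part of any real index mapping or AWS API.

```python
import json


def build_log_search(text, start, end, size=20):
    """Build an OpenSearch query DSL body for an interactive log search.

    Field names ("message", "@timestamp", "service.keyword") are assumed
    for illustration; a real index defines its own mapping.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the log message, scored by relevance.
                "must": [{"match": {"message": text}}],
                # Time window applied as a non-scoring filter.
                "filter": [
                    {"range": {"@timestamp": {"gte": start, "lte": end}}}
                ],
            }
        },
        # Bucket the matching log lines by service name.
        "aggs": {
            "by_service": {"terms": {"field": "service.keyword", "size": 5}}
        },
    }


body = build_log_search("connection timeout", "now-1h", "now")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent to an index's _search endpoint. The point for the exam is the shape of the request: free-text matching plus filters plus aggregations in a single query.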
Amazon Neptune: a managed graph database. It models data as nodes and edges (relationships). Query it with Gremlin or openCypher (property graph) or SPARQL (RDF). Use cases: social networks, fraud detection (finding connected suspicious accounts), knowledge graphs, and recommendation engines. If an exam question describes finding relationships between entities (users, transactions, devices), Neptune is the signal.
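The access pattern behind "finding connected suspicious accounts" is a multi-hop traversal over relationships. The sketch below illustrates that pattern with an in-memory breadth-first search in plain Python. It is not Neptune's API, and the entity names and edges are hypothetical sample data; in Neptune the same question would be expressed as a graph query in one of its supported languages.

```python
from collections import defaultdict, deque

# Hypothetical relationships: each pair links two entities
# (an account to a device or IP address it was seen with).
EDGES = [
    ("account:A", "device:D1"),
    ("account:B", "device:D1"),
    ("account:B", "ip:10.0.0.7"),
    ("account:C", "ip:10.0.0.7"),
    ("account:Z", "device:D9"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_cluster(graph, start):
    """Return every entity reachable from start over any number of hops."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


graph = build_adjacency(EDGES)
cluster = connected_cluster(graph, "account:A")
linked_accounts = sorted(
    n for n in cluster if n.startswith("account:") and n != "account:A"
)
print(linked_accounts)  # ['account:B', 'account:C']
```

Account C never shares a device with account A, yet it lands in the same cluster through account B's IP address. Expressing this kind of indirect link in a relational database requires chains of self-joins, which is the reason a graph store fits the scenario.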
Amazon DocumentDB (with MongoDB compatibility): a managed document database for JSON workloads. Use it when applications already use the MongoDB API and need a managed service on AWS. Not a common exam topic, but it may appear when a question mentions "MongoDB workloads on AWS."
Amazon Keyspaces (for Apache Cassandra): a Cassandra-compatible service for wide-column workloads. Use it when existing applications use CQL (Cassandra Query Language). It is serverless, with on-demand and provisioned capacity modes.
Amazon MemoryDB for Redis: a Redis-compatible, durable, in-memory database. Unlike ElastiCache (primarily a cache), MemoryDB provides both microsecond read latency and data durability, because writes are persisted to a Multi-AZ transaction log. Use it when the application needs Redis data structures with database-grade durability.
⚠️ Exam Trap: OpenSearch and Athena both support querying data, but for completely different patterns. Athena runs SQL over structured and semi-structured data stored in S3. OpenSearch indexes and searches unstructured or semi-structured data (logs, text documents) using its query DSL. If a question mentions "search across log messages" or "interactive log investigation," the answer is OpenSearch, not Athena.
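The contrast is easier to remember when the two request shapes sit side by side. The sketch below holds one example of each as plain Python values. The table name app_logs, its columns, the date value, and the message field are all hypothetical.

```python
# Athena pattern: SQL over a table defined on files in S3.
# Table and column names (app_logs, status_code, dt) are hypothetical.
ATHENA_SQL = """
SELECT status_code, COUNT(*) AS requests
FROM app_logs
WHERE dt = '2024-01-15'
GROUP BY status_code
ORDER BY requests DESC
"""

# OpenSearch pattern: phrase search over indexed log text using the
# query DSL. The "message" field name is an assumption.
OPENSEARCH_QUERY = {
    "size": 10,
    "query": {"match_phrase": {"message": "payment declined"}},
    # Ask for the matching fragments of each message to be returned.
    "highlight": {"fields": {"message": {}}},
}

print(ATHENA_SQL.strip())
print(OPENSEARCH_QUERY)
```

The SQL statement answers a question about columns and counts. The DSL query answers a question about which documents contain a phrase. Exam wording usually points to one shape or the other.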
Reflection Question: A fraud detection system needs to trace relationships between bank accounts, transactions, devices, and IP addresses to identify suspicious clusters. Which specialized data store is purpose-built for this?