3.1.1. Vector Database Options on AWS
💡 First Principle: No single vector database is optimal for all RAG use cases — the right choice depends on your existing infrastructure, query volume, metadata filtering needs, vector dimensionality, update frequency, and whether you need exact or approximate nearest-neighbor search.
AWS vector database comparison:
| Service | Vector Search Type | Best For | Metadata Filtering | Update Latency | Cost Profile |
|---|---|---|---|---|---|
| Amazon OpenSearch Service | Approximate (HNSW, FAISS) + Exact | High-throughput, complex filtering, custom indexing | Full-text + structured | Near-real-time | Medium-High |
| Amazon Aurora (pgvector) | Exact + Approximate (IVFFlat, HNSW) | Existing PostgreSQL workloads; ACID transactions | SQL WHERE clause | Transactional | Medium |
| Amazon DynamoDB + OpenSearch | OpenSearch handles vectors; DynamoDB handles metadata | High-scale metadata + embedding separation | Complex attribute queries | Near-real-time | Medium |
| Bedrock Knowledge Bases (managed) | OpenSearch Serverless or Pinecone backend | Standard RAG without custom pipeline | Metadata attributes | Minutes (sync job) | Low operational overhead |
| Amazon MemoryDB for Redis | Exact k-NN | Ultra-low latency semantic cache | Limited | Real-time | High |
OpenSearch with the Neural plugin — the exam-critical configuration: When integrating OpenSearch with Bedrock Knowledge Bases, the Neural plugin (built on the k-NN plugin) enables vector search. The critical configuration decisions:
- k-NN algorithm: HNSW (Hierarchical Navigable Small World) for high recall; IVF for lower memory footprint at scale
- ef_construction / m parameters: Higher = better recall, slower indexing, more memory
- Sharding strategy: Shard count affects parallelism; too few shards → bottleneck; too many → overhead
```python
# OpenSearch index mapping for vector search
index_mapping = {
    "settings": {
        "index": {
            "knn": True,
            "knn.algo_param.ef_search": 512  # Higher = better recall, slower queries
        }
    },
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # Must match Titan Embeddings v2 output
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "parameters": {"ef_construction": 512, "m": 16}
                }
            },
            "content": {"type": "text"},
            "document_id": {"type": "keyword"},
            "last_updated": {"type": "date"},
            "department": {"type": "keyword"}  # Enable filtered retrieval
        }
    }
}
```
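To make use of the `department` keyword field at query time, the faiss engine supports efficient filtering inside the k-NN clause. A minimal sketch of a filtered query body (the `query_vector` placeholder stands in for a real Titan Embeddings v2 vector, and the `"legal"` filter value is illustrative):

```python
# Placeholder: in practice this 1024-dim vector comes from Titan Embeddings v2
query_vector = [0.0] * 1024

# k-NN query body with an efficient (pre-)filter on the department field;
# pass this as the search body via the OpenSearch client
query_body = {
    "size": 5,
    "query": {
        "knn": {
            "embedding": {
                "vector": query_vector,
                "k": 5,  # Number of nearest neighbors to retrieve
                "filter": {
                    "term": {"department": "legal"}  # Applied during the ANN search
                }
            }
        }
    }
}
```

Placing the filter inside the `knn` clause (rather than as a post-filter) lets the engine apply it during graph traversal, so the query still returns up to `k` results even when the filter is highly selective.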
⚠️ Exam Trap: The pgvector extension on Aurora PostgreSQL performs exact nearest-neighbor search by default (a sequential scan), which guarantees 100% recall but scales poorly for large corpora (>1M vectors). For production-scale RAG, you must create an IVFFlat or HNSW index to switch to approximate search — and note the trade-off: IVFFlat cluster centroids are fixed at build time, so recall degrades as data grows and the index needs periodic rebuilding, while HNSW supports incremental inserts at the cost of more memory and slower index builds.
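The two index choices look like this in DDL. A sketch with hypothetical table and column names (`case_documents`, `embedding`), shown here as Python strings you would execute via psql or a driver such as psycopg:

```python
# Hypothetical table/column names; cosine distance assumed to match the
# embedding model. Without either index, pgvector falls back to an exact scan.
create_hnsw = (
    "CREATE INDEX ON case_documents "
    "USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 64);"  # Incremental inserts OK; more memory
)

create_ivfflat = (
    "CREATE INDEX ON case_documents "
    "USING ivfflat (embedding vector_cosine_ops) "
    "WITH (lists = 1000);"  # Centroids fixed at build time; REINDEX as data grows
)
```

A common rule of thumb for IVFFlat is `lists ≈ rows / 1000` (here, 1000 lists for ~1M rows), which is exactly the parameter that drifts out of tune as the corpus grows.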
Reflection Question: Your team is building a RAG system for a legal research firm with 2 million case documents. They need to filter results by jurisdiction, case year, and practice area in addition to semantic similarity. The team has existing PostgreSQL expertise. Which vector store would you recommend, and what index configuration consideration would you flag?