9. Glossary
Action Group — A set of tools (Lambda functions) exposed to a Bedrock Agent via an OpenAPI schema. The agent decides when to call which action based on the schema descriptions.
Approximate Nearest Neighbor (ANN) — A class of algorithms (e.g., HNSW, IVF, as implemented in libraries such as FAISS) that find approximately similar vectors efficiently at scale, trading some recall for dramatically faster search compared to exact k-NN.
Automated Reasoning Check — A Bedrock Guardrails feature that formally verifies FM outputs against a defined set of logical rules or policies, providing higher-confidence grounding than semantic similarity alone.
Bedrock Agents — A fully managed AWS service that orchestrates multi-step FM reasoning loops with tool use, session memory, and knowledge base retrieval without custom orchestration code.
Bedrock Data Automation — A managed service for extracting structured information from multimodal documents (PDFs, images, audio, video) at scale, replacing custom extraction pipelines.
Bedrock Guardrails — A managed content safety layer that evaluates FM inputs and outputs against defined policies: topic denial, content filters, word filters, PII redaction, and grounding checks.
Bedrock Knowledge Bases — A fully managed RAG service that handles document ingestion, chunking, embedding, indexing, and retrieval as a single managed pipeline.
Bedrock Model Evaluations — A managed service for running systematic quality evaluations of FM outputs against golden datasets using automated metrics and LLM-as-Judge scoring.
Bedrock Prompt Flows — A visual, low-code orchestration service for building multi-step FM workflows with branching, Lambda integration, and Knowledge Base retrieval steps.
Bedrock Prompt Management — A versioned prompt template management service with parameterized variables, immutable versions, and IAM-controlled access for governing production prompts.
BERTScore — An evaluation metric measuring semantic similarity between generated and reference text using BERT embeddings, more appropriate than BLEU for GenAI output evaluation.
Chain-of-Thought (CoT) — A prompting technique instructing the FM to explicitly reason step-by-step before producing a final answer, improving accuracy on multi-step reasoning tasks.
Chunking — The process of splitting documents into smaller segments for vector indexing. Strategies include fixed-size, sentence-based, hierarchical, and semantic chunking.
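A minimal fixed-size chunker with overlap (character-based for simplicity; production chunkers typically count tokens, and the default sizes here are illustrative):

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content cut at a
    boundary still appears whole in the neighboring chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

The overlap is what preserves context across boundaries: the tail of each chunk is repeated as the head of the next.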
Context Window — The maximum number of tokens an FM can process in a single inference call, encompassing system prompt, conversation history, retrieved context, and user input combined.
Continued Pre-Training — A customization technique that trains an FM on a large domain corpus to deeply internalize domain knowledge, modifying the model's knowledge base rather than its behavior.
Cross-Region Inference — A Bedrock feature that automatically routes FM invocations to alternative regions when the primary region lacks capacity or the model is unavailable.
Defense-in-Depth — A security architecture principle requiring multiple independent protective layers such that bypassing one layer still leaves others active.
Embedding Model — A model that converts text (or other modalities) into dense numerical vectors in a high-dimensional semantic space, enabling similarity-based search.
EventBridge — An AWS serverless event bus that routes events from sources (S3, Salesforce, custom) to targets (Lambda, SQS, Step Functions), enabling event-driven GenAI architectures.
Fine-Tuning — A customization technique that modifies an FM's output probability distribution by training on labeled examples, changing how the model responds without changing what it knows.
Foundation Model (FM) — A large-scale pre-trained model trained on broad data that can be adapted to many downstream tasks. Examples: Claude 3, Titan Text, Llama 3.
GenAI Gateway — A centralized application layer providing authentication, rate limiting, cost attribution, model routing, and safety enforcement for all GenAI API calls across an organization.
Grounding — The property of an FM response being factually supported by retrieved context. Low grounding = hallucination risk.
Hallucination — The FM producing confident, fluent, semantically coherent text that is factually incorrect or unsupported by provided context.
HNSW (Hierarchical Navigable Small World) — An approximate nearest-neighbor graph algorithm used for efficient vector search, configured by ef_construction (build quality) and ef_search (query recall vs. speed).
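As an illustration, these parameters surface in an OpenSearch k-NN index body roughly like this (field names follow the OpenSearch k-NN plugin; exact parameter placement varies by engine and plugin version, and the numbers are illustrative, not recommendations):

```python
# Sketch of an OpenSearch index body tuning HNSW for a vector field.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,  # must match the embedding model's output
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {
                        "ef_construction": 256,  # graph build quality (index time)
                        "m": 16,                 # links per node (memory vs. recall)
                        "ef_search": 128,        # candidates per query (recall vs. speed)
                    },
                },
            }
        }
    },
}
```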
Human-in-the-Loop — An architectural pattern where AI-generated decisions require human approval before execution, typically implemented via the Step Functions waitForTaskToken pattern.
HyDE (Hypothetical Document Embeddings) — A query transformation technique that generates a hypothetical answer using an FM and uses that answer's embedding as the search query, bridging query-document vocabulary gaps.
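The technique fits in a few lines once the FM, embedder, and vector store are abstracted away (the three callables here are stand-ins, not real AWS APIs):

```python
def hyde_search(query, generate, embed, search):
    """HyDE: search with the embedding of a hypothetical *answer*, not the
    raw query, so the search vector lives in document vocabulary space."""
    hypothetical = generate(
        f"Write a short passage that plausibly answers: {query}"
    )
    return search(embed(hypothetical))
```

The key point is which string reaches `embed`: the generated passage, never the original query.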
Indirect Prompt Injection — A prompt injection attack where malicious instructions are embedded in retrieved documents rather than user input, causing the FM to execute the embedded instructions.
k-NN (k-Nearest Neighbor) — A search algorithm retrieving the k most similar vectors to a query vector. Exact k-NN guarantees 100% recall but scales poorly; approximate (ANN) algorithms trade recall for speed.
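A NumPy sketch of exact k-NN with cosine similarity, showing both why it guarantees full recall and why it scales poorly: every stored vector is scored on every query:

```python
import numpy as np

def exact_knn(query, vectors, k=3):
    """Exact k-NN by cosine similarity. O(n*d) per query: the full scan is
    what guarantees 100% recall, and what ANN indexes avoid at scale."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]  # indices, most similar first
```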
Knowledge Base Sync — The process of ingesting new or updated documents into a Bedrock Knowledge Base. Not real-time — may take minutes to hours for large corpora.
LLM-as-Judge — An evaluation technique using a capable FM to score other FM outputs against defined criteria, scaling human-like quality assessment to large evaluation sets.
LoRA (Low-Rank Adaptation) — A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to a frozen base model, dramatically reducing fine-tuning compute cost.
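The low-rank update described under LoRA can be sketched in NumPy (toy dimensions; real adapters attach to specific projection matrices inside the transformer):

```python
import numpy as np

d, r = 512, 4                       # hidden size and LoRA rank (toy numbers)
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))          # frozen base weight (never updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized
x = rng.normal(size=d)

y = W @ x + B @ (A @ x)              # adapted forward pass: W'x = Wx + BAx

trainable = A.size + B.size          # 2*d*r parameters actually trained
frozen = W.size                      # d*d parameters left untouched
```

Zero-initializing B makes the adapter a no-op at the start of training, and the trainable parameter count is 2dr rather than d².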
MCP (Model Context Protocol) — An open protocol developed by Anthropic that standardizes the interface between AI models and external tools/data sources, enabling interoperable tool integration across agent frameworks.
Model Card — Documentation for an FM deployment describing intended use, training data, evaluation results, known limitations, and responsible AI considerations.
Model Context Protocol — See MCP.
Model Invocation Logging — A Bedrock feature that records every FM request and response to CloudWatch Logs and/or S3, providing the primary forensic audit trail for production GenAI applications.
Multi-Agent Architecture — A system where multiple specialized AI agents collaborate, with a supervisor agent routing tasks to specialized sub-agents. Can be implemented with frameworks such as the open-source AWS Agent Squad or Bedrock Agents' multi-agent collaboration.
Provisioned Throughput — A Bedrock pricing model where you purchase reserved model units providing dedicated capacity and consistent latency, cost-effective above ~60–70% utilization.
Prompt Caching — A Bedrock feature that caches the key-value representations of static prompt prefixes (system prompts, documents), sharply reducing the cost and latency of re-processing repeated prefix content.
Prompt Injection — An attack where adversarial instructions embedded in user input or retrieved documents override the system prompt, causing the FM to execute unintended behavior.
RAG (Retrieval-Augmented Generation) — An architecture pattern that retrieves relevant documents from a knowledge base and injects them into the FM's context, grounding responses in factual retrieved content rather than model weights alone.
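A minimal sketch of the generation half: retrieved chunks are injected into the prompt alongside an instruction that constrains the FM to them (the prompt wording is illustrative):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt: numbered retrieved chunks form the
    context block, and the instruction restricts the FM to that context."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the chunks also makes it easy to ask the FM for per-chunk citations.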
RAGAS — A framework providing four independent metrics for evaluating RAG pipelines: Faithfulness, Answer Relevancy, Context Precision, and Context Recall.
ReAct — A reasoning pattern interleaving Thought (explicit reasoning), Action (tool call), and Observation (tool result) in a loop until the FM has sufficient information to produce a final answer.
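A skeletal ReAct loop, with the FM call stubbed out as a function returning structured steps (a real agent parses Thought/Action/Action-Input out of the model's text):

```python
def react_loop(model, tools, question, max_steps=5):
    """Minimal ReAct loop. `model` is a stand-in for an FM call returning
    either {"thought", "action", "input"} or {"answer"}."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        if "answer" in step:
            return step["answer"]                      # final answer
        result = tools[step["action"]](step["input"])  # Action: tool call
        transcript.append(f"Thought: {step['thought']}")
        transcript.append(f"Observation: {result}")    # fed back next turn
    raise RuntimeError("agent exceeded max_steps without answering")
```

The `max_steps` cap matters in production: it bounds cost and prevents runaway tool loops.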
Reranker — A model that re-scores initially retrieved chunks based on their specific relevance to the query, improving retrieval precision after the initial approximate k-NN search.
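The pattern reduces to re-scoring and sorting; `score` below is a stand-in for a cross-encoder or a managed rerank API:

```python
def rerank(query, chunks, score, top_n=3):
    """Re-order ANN candidates by a finer-grained relevance score and keep
    only the top_n, improving precision of what reaches the FM context."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_n]
```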
Semantic Caching — An application-layer caching strategy that stores FM responses indexed by query embedding, returning cached responses for semantically similar future queries.
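A minimal in-memory sketch, with the embedding model passed in as a callable (the threshold and the linear scan are illustrative choices; real deployments use a vector store for the lookup):

```python
import numpy as np

class SemanticCache:
    """Cache FM responses keyed by query embedding; a hit is any stored
    query whose cosine similarity to the new query clears the threshold."""

    def __init__(self, embed, threshold=0.9):
        self.embed, self.threshold = embed, threshold
        self._entries = []  # list of (unit embedding, response)

    def _unit(self, text):
        v = np.asarray(self.embed(text), dtype=float)
        return v / np.linalg.norm(v)

    def get(self, query):
        q = self._unit(query)
        for e, response in self._entries:
            if float(e @ q) >= self.threshold:
                return response  # semantic hit; skip the FM call
        return None

    def put(self, query, response):
        self._entries.append((self._unit(query), response))
```

The threshold is the critical tuning knob: too low and users receive stale answers to genuinely different questions.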
Semantic Chunking — A document splitting strategy that uses an FM to detect topic shifts and splits at semantic boundaries rather than fixed token counts.
Strands Agents — An open-source, Python-first AWS agent framework providing code-level control over agent reasoning, tool integration, and multi-agent orchestration.
Transfer Question — An exam question testing a concept in a novel scenario not covered in the study guide, evaluating whether knowledge generalizes beyond memorized examples.
VPC Endpoint (PrivateLink) — An AWS networking construct that routes traffic to AWS services (including Bedrock) over the private AWS network, so requests and responses never traverse the public internet.
waitForTaskToken — A Step Functions integration pattern that pauses a state machine execution until an external process (human reviewer, approval workflow) sends a SendTaskSuccess or SendTaskFailure callback.
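Sketched as an Amazon States Language state written as a Python dict (the state and function names are hypothetical; the `.waitForTaskToken` Resource suffix and the `$$.Task.Token` context path are the real Step Functions conventions):

```python
# Human-approval state: pauses here until the reviewer's tooling calls back.
approval_state = {
    "HumanApproval": {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": "NotifyReviewer",  # hypothetical Lambda
            "Payload": {
                "taskToken.$": "$$.Task.Token",  # handed to the reviewer
                "draft.$": "$.draft",
            },
        },
        "TimeoutSeconds": 86400,  # fail the state if no decision within 24 h
        "Next": "ExecuteApprovedAction",
    }
}
# The reviewer's tooling later resumes the execution with boto3's
# send_task_success(taskToken=..., output=...) or send_task_failure(...).
```

Setting `TimeoutSeconds` is important: without it, an ignored approval request can leave the execution paused up to the state machine's maximum duration.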