Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

9. Glossary

Action Group — A set of tools (Lambda functions) exposed to a Bedrock Agent via an OpenAPI schema. The agent decides when to call which action based on the schema descriptions.
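A minimal fragment of the kind of OpenAPI schema an action group might expose (the path, operation, and field names here are hypothetical; the real schema must match your Lambda's API). The `description` fields matter: they are what the agent reads when deciding which action to call.

```json
{
  "openapi": "3.0.0",
  "info": { "title": "OrderLookup", "version": "1.0.0" },
  "paths": {
    "/orders/{orderId}": {
      "get": {
        "operationId": "getOrderStatus",
        "description": "Look up the current status of an order by its ID.",
        "parameters": [
          {
            "name": "orderId",
            "in": "path",
            "required": true,
            "schema": { "type": "string" },
            "description": "The order identifier."
          }
        ],
        "responses": {
          "200": {
            "description": "Order status",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": { "status": { "type": "string" } }
                }
              }
            }
          }
        }
      }
    }
  }
}
```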

Approximate Nearest Neighbor (ANN) — A class of algorithms (e.g., HNSW and IVF, implemented in libraries such as FAISS) that find approximately similar vectors efficiently at scale, trading some recall accuracy for dramatically faster search compared to exact k-NN.

Automated Reasoning Check — A Bedrock Guardrails feature that formally verifies FM outputs against a defined set of logical rules or policies, providing higher-confidence grounding than semantic similarity alone.

Bedrock Agents — A fully managed AWS service that orchestrates multi-step FM reasoning loops with tool use, session memory, and knowledge base retrieval without custom orchestration code.

Bedrock Data Automation — A managed service for extracting structured information from multimodal documents (PDFs, images, audio, video) at scale, replacing custom extraction pipelines.

Bedrock Guardrails — A managed content safety layer that evaluates FM inputs and outputs against defined policies: topic denial, content filters, word filters, PII redaction, and grounding checks.

Bedrock Knowledge Bases — A fully managed RAG service that handles document ingestion, chunking, embedding, indexing, and retrieval as a single managed pipeline.

Bedrock Model Evaluations — A managed service for running systematic quality evaluations of FM outputs against golden datasets using automated metrics and LLM-as-Judge scoring.

Bedrock Prompt Flows — A visual, low-code orchestration service for building multi-step FM workflows with branching, Lambda integration, and Knowledge Base retrieval steps.

Bedrock Prompt Management — A versioned prompt template management service with parameterized variables, immutable versions, and IAM-controlled access for governing production prompts.

BERTScore — An evaluation metric measuring semantic similarity between generated and reference text using BERT embeddings, more appropriate than BLEU for GenAI output evaluation.

Chain-of-Thought (CoT) — A prompting technique instructing the FM to explicitly reason step-by-step before producing a final answer, improving accuracy on multi-step reasoning tasks.
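At its simplest, CoT is a prompt-construction pattern. A sketch (the wording is illustrative; any phrasing that elicits explicit intermediate reasoning has the same effect):

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought instruction.

    Asking for reasoning before the answer improves accuracy on
    multi-step tasks; the 'Answer:' marker makes the final answer
    easy to parse out of the response.
    """
    return (
        "Think step by step. Show your reasoning, then give the final "
        "answer on a line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )
```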

Chunking — The process of splitting documents into smaller segments for vector indexing. Strategies include fixed-size, sentence-based, hierarchical, and semantic chunking.
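The simplest strategy, fixed-size chunking with overlap, can be sketched in a few lines (character-based here to stay dependency-free; production pipelines usually count tokens instead):

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    The overlap keeps a sentence that straddles a chunk boundary
    retrievable from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```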

Context Window — The maximum number of tokens an FM can process in a single inference call, encompassing system prompt, conversation history, retrieved context, and user input combined.

Continued Pre-Training — A customization technique that trains an FM on a large domain corpus to deeply internalize domain knowledge, modifying the model's knowledge base rather than its behavior.

Cross-Region Inference — A Bedrock feature that automatically routes FM invocations to alternative regions when the primary region lacks capacity or the model is unavailable.

Defense-in-Depth — A security architecture principle requiring multiple independent protective layers such that bypassing one layer still leaves others active.

Embedding Model — A model that converts text (or other modalities) into dense numerical vectors in a high-dimensional semantic space, enabling similarity-based search.

EventBridge — An AWS serverless event bus that routes events from sources (S3, Salesforce, custom) to targets (Lambda, SQS, Step Functions), enabling event-driven GenAI architectures.

Fine-Tuning — A customization technique that modifies an FM's output probability distribution by training on labeled examples, changing how the model responds without changing what it knows.

Foundation Model (FM) — A large-scale pre-trained model trained on broad data that can be adapted to many downstream tasks. Examples: Claude 3, Titan Text, Llama 3.

GenAI Gateway — A centralized application layer providing authentication, rate limiting, cost attribution, model routing, and safety enforcement for all GenAI API calls across an organization.

Grounding — The property of an FM response being factually supported by retrieved context. Low grounding = hallucination risk.

Hallucination — The FM producing confident, fluent, semantically coherent text that is factually incorrect or unsupported by provided context.

HNSW (Hierarchical Navigable Small World) — An approximate nearest-neighbor graph algorithm used for efficient vector search, configured by ef_construction (build quality) and ef_search (query recall vs. speed).

Human-in-the-Loop — An architectural pattern where AI-generated decisions require human approval before execution, implemented via Step Functions waitForTaskToken.

HyDE (Hypothetical Document Embeddings) — A query transformation technique that generates a hypothetical answer using an FM and uses that answer's embedding as the search query, bridging query-document vocabulary gaps.

Indirect Prompt Injection — A prompt injection attack where malicious instructions are embedded in retrieved documents rather than user input, causing the FM to execute the embedded instructions.

k-NN (k-Nearest Neighbor) — A search algorithm retrieving the k most similar vectors to a query vector. Exact k-NN guarantees 100% recall but scales poorly; approximate (ANN) algorithms trade recall for speed.
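Exact k-NN is just a brute-force similarity scan over every indexed vector, which is what ANN algorithms like HNSW approximate. A minimal sketch using cosine similarity:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Exact k-NN: score every vector, return IDs of the k most similar.

    Guarantees perfect recall but is O(n) per query, which is why
    large indexes use approximate algorithms instead.
    """
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]
```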

Knowledge Base Sync — The process of ingesting new or updated documents into a Bedrock Knowledge Base. Not real-time — may take minutes to hours for large corpora.

LLM-as-Judge — An evaluation technique using a capable FM to score other FM outputs against defined criteria, scaling human-like quality assessment to large evaluation sets.

LoRA (Low-Rank Adaptation) — A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to a frozen base model, dramatically reducing fine-tuning compute cost.

MCP (Model Context Protocol) — An open protocol developed by Anthropic that standardizes the interface between AI models and external tools/data sources, enabling interoperable tool integration across agent frameworks.

Model Card — Documentation for an FM deployment describing intended use, training data, evaluation results, known limitations, and responsible AI considerations.

Model Context Protocol — See MCP.

Model Invocation Logging — A Bedrock feature that records every FM request and response to CloudWatch Logs and/or S3, providing the primary forensic audit trail for production GenAI applications.

Multi-Agent Architecture — A system in which multiple specialized AI agents collaborate, with a supervisor agent routing tasks to specialized sub-agents. Can be implemented with frameworks such as Agent Squad (an open-source AWS Labs orchestrator).

Provisioned Throughput — A Bedrock pricing model where you purchase reserved model units providing dedicated capacity and consistent latency, cost-effective above ~60–70% utilization.

Prompt Caching — A Bedrock feature that caches the key-value representations of static prompt prefixes (system prompts, documents), eliminating re-processing cost for repeated identical content.

Prompt Injection — An attack where adversarial instructions embedded in user input or retrieved documents override the system prompt, causing the FM to execute unintended behavior.

RAG (Retrieval-Augmented Generation) — An architecture pattern that retrieves relevant documents from a knowledge base and injects them into the FM's context, grounding responses in factual retrieved content rather than model weights alone.
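The "augmentation" step is simply context injection. A minimal sketch of prompt assembly (a real pipeline would retrieve the chunks from a vector store such as a Bedrock Knowledge Base and typically add per-chunk citations):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Inject retrieved chunks into the model's context.

    Numbering the chunks lets the model (and downstream grounding
    checks) reference which source supported which claim.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```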

RAGAS — A framework providing four independent metrics for evaluating RAG pipelines: Faithfulness, Answer Relevancy, Context Precision, and Context Recall.
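The arithmetic behind Faithfulness is a simple ratio. A toy illustration of the idea only, not the RAGAS library itself: RAGAS extracts the claims and judges their support with an LLM, whereas here both are supplied directly so the computation is visible.

```python
def faithfulness(claims: list[str], supported: list[bool]) -> float:
    """Fraction of claims in the generated answer that are supported
    by the retrieved context. 1.0 = fully grounded; low values
    indicate hallucination risk.
    """
    if len(claims) != len(supported):
        raise ValueError("one support verdict per claim")
    return sum(supported) / len(claims) if claims else 0.0
```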

ReAct — A reasoning pattern interleaving Thought (explicit reasoning), Action (tool call), and Observation (tool result) in a loop until the FM has sufficient information to produce a final answer.
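The loop structure can be sketched independently of any framework. Here `ask_model` is a stand-in for an FM call whose Thought/Action output has already been parsed into tuples; the assumed contract (hypothetical, for illustration) is that it returns either `("act", tool_name, arg)` or `("answer", text)`:

```python
def react_loop(ask_model, tools: dict, max_turns: int = 5) -> str:
    """Minimal ReAct loop: act, observe, repeat until an answer.

    Each tool result is appended to the transcript as an Observation,
    which the model sees on the next turn.
    """
    transcript: list[str] = []
    for _ in range(max_turns):
        step = ask_model(transcript)
        if step[0] == "answer":
            return step[1]
        _, tool_name, arg = step
        observation = tools[tool_name](arg)   # Action -> Observation
        transcript.append(f"Action: {tool_name}({arg!r})")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("agent did not produce an answer within max_turns")
```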

Reranker — A model that re-scores initially retrieved chunks based on their specific relevance to the query, improving retrieval precision after the initial approximate k-NN search.

Semantic Caching — An application-layer caching strategy that stores FM responses indexed by query embedding, returning cached responses for semantically similar future queries.

Semantic Chunking — A document splitting strategy that uses an FM to detect topic shifts and splits at semantic boundaries rather than fixed token counts.

Strands Agents — An open-source, Python-first AWS agent framework providing code-level control over agent reasoning, tool integration, and multi-agent orchestration.

Transfer Question — An exam question testing a concept in a novel scenario not covered in the study guide, evaluating whether knowledge generalizes beyond memorized examples.

VPC Endpoint (PrivateLink) — An AWS networking service that routes traffic to AWS services (including Bedrock) through the private AWS network, preventing data from traversing the public internet.

waitForTaskToken — A Step Functions service integration pattern that pauses a state machine execution until an external process (human reviewer, approval workflow) calls the SendTaskSuccess or SendTaskFailure API with the task token.
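A sketch of such a state in Amazon States Language (the function name, payload fields, and next state are hypothetical). The `.waitForTaskToken` suffix on the resource pauses the execution, and `$$.Task.Token` injects the token the reviewer-side process must later pass to SendTaskSuccess:

```json
{
  "RequestApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "notify-reviewer",
      "Payload": {
        "taskToken.$": "$$.Task.Token",
        "draft.$": "$.draft"
      }
    },
    "Next": "PublishApproved"
  }
}
```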

Written by Alvin Varughese, Founder (15 professional certifications)