Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

7. Glossary

This glossary serves as a centralized reference for the technical terminology and AWS-specific jargon used throughout the DEA-C01 syllabus. Mastering these definitions will help you navigate exam questions more quickly and avoid confusion between similar service capabilities.

| Term | Definition | Guide Section |
| --- | --- | --- |
| ABAC | Attribute-Based Access Control: grants permissions based on tags/attributes attached to principals and resources | 5.2.1 |
| ACID | Atomicity, Consistency, Isolation, Durability: transaction properties guaranteeing reliable database operations | 3.2.2 |
| Apache Iceberg | Open table format enabling ACID transactions, time travel, and schema evolution on data lake files | 3.2.2 |
| Aurora | AWS cloud-native relational database compatible with MySQL and PostgreSQL, built on a distributed storage layer | 3.1.4 |
| Avro | Row-based data format with an embedded schema, commonly used for streaming data serialization | 2.5.1 |
| CDC | Change Data Capture: capturing database inserts, updates, and deletes as a stream of events for downstream processing | 2.1.4 |
| CDK | Cloud Development Kit: defines infrastructure in general-purpose programming languages (Python, TypeScript, etc.) and synthesizes it into CloudFormation templates | 2.7.2 |
| CloudFormation | AWS infrastructure-as-code (IaC) service that creates and manages resources from JSON/YAML templates | 2.7.2 |
| CloudTrail | Records AWS API calls for auditing and compliance (management events by default; data events must be enabled explicitly) | 4.3.2, 5.4.1 |
| CloudTrail Lake | Managed, queryable store for CloudTrail events with a built-in SQL interface | 5.4.2 |
| CloudWatch | Unified monitoring service for metrics, logs, and alarms across AWS services | 4.3.1 |
| CMK | Customer managed key: a KMS key you create and control, with configurable key policies | 5.3.1 |
| COPY | Redshift command to bulk-load data into Redshift tables, most commonly from S3 | 3.1.2 |
| CSE | Client-Side Encryption: data is encrypted before it is uploaded to AWS | 5.3.1 |
| DAG | Directed Acyclic Graph: workflow model used by Apache Airflow to define task dependencies | 2.6.2 |
| DMS | Database Migration Service: migrates and replicates databases with full load and CDC support | 2.1.4, 2.2.3 |
| DPU | Data Processing Unit: Glue's unit of compute capacity (4 vCPUs, 16 GB memory) | 2.4.1 |
| DynamicFrame | Glue's native data structure: similar to a Spark DataFrame but self-describing per record, so it tolerates inconsistent schemas; converts to and from DataFrames | 2.4.1 |
| DynamoDB Streams | Captures item-level changes in DynamoDB tables as an ordered sequence of events | 2.1.4 |
| ELT | Extract, Load, Transform: loads raw data first, then transforms it in the target system | 1.1.2 |
| EMR | Amazon EMR (formerly Elastic MapReduce): managed platform running Hadoop, Spark, and related frameworks for large-scale data processing | 2.4.2 |
| Enhanced fan-out | Kinesis Data Streams feature giving each registered consumer its own dedicated read throughput of 2 MB/s per shard | 2.1.1 |
| ETL | Extract, Transform, Load: transforms data before loading it into the target system | 1.1.2 |
| EventBridge | Serverless event bus for routing events between AWS services and custom applications | 2.3.1 |
| Federated query | Redshift feature for querying live data in external databases (RDS, Aurora) without copying it | 3.1.2 |
| Firehose | Amazon Data Firehose (formerly Kinesis Data Firehose): near-real-time delivery of streaming data to destinations such as S3, Redshift, and OpenSearch | 2.1.2 |
| Glue crawler | Automatically discovers data schemas in S3 or JDBC sources and registers them in the Glue Data Catalog | 2.2.2, 3.3.1 |
| Glue Data Catalog | Central metadata repository for data lake tables, schemas, and partitions | 3.3.1 |
| Glue Data Quality | Rule-based data validation, written in DQDL (Data Quality Definition Language), integrated into Glue ETL pipelines | 4.4.1 |
| GSI | Global Secondary Index: DynamoDB index whose partition key (and optional sort key) can differ from the base table's | 3.1.3 |
| HNSW | Hierarchical Navigable Small World: graph-based vector index type that offers fast, high-accuracy approximate nearest-neighbor search at the cost of more memory | 3.2.3 |
| IAM | Identity and Access Management: AWS service for authentication and authorization | 5.1.1 |
| IVF | Inverted File Index: vector index type that partitions vectors into clusters and searches only the clusters closest to the query, for memory-efficient search | 3.2.3 |
| Job bookmark | Glue feature that tracks already-processed data so jobs can run incrementally | 2.4.1 |
| KMS | Key Management Service: centralized encryption key management for AWS services | 5.3.1 |
| Lake Formation | Centralized data lake permission management with column-level and row-level security | 5.2.2 |
| LF-Tag | Lake Formation Tag: metadata tag used for tag-based access control on data lake resources | 5.2.2 |
| LSI | Local Secondary Index: DynamoDB index with the same partition key but a different sort key; must be defined when the table is created | 3.1.3 |
| Macie | ML-powered service that discovers and classifies sensitive data (such as PII) in S3 | 5.5.1 |
| Materialized view | Precomputed query result stored in Redshift, refreshed on demand or automatically, and incrementally when the query supports it | 3.1.2 |
| MPP | Massively Parallel Processing: distributing query execution across multiple nodes | 3.1.2 |
| MSK | Amazon Managed Streaming for Apache Kafka: fully managed Kafka service on AWS | 2.1.3 |
| MSCK REPAIR TABLE | Athena/Hive command that scans S3 for Hive-style (key=value) partitions and adds them to the Glue Data Catalog | 3.3.1 |
| MWAA | Amazon Managed Workflows for Apache Airflow: managed Airflow orchestration service | 2.6.2 |
| ORC | Optimized Row Columnar: columnar file format originating in the Hive ecosystem | 2.5.1 |
| Parquet | Columnar storage format optimized for analytics, supporting compression and predicate pushdown | 2.5.1 |
| Partition projection | Athena feature that computes partition values from table properties at query time instead of looking them up in the Glue Data Catalog | 2.7.1, 4.2.1 |
| PrivateLink | AWS technology providing private connectivity between VPCs and AWS services | 5.1.2 |
| RBAC | Role-Based Access Control: assigning permissions to roles that users assume | 5.2.1 |
| Redshift Serverless | Auto-scaling Redshift deployment that requires no cluster management | 3.1.2 |
| Redshift Spectrum | Queries data in S3 directly from Redshift using external tables | 3.1.2 |
| RPU | Redshift Processing Unit: compute capacity unit for Redshift Serverless | 3.1.2 |
| S3 Tables | Managed Apache Iceberg tables stored in S3 table buckets, with built-in table maintenance | 3.2.2 |
| SAM | Serverless Application Model: CloudFormation extension for serverless applications | 2.7.3 |
| SCP | Service Control Policy: AWS Organizations policy that sets the maximum permissions for member accounts; it restricts actions but never grants them | 5.5.1 |
| Secrets Manager | Stores secrets such as database credentials and API keys, and can rotate them automatically | 5.1.2 |
| SPICE | Super-fast, Parallel, In-memory Calculation Engine: QuickSight's in-memory cache | 4.2.2 |
| SSE-KMS | Server-Side Encryption with AWS KMS keys; key usage is logged in CloudTrail for auditing | 5.3.1 |
| SSE-S3 | Server-Side Encryption with Amazon S3 managed keys; the default encryption for new S3 objects | 5.3.1 |
| Step Functions | AWS serverless workflow service using state machines for multi-service orchestration | 2.6.1 |
| TTL | Time to Live: DynamoDB feature that automatically deletes items after a per-item expiration timestamp | 3.4.2 |
| UNLOAD | Redshift command to export query results to S3 | 3.1.2 |
| Vector embedding | Numerical representation of data enabling similarity search in vector databases | 3.2.3 |
| VPC endpoint | Private connection between a VPC and AWS services that does not traverse the internet | 5.1.2 |
| WORM | Write Once Read Many: data protection model enforced by S3 Object Lock | 3.4.2 |
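To make the CDC entry concrete, here is a minimal sketch of what a downstream consumer does with change events: it replays them, in commit order, against a copy of the target table. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are hypothetical. Real sources such as DMS or DynamoDB Streams each use their own record format.

```python
from typing import Any, Dict, List

# Hypothetical change-event shape: {"op": "insert" | "update" | "delete", "key": ..., "row": {...}}
ChangeEvent = Dict[str, Any]


def apply_changes(table: Dict[Any, dict], events: List[ChangeEvent]) -> Dict[Any, dict]:
    """Replay ordered change events against a copy of a table keyed by primary key."""
    replica = {k: dict(v) for k, v in table.items()}  # leave the input untouched
    for event in events:  # events must be applied in commit order
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Treat both as an upsert so a re-delivered insert does not fail on an existing key
            replica[key] = dict(event["row"])
        elif op == "delete":
            replica.pop(key, None)  # tolerate deletes for keys already gone
        else:
            raise ValueError(f"unknown op: {op!r}")
    return replica


source_snapshot = {1: {"name": "Ana", "tier": "silver"}}
changes = [
    {"op": "update", "key": 1, "row": {"name": "Ana", "tier": "gold"}},
    {"op": "insert", "key": 2, "row": {"name": "Raj", "tier": "bronze"}},
    {"op": "delete", "key": 1},
]
print(apply_changes(source_snapshot, changes))  # {2: {'name': 'Raj', 'tier': 'bronze'}}
```

Ordering is the key design point here. Applying the same three events in a different order (for example, the delete before the update) would leave a different final table. This is why CDC pipelines preserve per-key ordering.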
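The DPU entry lends itself to quick capacity arithmetic. The sketch below uses the figures from the glossary (4 vCPUs and 16 GB per DPU) and computes DPU-hours, the quantity Glue billing is based on. The function names are illustrative, and no price per DPU-hour is assumed.

```python
# Capacity per DPU as stated in the glossary
VCPUS_PER_DPU = 4
MEMORY_GB_PER_DPU = 16


def glue_capacity(dpus: int) -> tuple:
    """Return (total vCPUs, total memory in GB) for a job allocated `dpus` DPUs."""
    return dpus * VCPUS_PER_DPU, dpus * MEMORY_GB_PER_DPU


def dpu_hours(dpus: int, runtime_minutes: float) -> float:
    """DPU-hours consumed by a run; multiply by your Region's rate to estimate cost."""
    return dpus * runtime_minutes / 60


print(glue_capacity(10))   # (40, 160)
print(dpu_hours(10, 30))   # 5.0
```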
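The enhanced fan-out entry is easiest to grasp as a throughput comparison. Without it, all consumers of a shard share that shard's 2 MB/s of read throughput. With it, each registered consumer gets its own 2 MB/s per shard. The sketch below assumes the shared capacity splits evenly and ignores other limits, such as the GetRecords call rate.

```python
SHARD_READ_MB_PER_S = 2.0  # read throughput per shard


def per_consumer_read_mb_s(shards: int, consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate stream-wide read throughput available to each consumer, in MB/s."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    stream_capacity = shards * SHARD_READ_MB_PER_S
    if enhanced_fan_out:
        # Each registered consumer gets its own 2 MB/s pipe per shard
        return stream_capacity
    # Shared-throughput consumers split the same 2 MB/s per shard (even split assumed)
    return stream_capacity / consumers


print(per_consumer_read_mb_s(4, 5, enhanced_fan_out=False))  # 1.6
print(per_consumer_read_mb_s(4, 5, enhanced_fan_out=True))   # 8.0
```

With five applications reading a four-shard stream, each gets roughly 1.6 MB/s in shared mode versus 8 MB/s with enhanced fan-out. Exam scenarios that mention many consumers competing for throughput or needing lower latency usually point to enhanced fan-out.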
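The IVF entry describes clustering vectors and searching only the nearest clusters. The toy sketch below shows that idea with two hard-coded centroids. A real index learns centroids in a training step (typically k-means) and runs at far larger scale. The `nprobe` parameter (how many clusters to scan) follows common IVF terminology but is not tied to any specific AWS API.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[Vector]]:
    """Assign each vector to the inverted list of its nearest centroid."""
    lists: Dict[int, List[Vector]] = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        lists[nearest].append(v)
    return lists


def ivf_search(query: Vector, centroids: List[Vector], lists: Dict[int, List[Vector]],
               nprobe: int = 1, k: int = 1) -> List[Vector]:
    """Approximate k-NN: scan only the `nprobe` lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [v for i in probe for v in lists[i]]
    return sorted(candidates, key=lambda v: math.dist(query, v))[:k]


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hard-coded here; normally learned during training
vectors = [(0.5, 0.2), (1.0, 1.5), (9.0, 9.5), (11.0, 10.5)]
lists = build_ivf(vectors, centroids)
print(ivf_search((9.2, 9.9), centroids, lists, nprobe=1, k=1))  # [(9.0, 9.5)]
```

The search skips every vector in the far cluster, which is where the savings come from. The trade-off is that a true nearest neighbor sitting just across a cluster boundary can be missed unless `nprobe` is raised.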
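Partition projection replaces catalog lookups with computation. In Athena, you describe the partition scheme with table properties such as `projection.<column>.range` and `storage.location.template`. The sketch below imitates that idea for a daily date partition by generating S3 locations from a range and a template. It illustrates the concept only and is not Athena's implementation. The bucket, prefix, and `dt` column name are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def project_daily_partitions(start: date, end: date, template: str) -> List[str]:
    """Compute the S3 location of every daily partition in [start, end] without a catalog lookup."""
    if end < start:
        return []
    days = (end - start).days + 1
    return [
        template.replace("${dt}", (start + timedelta(days=i)).strftime("%Y-%m-%d"))
        for i in range(days)
    ]


template = "s3://example-bucket/logs/dt=${dt}/"  # hypothetical bucket and layout
for location in project_daily_partitions(date(2024, 2, 28), date(2024, 3, 1), template):
    print(location)
# s3://example-bucket/logs/dt=2024-02-28/
# s3://example-bucket/logs/dt=2024-02-29/
# s3://example-bucket/logs/dt=2024-03-01/
```

Because locations are derived arithmetically, a query filtered to a narrow date range only needs the few matching prefixes. This is why projection helps heavily partitioned tables, where fetching partition metadata from the catalog would dominate query planning.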
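For the TTL entry, the detail that trips people up is the value format. DynamoDB expects the TTL attribute to be a Number holding a Unix epoch timestamp in seconds. Expired items are deleted in the background, not instantly, so reads can still return them for a while. The sketch below computes a TTL value and filters out expired items client-side. The attribute name `expires_at` is a hypothetical choice, since you name the TTL attribute yourself.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def ttl_value(days_from_now: float, now: Optional[float] = None) -> int:
    """Expiration time as Unix epoch seconds, the format DynamoDB TTL expects."""
    now = time.time() if now is None else now
    return int(now + days_from_now * SECONDS_PER_DAY)


def is_expired(item: dict, ttl_attribute: str = "expires_at", now: Optional[float] = None) -> bool:
    """True if the item's TTL has passed, even if DynamoDB has not deleted it yet."""
    now = time.time() if now is None else now
    expires = item.get(ttl_attribute)
    return expires is not None and expires <= now


fixed_now = 1_700_000_000  # fixed clock for a reproducible example
item = {"session_id": "abc", "expires_at": ttl_value(7, now=fixed_now)}
print(item["expires_at"])                                     # 1700604800
print(is_expired(item, now=fixed_now))                        # False
print(is_expired(item, now=fixed_now + 8 * SECONDS_PER_DAY))  # True
```

A common mistake is storing milliseconds (as many language runtimes return by default). The resulting timestamp lies far in the future, so TTL never deletes the item.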
Written by Alvin Varughese (Founder, 15 professional certifications)