Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
7. Glossary
This glossary serves as a centralized reference for the technical terminology and AWS-specific jargon used throughout the DEA-C01 syllabus. Mastering these definitions will help you navigate exam questions more quickly and avoid confusion between similar service capabilities.
| Term | Definition | Guide Section |
|---|---|---|
| ABAC | Attribute-Based Access Control — grants permissions based on tags/attributes attached to principals and resources | 5.2.1 |
| ACID | Atomicity, Consistency, Isolation, Durability — transaction properties guaranteeing reliable database operations | 3.2.2 |
| Apache Iceberg | Open table format enabling ACID transactions, time travel, and schema evolution on data lake files | 3.2.2 |
| Aurora | AWS cloud-native relational database compatible with MySQL and PostgreSQL, with distributed storage | 3.1.4 |
| Avro | Row-based data format with embedded schema, commonly used for streaming data serialization | 2.5.1 |
| CDC | Change Data Capture — capturing database mutations as a stream of events for downstream processing | 2.1.4 |
| CDK | Cloud Development Kit — defines CloudFormation resources using programming languages (Python, TypeScript, etc.) | 2.7.2 |
| CloudFormation | AWS IaC service that creates and manages resources from JSON/YAML templates | 2.7.2 |
| CloudTrail | Records AWS API calls and account activity as events for auditing and compliance; management events are logged by default, data events are opt-in | 4.3.2, 5.4.1 |
| CloudTrail Lake | Managed, queryable store for CloudTrail events with built-in SQL interface | 5.4.2 |
| CloudWatch | Unified monitoring service for metrics, logs, and alarms across AWS services | 4.3.1 |
| CMK | Customer Managed Key — a KMS key you create and control, with configurable key policies | 5.3.1 |
| COPY | Redshift command to load data from S3 into Redshift tables | 3.1.2 |
| CSE | Client-Side Encryption — data encrypted before upload to AWS | 5.3.1 |
| DAG | Directed Acyclic Graph — workflow model used by Apache Airflow to define task dependencies | 2.6.2 |
| DMS | Database Migration Service — migrates and replicates databases with full load and CDC support | 2.1.4, 2.2.3 |
| DPU | Data Processing Unit — Glue's unit of compute capacity (4 vCPUs, 16 GB memory) | 2.4.1 |
| DynamicFrame | Glue's native data structure, similar to a Spark DataFrame but with self-describing records, so no schema is required up front | 2.4.1 |
| DynamoDB Streams | Captures item-level changes in DynamoDB tables as an ordered sequence of events | 2.1.4 |
| ELT | Extract, Load, Transform — loads raw data first, transforms in the target system | 1.1.2 |
| EMR | Elastic MapReduce — managed Hadoop/Spark framework for large-scale data processing | 2.4.2 |
| Enhanced fan-out | Kinesis feature giving each consumer a dedicated 2 MB/s throughput per shard | 2.1.1 |
| ETL | Extract, Transform, Load — transforms data before loading into the target system | 1.1.2 |
| EventBridge | Serverless event bus for routing events between AWS services and custom applications | 2.3.1 |
| Federated query | Redshift feature querying data in external databases (RDS, Aurora) without copying | 3.1.2 |
| Firehose | Amazon Data Firehose (formerly Kinesis Data Firehose) — near-real-time delivery of streaming data to destinations such as S3, Redshift, and OpenSearch | 2.1.2 |
| Glue crawler | Automatically discovers and catalogs data schema from S3 or JDBC sources | 2.2.2, 3.3.1 |
| Glue Data Catalog | Central metadata repository for data lake tables, schemas, and partitions | 3.3.1 |
| Glue Data Quality | Rule-based data validation integrated into Glue ETL pipelines using DQDL | 4.4.1 |
| GSI | Global Secondary Index — DynamoDB index whose partition key and optional sort key can differ from the base table's | 3.1.3 |
| HNSW | Hierarchical Navigable Small World — graph-based vector index offering fast, high-recall approximate nearest-neighbor search at the cost of higher memory use | 3.2.3 |
| IAM | Identity and Access Management — AWS service for authentication and authorization | 5.1.1 |
| IVF | Inverted File Index — vector index type partitioning vectors into clusters for memory-efficient search | 3.2.3 |
| Job bookmark | Glue feature tracking processed data for incremental ETL processing | 2.4.1 |
| KMS | Key Management Service — centralized encryption key management for AWS services | 5.3.1 |
| Lake Formation | Centralized data lake permission management with column-level and row-level security | 5.2.2 |
| LF-Tag | Lake Formation Tag — metadata tags for tag-based access control on data lake resources | 5.2.2 |
| LSI | Local Secondary Index — DynamoDB index with the same partition key but different sort key | 3.1.3 |
| Macie | ML-powered service that discovers and classifies sensitive data (PII) in S3 | 5.5.1 |
| Materialized view | Precomputed query result stored in Redshift, refreshable on demand or incrementally | 3.1.2 |
| MPP | Massively Parallel Processing — distributing query execution across multiple nodes | 3.1.2 |
| MSK | Managed Streaming for Apache Kafka — fully managed Kafka service on AWS | 2.1.3 |
| MSCK REPAIR TABLE | Athena/Hive command that scans S3 for Hive-style partitions (key=value paths) and adds any missing ones to the Glue Data Catalog | 3.3.1 |
| MWAA | Managed Workflows for Apache Airflow — managed Airflow orchestration service | 2.6.2 |
| ORC | Optimized Row Columnar — columnar file format associated with the Hive ecosystem | 2.5.1 |
| Parquet | Columnar storage format optimized for analytics, supporting compression and predicate pushdown | 2.5.1 |
| Partition projection | Athena feature calculating partitions at query time instead of querying the Glue Catalog | 2.7.1, 4.2.1 |
| PrivateLink | AWS technology for private connectivity between VPCs and AWS services | 5.1.2 |
| RBAC | Role-Based Access Control — assigning permissions to roles that users assume | 5.2.1 |
| Redshift Serverless | Auto-scaling Redshift deployment requiring no cluster management | 3.1.2 |
| Redshift Spectrum | Queries S3 data directly from Redshift using external tables | 3.1.2 |
| RPU | Redshift Processing Unit — capacity unit for Redshift Serverless | 3.1.2 |
| S3 Tables | Managed Apache Iceberg tables natively integrated with S3 | 3.2.2 |
| SAM | Serverless Application Model — CloudFormation extension for serverless applications | 2.7.3 |
| SCP | Service Control Policy — organization-wide policy restricting AWS actions | 5.5.1 |
| Secrets Manager | Stores and automatically rotates database credentials and API keys | 5.1.2 |
| SPICE | Super-fast, Parallel, In-memory Calculation Engine — QuickSight's in-memory cache | 4.2.2 |
| SSE-KMS | Server-Side Encryption with KMS-managed keys, providing CloudTrail auditability | 5.3.1 |
| SSE-S3 | Server-Side Encryption with Amazon S3-managed keys (default) | 5.3.1 |
| Step Functions | AWS serverless workflow service using state machines for multi-service orchestration | 2.6.1 |
| TTL | Time to Live — DynamoDB feature for automatic item expiration | 3.4.2 |
| UNLOAD | Redshift command to export query results to S3 | 3.1.2 |
| Vector embedding | Numerical representation of data enabling similarity search in vector databases | 3.2.3 |
| VPC endpoint | Private connection between a VPC and AWS services without internet traversal | 5.1.2 |
| WORM | Write Once Read Many — data protection model enforced by S3 Object Lock | 3.4.2 |
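
Two entries in the table carry fixed capacity figures: a DPU (4 vCPUs, 16 GB memory) and enhanced fan-out (2 MB/s per shard for each consumer). The short Python sketch below turns those figures into sizing arithmetic. The function names are hypothetical helpers written for illustration, not part of any AWS SDK, and the results are raw totals that ignore any capacity a service reserves for its own overhead.

```python
# Capacity figures taken from the glossary rows above.
DPU_VCPUS = 4                  # vCPUs per Glue DPU
DPU_MEMORY_GB = 16             # GB of memory per Glue DPU
EFO_MB_PER_SEC_PER_SHARD = 2   # enhanced fan-out: MB/s per shard, per consumer


def glue_job_capacity(dpus: int) -> dict:
    """Raw vCPU and memory totals for a job allocated `dpus` DPUs (hypothetical helper)."""
    return {"vcpus": dpus * DPU_VCPUS, "memory_gb": dpus * DPU_MEMORY_GB}


def enhanced_fanout_read_mb_per_sec(shards: int, consumers: int) -> int:
    """Aggregate read throughput when every consumer uses enhanced fan-out.

    Each consumer gets its own dedicated 2 MB/s on every shard, so the
    total scales with both the shard count and the consumer count.
    """
    return shards * consumers * EFO_MB_PER_SEC_PER_SHARD


print(glue_job_capacity(10))                                   # {'vcpus': 40, 'memory_gb': 160}
print(enhanced_fanout_read_mb_per_sec(shards=4, consumers=3))  # 24
```

The second function shows why enhanced fan-out matters for exam scenarios with several consumers: throughput is multiplied by the number of consumers rather than divided among them.
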
Written by Alvin Varughese
Founder • 15 professional certifications