7. Comprehensive Glossary
A
ACID: Atomicity, Consistency, Isolation, Durability—properties guaranteeing reliable database transactions.
Always Encrypted: SQL Server feature encrypting data so that keys never leave the client; DBAs cannot see plaintext.
Apache Spark: Open-source distributed computing framework for big data processing; available in Synapse and Databricks.
Archive Tier: Lowest-cost Blob Storage tier for rarely accessed data; retrieval takes hours.
Atomicity: ACID property ensuring all transaction operations succeed or all fail together.
Avro: Binary row-based file format optimized for write-heavy streaming workloads.
Azure Cosmos DB: Globally distributed, multi-model NoSQL database with guaranteed low latency.
Azure Data Factory (ADF): Cloud ETL/ELT orchestration service for data integration pipelines.
Azure Data Lake Storage Gen2: Blob Storage with hierarchical namespace optimized for big data analytics.
Azure Databricks: Managed Spark platform optimized for data engineering and machine learning.
Azure Files: Fully managed file shares accessible via SMB and NFS protocols.
Azure Monitor: Platform service for collecting metrics, logs, and alerts from Azure resources.
Azure SQL Database: Fully managed PaaS relational database based on SQL Server engine.
Azure SQL Managed Instance: PaaS SQL Server with near 100% compatibility including SQL Agent and CLR.
Azure Stream Analytics: Real-time stream processing service using SQL-like query language.
Azure Synapse Analytics: Unified analytics service combining data warehousing, big data, and data integration.
B
Batch Processing: Processing data in scheduled chunks; optimized for throughput over latency.
Blob Storage: Azure object storage for unstructured data (images, videos, backups).
Block Blob: Blob type optimized for storing discrete files; most common blob type.
Bounded Staleness: Cosmos DB consistency level allowing reads to lag by defined time/versions.
Business Critical Tier: Azure SQL tier with high performance and built-in local SSD HA.
C
Card: Power BI visual displaying a single KPI value.
Cassandra API: Cosmos DB API compatible with Apache Cassandra wide-column model.
Cold Tier: Blob Storage tier for very infrequently accessed data; 90-day minimum retention.
Columnar Storage: File format storing data by column; optimized for analytics (Parquet, ORC).
Consistency: ACID property ensuring database moves from one valid state to another.
Consistent Prefix: Cosmos DB consistency level ensuring no out-of-order reads.
Cool Tier: Blob Storage tier for infrequently accessed data; 30-day minimum retention.
Cosmos DB: See Azure Cosmos DB.
CSV: Comma-Separated Values—simple text format for tabular data.
D
Dashboard: Single-page Power BI summary combining tiles from multiple reports.
Data Analyst: Role focused on insights, visualization, and business intelligence.
Data Engineer: Role focused on building data pipelines and integration.
Data Lake: Storage repository holding vast amounts of raw data in native format.
Data Lineage: Visual tracking of data flow from source through transformations to destination.
Data Scientist: Role focused on machine learning and advanced analytics.
DBA: Database Administrator—role focused on database health, security, and availability.
DCL: Data Control Language—SQL commands for access control (GRANT, REVOKE).
DDL: Data Definition Language—SQL commands for structure (CREATE, ALTER, DROP).
Dedicated SQL Pool: Provisioned Synapse data warehouse; pay per DWU.
Delta Lake: Open-source storage layer adding ACID transactions to data lakes.
Dimension: Descriptive table in star schema; used to filter and group facts.
DML: Data Manipulation Language—SQL commands for data (SELECT, INSERT, UPDATE, DELETE).
DTU: Database Transaction Unit—bundled measure of Azure SQL compute resources.
Durability: ACID property ensuring committed data survives system failures.
Dynamic Data Masking: Feature hiding sensitive data in query results without changing stored data.
E
Elastic Pool: Azure SQL feature allowing multiple databases to share resources.
ELT: Extract, Load, Transform—modern pattern loading raw data before transforming.
ETL: Extract, Transform, Load—traditional pattern transforming data before loading.
Eventual Consistency: Cosmos DB's lowest latency consistency level; no ordering guarantees.
Event Hubs: Azure service for ingesting millions of events per second (streaming ingestor).
F
Fact Table: Central table in star schema containing measurable business events.
Foreign Key: Column referencing a primary key in another table to create relationships.
G
General Purpose Tier: Azure SQL tier balancing compute and storage for most workloads.
Gremlin API: Cosmos DB API for graph databases (nodes and edges).
H
Hierarchical Namespace: Data Lake Gen2 feature enabling efficient directory operations and ACLs.
Hot Tier: Blob Storage tier for frequently accessed data; highest storage cost, lowest access cost.
Hyperscale Tier: Azure SQL tier supporting up to 100 TB with rapid scale-out.
I
IaaS: Infrastructure as a Service—you manage OS and above (e.g., SQL on VM).
Index: Database structure that speeds up data retrieval; like a book index.
Isolation: ACID property ensuring concurrent transactions don't interfere.
J
JSON: JavaScript Object Notation—human-readable format for semi-structured data.
K
KQL: Kusto Query Language—query language for Azure Data Explorer and Fabric Real-Time Analytics.
L
Lakehouse: Architecture combining data lake flexibility with warehouse reliability.
Latency: Time delay between request and response; critical for real-time systems.
M
Managed Instance: Azure SQL PaaS with near 100% SQL Server compatibility.
Microsoft Fabric: Unified SaaS analytics platform combining all analytics workloads.
Microsoft Purview: Unified data governance service for cataloging, lineage, and classification.
MongoDB API: Cosmos DB API compatible with MongoDB document model.
MPP: Massively Parallel Processing—architecture distributing queries across nodes.
N
NFS: Network File System—protocol for Linux file shares (supported by Azure Files).
Normalization: Database design process reducing redundancy by separating data into related tables.
NoSQL: Non-relational databases optimized for specific data models and scale.
O
OLAP: Online Analytical Processing—workloads optimized for complex queries on historical data.
OLTP: Online Transaction Processing—workloads optimized for many small, fast transactions.
OneLake: Unified logical data lake in Microsoft Fabric; single namespace for all data.
ORC: Optimized Row Columnar—binary columnar format for Hadoop ecosystems.
P
PaaS: Platform as a Service—Azure manages infrastructure; you manage data/config.
Parquet: Binary columnar file format optimized for analytics queries.
Partition Key: Cosmos DB key determining data distribution across physical partitions.
Power BI: Microsoft business intelligence platform for creating reports and dashboards.
Power BI Desktop: Windows application for authoring Power BI reports.
Power BI Service: Cloud service for publishing, sharing, and collaborating on reports.
Power Query: Data transformation engine in Power BI for cleaning and shaping data.
Primary Key: Column(s) uniquely identifying each row in a table.
R
Referential Integrity: Constraint ensuring foreign keys reference valid primary keys.
Report: Multi-page Power BI document for detailed analysis.
Request Unit (RU): Cosmos DB measure of operation cost; 1 RU = 1 KB point read.
Row-based Storage: File format storing complete records together (CSV, Avro); optimized for writes.
S
SaaS: Software as a Service—fully managed cloud application (e.g., Microsoft Fabric).
Schema: Structure defining tables, columns, data types, and relationships.
Schema-on-Read: Data lakes validate schema at query time, not write time.
Schema-on-Write: Databases validate schema when data is written.
Semi-Structured Data: Data with internal tags/keys but flexible schema (JSON, XML).
Serverless SQL Pool: Synapse component querying data lake files without provisioning; pay per TB scanned.
Session Consistency: Cosmos DB default; user sees their own writes in order.
Shortcut: OneLake pointer to external data without copying.
SMB: Server Message Block—Windows file sharing protocol (Azure Files).
Spark Pool: Managed Apache Spark cluster in Synapse for big data processing.
SQL Agent: SQL Server job scheduler; available in Managed Instance, not SQL Database.
Star Schema: Data warehouse design with central fact table surrounded by dimension tables.
Stream Processing: Processing data event-by-event as it arrives; optimized for latency.
Strong Consistency: Cosmos DB level guaranteeing reads return most recent committed write.
Structured Data: Data with fixed schema organized in tables (rows and columns).
Synapse Pipelines: Data Factory integrated within Synapse workspace.
T
Table API: Cosmos DB API for simple key-value data; compatible with Azure Table Storage.
Table Storage: Simple NoSQL key-value store in Azure Storage Accounts.
TDE: Transparent Data Encryption—encrypts database files at rest automatically.
Throughput: Volume of data processed per unit time; batch processing priority.
U
Unstructured Data: Binary data with no schema (images, video, audio, PDFs).
Update Anomaly: Problem where duplicate data requires multiple updates; solved by normalization.
V
vCore: Virtual Core—Azure SQL purchasing model with independent compute/storage control.
View: Virtual table based on SQL query result; doesn't store data.
W
Wide-Column Store: NoSQL model where columns can vary per row (Cassandra).
X
XML: Extensible Markup Language—tag-based semi-structured format.