Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

7. Comprehensive Glossary

A

ACID: Atomicity, Consistency, Isolation, Durability—properties guaranteeing reliable database transactions.

Always Encrypted: SQL Server and Azure SQL feature that encrypts sensitive columns inside the client application; encryption keys are never revealed to the database engine, so DBAs cannot see plaintext.

Apache Spark: Open-source distributed computing framework for big data processing; available in Synapse and Databricks.

Archive Tier: Lowest-cost Blob Storage tier for rarely accessed data; blobs are offline and must be rehydrated before they can be read, which can take hours; 180-day minimum retention.

Atomicity: ACID property ensuring all transaction operations succeed or all fail together.
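Here is a minimal sketch of atomicity. It uses Python's built-in sqlite3 module as a stand-in for any transactional database; SQLite is not an Azure service, and the accounts table and transfer() helper are hypothetical. The debit violates a CHECK constraint, so the credit made earlier in the same transaction is rolled back as well.

```python
import sqlite3

# In-memory database; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both updates commit
failed = transfer(conn, 1, 2, 500)  # debit breaks the CHECK; the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```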

Avro: Binary row-based file format optimized for write-heavy streaming workloads.

Azure Cosmos DB: Globally distributed, multi-model NoSQL database with guaranteed low latency.

Azure Data Factory (ADF): Cloud ETL/ELT orchestration service for data integration pipelines.

Azure Data Lake Storage Gen2: Blob Storage with hierarchical namespace optimized for big data analytics.

Azure Databricks: Managed Spark platform optimized for data engineering and machine learning.

Azure Files: Fully managed file shares accessible via SMB and NFS protocols.

Azure Monitor: Platform service for collecting metrics, logs, and alerts from Azure resources.

Azure SQL Database: Fully managed PaaS relational database based on SQL Server engine.

Azure SQL Managed Instance: PaaS SQL Server with near 100% compatibility including SQL Agent and CLR.

Azure Stream Analytics: Real-time stream processing service using SQL-like query language.

Azure Synapse Analytics: Unified analytics service combining data warehousing, big data, and data integration.

B

Batch Processing: Processing data in scheduled chunks; optimized for throughput over latency.

Blob Storage: Azure object storage for unstructured data (images, videos, backups).

Block Blob: Blob type optimized for storing discrete files; most common blob type.

Bounded Staleness: Cosmos DB consistency level in which reads can lag behind writes by at most a configured number of versions or a configured time interval.

Business Critical Tier: Azure SQL service tier for latency-sensitive workloads; uses local SSD storage and multiple replicas for built-in high availability.

C

Card: Power BI visual displaying a single KPI value.

Cassandra API: Cosmos DB API compatible with Apache Cassandra wide-column model.

Cold Tier: Blob Storage tier for very infrequently accessed data; 90-day minimum retention.

Columnar Storage: File format storing data by column; optimized for analytics (Parquet, ORC).
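The following toy example contrasts row-based and columnar layouts. Plain Python lists and dicts stand in for real files. Actual formats such as Parquet, ORC, and Avro add encoding, compression, and metadata on top of this idea, and the field names here are hypothetical.

```python
records = [
    {"id": 1, "city": "Seattle", "sales": 120},
    {"id": 2, "city": "Austin", "sales": 80},
    {"id": 3, "city": "Seattle", "sales": 200},
]

# Row-based layout: each record's values sit together (the CSV/Avro idea).
row_layout = [(r["id"], r["city"], r["sales"]) for r in records]

# Columnar layout: each column's values sit together (the Parquet/ORC idea).
column_layout = {name: [r[name] for r in records] for name in ("id", "city", "sales")}

# An analytic query like SUM(sales) reads one list in the columnar layout,
# but must walk every full record in the row layout.
total_sales_columnar = sum(column_layout["sales"])
total_sales_row = sum(row[2] for row in row_layout)

# Appending a record is one write in the row layout...
row_layout.append((4, "Boston", 50))
# ...but touches every column list in the columnar layout.
for name, value in (("id", 4), ("city", "Boston"), ("sales", 50)):
    column_layout[name].append(value)
```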

Consistency: ACID property ensuring database moves from one valid state to another.

Consistent Prefix: Cosmos DB consistency level guaranteeing that reads never see writes out of order, although they may be stale.

Cool Tier: Blob Storage tier for infrequently accessed data; 30-day minimum retention.
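Minimum retention periods translate into a prorated early-deletion charge. For example, a blob deleted or moved out of the cool tier after 21 days is still billed for the remaining 9 days. The simplified sketch below counts billed days only and ignores per-GB prices. Archive's 180-day minimum is included for comparison, and the function name is hypothetical.

```python
# Minimum retention periods, in days, for Blob Storage access tiers.
MIN_RETENTION_DAYS = {"hot": 0, "cool": 30, "cold": 90, "archive": 180}

def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days still billed if a blob leaves `tier` (deleted or moved) after `days_stored` days."""
    return max(0, MIN_RETENTION_DAYS[tier] - days_stored)

cool_example = early_deletion_days("cool", 21)      # 30 - 21
cold_example = early_deletion_days("cold", 30)      # 90 - 30
archive_example = early_deletion_days("archive", 200)  # minimum already met
```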

Cosmos DB: See Azure Cosmos DB.

CSV: Comma-Separated Values—simple text format for tabular data.

D

Dashboard: Single-page Power BI summary combining tiles from multiple reports.

Data Analyst: Role focused on insights, visualization, and business intelligence.

Data Engineer: Role focused on building data pipelines and integration.

Data Lake: Storage repository holding vast amounts of raw data in native format.

Data Lineage: Visual tracking of data flow from source through transformations to destination.

Data Scientist: Role focused on machine learning and advanced analytics.

DBA: Database Administrator—role focused on database health, security, and availability.

DCL: Data Control Language—SQL commands for access control (GRANT, REVOKE).

DDL: Data Definition Language—SQL commands for structure (CREATE, ALTER, DROP).

Dedicated SQL Pool: Provisioned Synapse data warehouse; billed by the Data Warehouse Units (DWUs) provisioned.

Delta Lake: Open-source storage layer adding ACID transactions to data lakes.

Dimension: Descriptive table in star schema; used to filter and group facts.

DML: Data Manipulation Language—SQL commands for data (SELECT, INSERT, UPDATE, DELETE).
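This sketch shows the three SQL command families side by side, using Python's sqlite3 module; the products table is hypothetical. SQLite has no user accounts, so the DCL statement appears only as a comment showing typical SQL Server syntax, and analyst_role is a made-up role name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")
conn.execute("ALTER TABLE products ADD COLUMN category TEXT")

# DML: work with the data inside that structure.
conn.execute(
    "INSERT INTO products (name, price, category) VALUES (?, ?, ?)", ("Widget", 9.99, "Tools")
)
conn.execute("UPDATE products SET price = ? WHERE name = ?", (7.99, "Widget"))
rows = conn.execute("SELECT name, price, category FROM products").fetchall()
conn.execute("DELETE FROM products WHERE name = ?", ("Widget",))
remaining = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# DCL is not available in SQLite; on SQL Server / Azure SQL it looks like:
#   GRANT SELECT ON products TO analyst_role;   -- analyst_role is hypothetical
```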

DTU: Database Transaction Unit—bundled measure of Azure SQL compute resources.

Durability: ACID property ensuring committed data survives system failures.

Dynamic Data Masking: Feature hiding sensitive data in query results without changing stored data.
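This conceptual sketch shows how masking differs from changing stored data. The mask is applied to query results only, and users with the UNMASK permission see real values. In Azure SQL the engine does this once a column is declared with, for example, ALTER TABLE ... ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()'). The Python below only imitates the general shape of that email() mask, and its names and data are hypothetical.

```python
# Stored data: never modified by masking.
STORED_ROWS = [{"id": 1, "email": "dana@contoso.com"}]

def mask_email(value: str) -> str:
    """Imitate the email() mask shape: keep the first letter, hide the rest."""
    return value[:1] + "XX@XXXX.com"

def query_customers(user_can_unmask: bool):
    """Return result rows, masking the email column for unprivileged users."""
    if user_can_unmask:
        return [dict(row) for row in STORED_ROWS]
    return [{**row, "email": mask_email(row["email"])} for row in STORED_ROWS]

masked = query_customers(user_can_unmask=False)
unmasked = query_customers(user_can_unmask=True)
```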

E

Elastic Pool: Azure SQL feature allowing multiple databases to share resources.

ELT: Extract, Load, Transform—modern pattern loading raw data before transforming.

ETL: Extract, Transform, Load—traditional pattern transforming data before loading.

Event Hubs: Azure streaming ingestion service capable of receiving millions of events per second.

Eventual Consistency: Cosmos DB's weakest consistency level, with the lowest latency; replicas converge over time, but reads have no ordering guarantees.

F

Fact Table: Central table in star schema containing measurable business events.

Foreign Key: Column referencing a primary key in another table to create relationships.
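This sketch shows primary keys, foreign keys, and referential integrity in action, using Python's sqlite3 module. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the customers/orders tables are hypothetical. Both the duplicate primary key and the order pointing at a non-existent customer are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers (customer_id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")  # valid: customer 1 exists

def rejected(sql):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_pk_rejected = rejected("INSERT INTO customers VALUES (1, 'Duplicate')")
orphan_order_rejected = rejected("INSERT INTO orders VALUES (101, 99, 10.0)")  # no customer 99
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```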

G

General Purpose Tier: Azure SQL tier balancing compute and storage for most workloads.

Gremlin API: Cosmos DB API for graph databases (nodes and edges).

H

Hierarchical Namespace: Data Lake Gen2 feature enabling efficient directory operations and ACLs.

Hot Tier: Blob Storage tier for frequently accessed data; highest storage cost, lowest access cost.

Hyperscale Tier: Azure SQL tier for very large databases (up to 128 TB), with fast compute scaling and read scale-out through replicas.

I

IaaS: Infrastructure as a Service; you manage the operating system and everything above it (e.g., SQL Server on an Azure virtual machine).

Index: Database structure that speeds up data retrieval; like a book index.
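This small sketch shows what an index changes, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The table, index name, and data are hypothetical, and the exact plan wording varies between SQLite versions. Before the index exists the plan scans the whole table; afterwards it searches the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Seattle" if i % 2 else "Austin") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query-plan description (the last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM customers WHERE email = 'user42@example.com'"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX ix_customers_email ON customers (email)")
after = plan(query)   # lookup through the new index
```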

Isolation: ACID property ensuring concurrent transactions don't interfere.

J

JSON: JavaScript Object Notation—human-readable format for semi-structured data.

K

KQL: Kusto Query Language; query language used by Azure Data Explorer, Azure Monitor Logs, and Real-Time Intelligence in Microsoft Fabric (formerly Real-Time Analytics).

L

Lakehouse: Architecture combining data lake flexibility with warehouse reliability.

Latency: Time delay between request and response; critical for real-time systems.

M

Managed Instance: See Azure SQL Managed Instance.

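Microsoft Fabric: Unified SaaS analytics platform that brings data engineering, data warehousing, data science, real-time analytics, and Power BI together on a shared data lake (OneLake).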

Microsoft Purview: Unified data governance service for cataloging, lineage, and classification.

MongoDB API: Cosmos DB API compatible with MongoDB document model.

MPP: Massively Parallel Processing—architecture distributing queries across nodes.

N

NFS: Network File System—protocol for Linux file shares (supported by Azure Files).

Normalization: Database design process reducing redundancy by separating data into related tables.

NoSQL: Non-relational databases optimized for specific data models and scale.

O

OLAP: Online Analytical Processing—workloads optimized for complex queries on historical data.

OLTP: Online Transaction Processing—workloads optimized for many small, fast transactions.

OneLake: Unified logical data lake in Microsoft Fabric; single namespace for all data.

ORC: Optimized Row Columnar—binary columnar format for Hadoop ecosystems.

P

PaaS: Platform as a Service—Azure manages infrastructure; you manage data/config.

Parquet: Binary columnar file format optimized for analytics queries.

Partition Key: Cosmos DB key determining data distribution across physical partitions.
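Cosmos DB places items by hashing the partition key value, so all items that share a value land in the same logical partition. The sketch below uses SHA-256 and a fixed count of four partitions purely as stand-ins; Cosmos DB's real hash function and partition management are internal. The customerId property is hypothetical.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 4  # illustrative only; Cosmos DB decides and manages this itself

def partition_for(partition_key_value: str) -> int:
    """Map a partition key value to a partition index with a stable hash (stand-in only)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITION_COUNT

# 100 hypothetical items spread over 10 customers; customerId is the partition key.
items = [{"id": str(i), "customerId": f"cust-{i % 10}"} for i in range(100)]

placement = defaultdict(list)
for item in items:
    placement[partition_for(item["customerId"])].append(item)

# For each customer, collect the set of partitions holding its items.
partitions_per_customer = {
    cid: {p for p, bucket in placement.items() for it in bucket if it["customerId"] == cid}
    for cid in {item["customerId"] for item in items}
}
```

Choosing a key with many distinct, evenly used values (as here) is what lets the data spread across partitions instead of piling into one.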

Power BI: Microsoft business intelligence platform for creating reports and dashboards.

Power BI Desktop: Windows application for authoring Power BI reports.

Power BI Service: Cloud service for publishing, sharing, and collaborating on reports.

Power Query: Data transformation engine in Power BI for cleaning and shaping data.

Primary Key: Column(s) uniquely identifying each row in a table.

R

Referential Integrity: Constraint ensuring foreign keys reference valid primary keys.

Report: Multi-page Power BI document for detailed analysis.

Request Unit (RU): Cosmos DB measure of operation cost; a point read of a 1 KB item costs 1 RU.

Row-based Storage: File format storing complete records together (CSV, Avro); optimized for writes.

S

SaaS: Software as a Service—fully managed cloud application (e.g., Microsoft Fabric).

Schema: Structure defining tables, columns, data types, and relationships.

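Schema-on-Read: Approach, typical of data lakes, in which a schema is applied when data is queried rather than when it is written.

Schema-on-Write: Approach, typical of relational databases, in which data is validated against a schema when it is written.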
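Here is a minimal contrast between the two approaches in plain Python. Schema-on-read keeps raw JSON lines as they arrived and applies names and types while reading; schema-on-write validates each record before storing it. The field names, the REQUIRED schema, and both helper functions are hypothetical.

```python
import json

# Raw lines as they might land in a data lake; nothing was validated on write.
raw_lines = [
    '{"id": 1, "temp": "21.5", "city": "Seattle"}',
    '{"id": 2, "temp": 19}',
    '{"id": 3, "temperature": 25.0, "city": "Austin"}',
]

def read_with_schema(lines):
    """Schema-on-read: apply column names and types only when the data is queried."""
    for line in lines:
        record = json.loads(line)
        temp = record.get("temp", record.get("temperature"))  # reconcile naming drift
        yield {
            "id": int(record["id"]),
            "temp": float(temp) if temp is not None else None,
            "city": record.get("city"),  # missing fields become None
        }

rows = list(read_with_schema(raw_lines))

# Schema-on-write: expected columns and types are checked before storing.
REQUIRED = {"id": int, "temp": float, "city": str}

def write_validated(table, record):
    """Append record to table only if every required column has the expected type."""
    for column, expected_type in REQUIRED.items():
        if not isinstance(record.get(column), expected_type):
            raise ValueError(f"column {column!r} must be {expected_type.__name__}")
    table.append(record)

table = []
write_validated(table, {"id": 1, "temp": 21.5, "city": "Seattle"})
try:
    write_validated(table, json.loads(raw_lines[1]))  # temp is an int and city is missing
    second_rejected = False
except ValueError:
    second_rejected = True
```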

Semi-Structured Data: Data with internal tags/keys but flexible schema (JSON, XML).

Serverless SQL Pool: Synapse component that queries data lake files without provisioned infrastructure; you pay per TB of data processed.

Session Consistency: Cosmos DB's default consistency level; within a session, a client always reads its own writes, in order.

Shortcut: OneLake pointer to external data without copying.

SMB: Server Message Block—Windows file sharing protocol (Azure Files).

Spark Pool: Managed Apache Spark cluster in Synapse for big data processing.

SQL Agent: SQL Server job scheduler; available in Managed Instance, not SQL Database.

Star Schema: Data warehouse design with central fact table surrounded by dimension tables.
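This is a tiny star schema built with Python's sqlite3 module. One fact table holds sales amounts plus keys, and two dimension tables are used to group the results. Table names, columns, and data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes used to filter and group.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table: one row per sales event, with a measure and keys to each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product (product_key),
        date_key INTEGER REFERENCES dim_date (date_key),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 500), (2, 20240101, 40), (1, 20250101, 700), (2, 20250101, 60);
""")

# Typical star-schema query: join the fact to its dimensions, then aggregate.
totals = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
```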

Stream Processing: Processing data event-by-event as it arrives; optimized for latency.
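This example contrasts batch and stream processing in miniature. The batch function needs the full data set before it produces an answer, while the stream generator emits an updated result after every event. It is a plain-Python sketch with hypothetical readings; services such as Azure Stream Analytics add windowing, time handling, and scale on top of the same idea.

```python
from typing import Iterable, Iterator

def batch_total(events: list[float]) -> float:
    """Batch: wait until the whole set is collected, then process it in one pass."""
    return sum(events)

def stream_running_totals(events: Iterable[float]) -> Iterator[float]:
    """Stream: update the result as each event arrives and emit it immediately."""
    total = 0.0
    for value in events:
        total += value
        yield total

readings = [3.0, 1.5, 4.0, 2.5]
batch_result = batch_total(readings)
stream_results = list(stream_running_totals(iter(readings)))
```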

Strong Consistency: Cosmos DB level guaranteeing reads return most recent committed write.

Structured Data: Data with fixed schema organized in tables (rows and columns).

Synapse Pipelines: Data Factory integrated within Synapse workspace.

T

Table API: Cosmos DB API for simple key-value data; compatible with Azure Table Storage.

Table Storage: Simple NoSQL key-value store in Azure Storage Accounts.

TDE: Transparent Data Encryption—encrypts database files at rest automatically.

Throughput: Volume of data processed per unit of time; the main optimization goal of batch processing.

U

Unstructured Data: Data with no predefined schema, such as images, video, audio, free-form text documents, and PDFs.

Update Anomaly: Problem where duplicate data requires multiple updates; solved by normalization.
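This example shows the update anomaly and how normalization removes it, using plain Python structures as stand-ins for tables; all names and data are hypothetical. In the denormalized list, one forgotten row leaves the customer in two cities at once. In the normalized version, a single change is seen by every order.

```python
# Denormalized: the customer's city is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "Contoso", "city": "Seattle"},
    {"order_id": 2, "customer": "Contoso", "city": "Seattle"},
    {"order_id": 3, "customer": "Fabrikam", "city": "Austin"},
]

# Normalized: customer attributes live in one place; orders reference them by key.
customers = {"C1": {"name": "Contoso", "city": "Seattle"}, "C2": {"name": "Fabrikam", "city": "Austin"}}
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C1"},
    {"order_id": 3, "customer_id": "C2"},
]

# Contoso moves to Portland.
orders_flat[0]["city"] = "Portland"  # order 2 was forgotten: an update anomaly
flat_cities = {row["city"] for row in orders_flat if row["customer"] == "Contoso"}

customers["C1"]["city"] = "Portland"  # one update, visible to every order through the key
normalized_cities = {customers[o["customer_id"]]["city"] for o in orders if o["customer_id"] == "C1"}
```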

V

vCore: Virtual Core—Azure SQL purchasing model with independent compute/storage control.

View: Virtual table based on SQL query result; doesn't store data.

W

Wide-Column Store: NoSQL model where columns can vary per row (Cassandra).

X

XML: Extensible Markup Language—tag-based semi-structured format.

Written by Alvin Varughese, Founder (15 professional certifications)