2.1. Data Representation: The Physics of Data
💡 First Principle: Think of data like water: it can exist in different "states" that determine what containers can hold it. Structured data is like ice: rigid, predictable, and shaped to fit specific molds. Semi-structured data is like a river: it flows and changes shape but follows a channel. Unstructured data is like vapor: it fills any space but has no inherent form you can grasp directly.
The more rigid your data's schema, the less freedom you have to change its shape later. High rigidity (structured) buys enforced data integrity and powerful queries; low rigidity (unstructured) buys rapid ingestion but defers organization to later processing.
What happens if you choose wrong? Store structured sales transactions in a blob container, and you'll spend hours writing custom code to answer simple questions like "total revenue by region." Store images in a relational database, and you'll pay premium prices for storage whose query features bring little benefit: SQL can filter on columns stored alongside an image, but it cannot query what the image itself contains.
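To make the "total revenue by region" cost concrete, here is a minimal Python sketch that answers the same question two ways. It uses the standard-library sqlite3 module as a stand-in for a relational database and a CSV byte string as a stand-in for a file in a blob container. The sample rows and the function names (revenue_by_region_sql, revenue_by_region_blob) are invented for illustration and are not part of any Azure SDK.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) pairs invented for illustration.
SALES = [("East", 120.0), ("West", 80.0), ("East", 30.5), ("North", 50.0)]


def revenue_by_region_sql(rows):
    """Structured path: the engine knows the schema, so one query is enough."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Each result row is a (region, total) pair, which dict() accepts directly.
    result = dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )
    conn.close()
    return result


def revenue_by_region_blob(blob: bytes):
    """Blob path: storage returns opaque bytes, so decoding, parsing,
    and aggregating are all the caller's responsibility."""
    totals = defaultdict(float)
    reader = csv.reader(io.StringIO(blob.decode("utf-8")))
    for region, amount in reader:
        totals[region] += float(amount)
    return dict(totals)


# The same rows serialized as CSV text, as they might sit in a blob container.
blob = "\n".join(f"{region},{amount}" for region, amount in SALES).encode("utf-8")

sql_totals = revenue_by_region_sql(SALES)
blob_totals = revenue_by_region_blob(blob)
```

Both paths arrive at the same totals, but the blob path needs hand-written parsing that has to be rewritten for every new question, while the structured path only needs a different query.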
Consider this scenario: You are building an app for a library. You have a list of books (rigid, unchanging attributes like ISBN), user reviews (flexible text), and scanned images of rare manuscripts (binary files). You cannot store all of these efficiently in the same way. Each kind of data demands a storage solution optimized for its structure.
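The three kinds of library data might look like this in code. This is only a sketch under stated assumptions: sqlite3 stands in for a relational store, plain JSON for a document store, and a bytes object for a file in object storage. The table layout, the placeholder ISBN, and the review fields are hypothetical.

```python
import json
import sqlite3

PLACEHOLDER_ISBN = "0000000000"  # hypothetical value, not a real ISBN

# Structured: every book row has the same fixed columns, enforced by the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)
conn.execute(
    "INSERT INTO books VALUES (?, ?, ?)",
    (PLACEHOLDER_ISBN, "Example Title", 1999),
)
book_title = conn.execute(
    "SELECT title FROM books WHERE isbn = ?", (PLACEHOLDER_ISBN,)
).fetchone()[0]

# Semi-structured: reviews are self-describing, and each one may carry
# a different set of fields.
reviews = [
    json.loads('{"isbn": "0000000000", "stars": 4, "text": "Great read"}'),
    json.loads('{"isbn": "0000000000", "text": "Loved it", "tags": ["classic"]}'),
]
review_fields = [sorted(review.keys()) for review in reviews]

# Unstructured: to the storage layer a manuscript scan is just a run of bytes.
manuscript_scan = bytes([0x00, 0x01, 0x02, 0x03])  # stand-in for image bytes
scan_size = len(manuscript_scan)
```

The book can be looked up by column, the reviews can be inspected field by field even though their fields differ, and the scan only exposes its size until some image-aware program decodes it.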
Understanding data representation is the foundation of all database decisions. The way data is organized determines which Azure service is appropriate.
Visual: The Data Representation Spectrum
⚠️ Exam Trap: Don't confuse "unstructured" with "disorganized." Unstructured data (like images or PDFs) is highly organized internally—it's just not organized in a way a database engine can query directly. A JPEG has precise binary structure; the database just can't interpret what's in the image.
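A small Python sketch can illustrate the point about internal structure. JPEG streams begin with the two-byte Start Of Image marker 0xFF 0xD8, so a program that knows the format can recognize one, while a storage layer that treats the file as opaque bytes cannot. The helper name looks_like_jpeg and the sample byte strings are made up for this example, and the check covers only the leading marker rather than validating a whole file.

```python
JPEG_SOI = b"\xff\xd8"  # Start Of Image marker that opens a JPEG stream


def looks_like_jpeg(data: bytes) -> bool:
    """Check only the leading marker; this does not validate the whole file."""
    return data[:2] == JPEG_SOI


# Hypothetical stand-in bytes, not a decodable image.
fake_scan = JPEG_SOI + b"\xff\xe0" + bytes(16)
plain_text = b"total revenue by region"

scan_is_jpeg = looks_like_jpeg(fake_scan)
text_is_jpeg = looks_like_jpeg(plain_text)
```

The structure is there for any reader that understands the format; a database engine that stores the file as a binary column simply does not read it that way.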
Key Trade-Offs:
- Schema Rigidity vs. Ingestion Flexibility: Structured data requires upfront schema design, but the schema enforces data types and constraints on every write. Unstructured data can be ingested immediately, but it must be processed later before it can be analyzed.
- Query Power vs. Storage Simplicity: Structured databases support complex queries (JOINs, aggregations). Blob storage is simple and cheap, but it offers little built-in ability to query what is inside the files it holds.
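The first trade-off above can be sketched in a few lines of Python. Here sqlite3 stands in for a structured store and a plain list stands in for schema-less object storage; the table definition, the try_insert helper, and the sample payloads are assumptions made for illustration.

```python
import sqlite3

# Upfront schema design: types and constraints are declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    " region TEXT NOT NULL,"
    " amount REAL NOT NULL CHECK (amount >= 0)"
    ")"
)


def try_insert(region, amount):
    """Return True if the row satisfies the schema, False if it is rejected."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?)", (region, amount))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_good = try_insert("East", 120.0)
accepted_negative = try_insert("West", -5.0)  # violates the CHECK constraint
accepted_null = try_insert(None, 10.0)        # violates NOT NULL

# Schema-less ingestion: every payload is accepted as-is, valid or not.
raw_store = []
for payload in (b"East,120.0", b"West,-5.0", b",10.0", b"\x00\x01garbage"):
    raw_store.append(payload)

rows_in_table = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

The table ends up holding only the one valid row, while the raw store holds all four payloads, including the malformed ones that some later processing step will have to detect and handle.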