6.4.1. Core Data Concepts Questions
Question 1
A retail company stores customer reviews submitted through a mobile app. Each review contains a star rating, comment text, timestamp, and optional product photos. The review structure varies—some include location data, others do not. Which data type best describes this data?
- A. Structured data
- B. Semi-structured data
- C. Unstructured data
- D. Relational data
Answer: B. Semi-structured data
Explanation: The reviews have internal structure (fields like rating, comment, timestamp) but vary between records (optional location, photos). This self-describing, flexible schema is characteristic of semi-structured data (e.g., JSON). The photos themselves are unstructured, but the review record as a whole is semi-structured.
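The flexible schema described above can be sketched with two JSON reviews (field names here are hypothetical, not from an actual app):

```python
import json

# Two reviews sharing core fields but differing in optional ones:
# the first includes location data, the second includes photos instead.
reviews_json = """
[
  {"rating": 5, "comment": "Great product!", "timestamp": "2024-05-01T10:30:00Z",
   "location": {"city": "Seattle", "country": "US"}},
  {"rating": 3, "comment": "Arrived late.", "timestamp": "2024-05-02T14:05:00Z",
   "photos": ["img_001.jpg"]}
]
"""

reviews = json.loads(reviews_json)

# Each record is self-describing: the core fields are always present,
# but the overall set of keys varies per record.
for r in reviews:
    print(sorted(r.keys()))
```

No fixed schema has to be declared up front; each record carries its own field names, which is exactly what "semi-structured" means.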
Question 2
A data engineer needs to store 500 million rows of IoT sensor readings for analytics queries. Analysts typically query only 3-4 columns (device_id, timestamp, temperature) out of 25 total columns. Which file format should the engineer use?
- A. CSV
- B. JSON
- C. Parquet
- D. Avro
Answer: C. Parquet
Explanation: Parquet is a columnar format optimized for analytics. When analysts query specific columns, Parquet reads only those columns from disk, not all 25. For queries that touch a small fraction of the columns, this can cut I/O by an order of magnitude or more compared with row-based formats like CSV or JSON, which must read every full row.
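A minimal pure-Python sketch of why columnar layout helps (in practice you would use a library such as pyarrow or pandas; this only models the I/O difference, with made-up column names):

```python
# Model the same table two ways: row-oriented (list of dicts, like CSV/JSON)
# and column-oriented (dict of lists, like Parquet).
columns = [f"col_{i}" for i in range(25)]
row_store = [{c: f"r{r}_{c}" for c in columns} for r in range(1000)]
col_store = {c: [f"r{r}_{c}" for r in range(1000)] for c in columns}

def project_rows(store, wanted):
    """Row store: every full row must be read to extract the wanted fields."""
    return [{c: row[c] for c in wanted} for row in store]

def project_cols(store, wanted):
    """Column store: only the wanted columns are touched at all."""
    return {c: store[c] for c in wanted}

wanted = ["col_0", "col_1", "col_2"]  # e.g. device_id, timestamp, temperature
subset = project_cols(col_store, wanted)
print(len(subset), "columns read instead of", len(columns))
```

With the column store, 22 of the 25 columns are never read; a row store has no way to skip them because each row is stored contiguously.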
Question 3
A bank needs to process ATM withdrawals so that the cash dispense and the customer balance update succeed or fail together. If any step fails, the entire operation must roll back. Which property of database transactions ensures this behavior?
- A. Consistency
- B. Isolation
- C. Durability
- D. Atomicity
Answer: D. Atomicity
Explanation: Atomicity ensures "all or nothing"—either all operations in a transaction succeed, or they all fail and roll back. This prevents partial updates that would leave data in an inconsistent state.
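The rollback behavior can be demonstrated with Python's built-in sqlite3 module (the account and amounts are illustrative):

```python
import sqlite3

# In-memory ledger with one account holding 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

def withdraw(conn, account, amount):
    """Debit the balance atomically: if any step inside the transaction
    fails, every change in it is rolled back."""
    try:
        # sqlite3 connection as context manager: commit on success,
        # rollback automatically if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?",
                (account,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

print(withdraw(conn, "alice", 150))  # False: overdraft detected, rolled back
print(conn.execute(
    "SELECT balance FROM accounts WHERE id = 'alice'").fetchone())  # (100,)
```

The failed withdrawal leaves the balance exactly as it was: the UPDATE that ran before the error is undone, never partially applied.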
Question 4
Your company has two data needs: (1) Processing thousands of customer orders per minute with guaranteed accuracy, and (2) Running monthly sales trend analysis across 100 million historical records. Which workload types apply?
- A. Both are OLTP workloads
- B. Both are OLAP workloads
- C. Orders = OLTP, Analysis = OLAP
- D. Orders = OLAP, Analysis = OLTP
Answer: C. Orders = OLTP, Analysis = OLAP
Explanation: Order processing requires fast, accurate transactions with ACID compliance (OLTP). Monthly analysis requires aggregating millions of historical rows (OLAP). These are fundamentally different workloads requiring different database architectures.
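The two workload shapes can be contrasted in a few lines of sqlite3 (the orders table and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small, individually committed writes.
for region, amount in [("east", 20.0), ("west", 35.5), ("east", 12.5)]:
    with conn:  # each insert is its own short transaction
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount))

# OLAP-style work: one read-heavy aggregate over the accumulated history.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 32.5), ('west', 35.5)]
```

The insert loop touches one row per transaction (OLTP); the GROUP BY scans and aggregates the whole history in a single query (OLAP). At scale, each pattern favors a different storage engine and schema design.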
Question 5
A streaming service needs to detect fraudulent credit card transactions within milliseconds of occurrence. Which processing pattern is required?
- A. Batch processing with Azure Data Factory
- B. Stream processing with Azure Stream Analytics
- C. ETL processing with Azure Synapse
- D. OLAP processing with Power BI
Answer: B. Stream processing with Azure Stream Analytics
Explanation: Fraud detection requires real-time, event-by-event analysis with millisecond latency. Stream processing analyzes each transaction as it arrives. Batch processing would introduce unacceptable delays—the fraud would complete before detection.
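A stripped-down sketch of per-event processing (the rule and threshold are hypothetical; a real pipeline like Azure Stream Analytics would apply queries or models over the event stream):

```python
from typing import Iterator

def score_event(txn: dict) -> bool:
    """Hypothetical per-event rule: flag unusually large amounts.
    Real systems use richer rules or learned models."""
    return txn["amount"] > 1000

def detect_fraud(stream: Iterator[dict]):
    """Evaluate each transaction the moment it arrives,
    rather than waiting to accumulate a batch."""
    for txn in stream:
        if score_event(txn):
            yield txn["id"]

events = iter([
    {"id": "t1", "amount": 42.0},
    {"id": "t2", "amount": 5000.0},
    {"id": "t3", "amount": 12.5},
])
print(list(detect_fraud(events)))  # ['t2']
```

Because the generator scores each event as it is pulled from the stream, a suspicious transaction can be flagged immediately, whereas a batch job would only see it at the next scheduled run.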