Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.3.2. Data and Compute Services

Azure Machine Learning provides specialized services for managing data and computation. Think of these as the infrastructure layer—you need data to train on and compute power to do the training.

Data services:
Service      What It Does                               Analogy
Datastores   Connect to external data sources           A bookmark to where data lives
Datasets     Registered, versioned references to data   A labeled box of training data

Datastores explained: Datastores don't copy your data—they connect to where your data already lives:

  • Azure Blob Storage (files, images)
  • Azure Data Lake Storage (large-scale data)
  • Azure SQL Database (structured data)
  • Azure Files (shared file storage)

When you create a datastore, you're setting up the connection credentials once, so jobs can access data without embedding passwords in code.
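The "set up credentials once, refer to the connection by name" idea can be sketched with a small toy model. This is not the Azure Machine Learning SDK: the `Datastore` class, the `DatastoreRegistry` helper, and the storage URL and secret shown are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datastore:
    """Toy stand-in for a registered connection to external storage."""
    name: str
    storage_type: str   # e.g. "blob", "data_lake", "sql", "files"
    location: str       # where the data already lives (nothing is copied)
    credential: str = field(repr=False)  # stored once, hidden from repr/logs


class DatastoreRegistry:
    """Hypothetical workspace-level registry keyed by datastore name."""

    def __init__(self):
        self._stores = {}

    def register(self, store: Datastore) -> None:
        self._stores[store.name] = store

    def resolve(self, name: str, relative_path: str) -> str:
        """Build a full path for a job; the job never handles the credential."""
        store = self._stores[name]
        return f"{store.location.rstrip('/')}/{relative_path.lstrip('/')}"


# Set up once by whoever administers the workspace (all values are made up).
registry = DatastoreRegistry()
registry.register(
    Datastore(
        name="training_blob",
        storage_type="blob",
        location="https://examplestorage.blob.core.windows.net/training-data",
        credential="<secret-kept-out-of-job-code>",
    )
)

# Job code refers to the datastore by name only.
data_path = registry.resolve("training_blob", "images/cats/")
print(data_path)
```

The point of the design is that the secret lives in one place: job code carries only the datastore name and a relative path, so rotating a key means updating the registration rather than every script.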

Datasets explained: Datasets (called data assets in the newer v2 SDK and CLI) reference specific data through datastores and add useful capabilities:

  • Versioning: Track which data version trained which model
  • Labeling: Store labels alongside data for supervised learning
  • Profiling: Auto-generated statistics about your data
  • Sampling: Work with subsets during development
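
Versioning and sampling from the list above can be illustrated with a toy registry. This is a conceptual sketch, not Azure Machine Learning code: `DatasetVersion`, `DatasetRegistry`, `sample_rows`, and the dataset, datastore, and model names are assumptions made up for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    version: int
    datastore: str   # name of the datastore the data is reached through
    path: str        # location of the data inside that datastore


class DatasetRegistry:
    """Hypothetical registry that assigns increasing version numbers."""

    def __init__(self):
        self._versions = {}   # dataset name -> list of DatasetVersion

    def register(self, name, datastore, path):
        versions = self._versions.setdefault(name, [])
        entry = DatasetVersion(name, len(versions) + 1, datastore, path)
        versions.append(entry)
        return entry

    def get(self, name, version=None):
        """Return a specific version, or the latest when version is None."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]


def sample_rows(rows, fraction, seed=0):
    """Take a reproducible subset of rows for quick development runs."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


datasets = DatasetRegistry()
datasets.register("churn", "training_blob", "churn/2024-01.csv")
latest = datasets.register("churn", "training_blob", "churn/2024-02.csv")

# Record which data version trained which model.
trained_on = {
    "churn-model-a": datasets.get("churn", version=1),
    "churn-model-b": latest,
}

subset = sample_rows(list(range(1000)), fraction=0.1)
print(latest.version, trained_on["churn-model-a"].path, len(subset))
```

Because each registration gets its own version number, a model can always be traced back to the exact data it was trained on, even after newer data is registered under the same name.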

Compute services:

Compute Type         Purpose              When to Use
Compute instances    Development VMs      Writing code, exploring data, testing
Compute clusters     Scalable training    Training models, running experiments
Inference clusters   Production hosting   Serving predictions to applications
Attached compute     External resources   Using existing VMs or Databricks

Compute instances vs. clusters:
  • Instance: Like your personal laptop in the cloud. It is a single VM for interactive work, and it keeps running (and billing) until you stop it
  • Cluster: Like a pool of workers—scales up when you submit jobs, scales down when idle
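
The scale-up and scale-down behavior of a cluster can be sketched as a simple rule. This is a toy illustration, not how Azure Machine Learning actually schedules nodes: the function `target_node_count` and its parameters are hypothetical, and it ignores details such as the idle timeout a real cluster waits before releasing nodes.

```python
def target_node_count(queued_jobs, nodes_per_job=1, min_nodes=0, max_nodes=4):
    """Toy autoscale rule: enough nodes for the queue, clamped to [min, max]."""
    wanted = queued_jobs * nodes_per_job
    return max(min_nodes, min(wanted, max_nodes))


# Queue length over time: idle, a burst of submissions, then idle again.
for queued in [0, 2, 10, 1, 0]:
    print(f"queued={queued:>2} -> nodes={target_node_count(queued)}")
```

With a minimum of zero nodes, the cluster in this sketch costs nothing while idle, which is the main contrast with a compute instance that stays on until stopped.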

Why compute matters for the exam: Questions may contrast "scaling resources for training" (compute clusters) with "hosting models for predictions" (inference clusters or endpoints). Development work happens on compute instances; training at scale runs on compute clusters; production serving uses inference clusters or endpoints.

⚠️ Exam Tip: Datastores are CONNECTION POINTS to external storage. Datasets are REFERENCES to specific data you'll use for training. Compute instances are for DEVELOPMENT; compute clusters are for TRAINING at scale.

Written by Alvin Varughese
Founder, 15 professional certifications