Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.2.1. Features and Labels

Recall from Section 1.3 that supervised learning needs labeled data. Let's define these terms precisely:

Features are the input attributes used to make predictions—the descriptive characteristics of your data. Think of features as the "questions" on an exam.

Labels are the output values you're trying to predict—the correct answers. Think of labels as the "answer key."

The following table provides examples:

Scenario                 | Features (Inputs)              | Label (Output)
House price prediction   | Size, bedrooms, location, age  | Price
Email spam detection     | Word count, sender, subject    | Spam/Not Spam
Diabetes risk prediction | Age, body fat percentage       | Risk probability
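
To make the split concrete, here is a minimal sketch of separating features from the label for the house price scenario. The records, the field names (size_sqft, age_years, and so on), and the helper split_features_label are hypothetical and chosen only for illustration; real projects typically do this with a data library rather than plain dictionaries.

```python
# Hypothetical house records; all values are made up for illustration.
houses = [
    {"size_sqft": 1400, "bedrooms": 3, "location": "suburb", "age_years": 20, "price": 250000},
    {"size_sqft": 900, "bedrooms": 2, "location": "city", "age_years": 35, "price": 310000},
    {"size_sqft": 2100, "bedrooms": 4, "location": "rural", "age_years": 5, "price": 280000},
]

LABEL = "price"  # the value we want to predict


def split_features_label(records, label_key):
    """Separate each record into its features (inputs) and its label (output)."""
    features = [{k: v for k, v in r.items() if k != label_key} for r in records]
    labels = [r[label_key] for r in records]
    return features, labels


# X holds what goes IN to the model, y holds what should come OUT.
X, y = split_features_label(houses, LABEL)
```

Every row in X lines up with the entry at the same position in y, which is what "labeled data" means in practice.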

Feature engineering: The quality of features dramatically impacts model performance. Feature engineering involves:

  • Selecting which attributes to include
  • Creating new features from existing ones (e.g., "age of house" from "year built")
  • Transforming features (e.g., normalizing values to a 0-1 range)
  • Handling missing values appropriately
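
The four steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not a recommended pipeline: the records, the reference year CURRENT_YEAR, the field names, and the choice of mean imputation and min-max scaling are all assumptions made for the example.

```python
from statistics import mean

CURRENT_YEAR = 2026  # assumption: reference year used to derive house age

# Hypothetical raw records; one size value is missing (None).
raw = [
    {"size_sqft": 1000, "year_built": 2006},
    {"size_sqft": None, "year_built": 1996},
    {"size_sqft": 2000, "year_built": 2016},
]


def engineer(records, current_year):
    """Return engineered features: derived age and min-max scaled size."""
    # Handle missing values: fill gaps with the mean of the known sizes.
    known = [r["size_sqft"] for r in records if r["size_sqft"] is not None]
    fill = mean(known)
    sizes = [r["size_sqft"] if r["size_sqft"] is not None else fill for r in records]

    # Transform: min-max normalization maps sizes onto the 0-1 range.
    lo, hi = min(sizes), max(sizes)

    engineered = []
    for r, size in zip(records, sizes):
        engineered.append({
            # Create a new feature from an existing one.
            "age_years": current_year - r["year_built"],
            "size_scaled": (size - lo) / (hi - lo) if hi > lo else 0.0,
        })
    # Select: only the engineered columns are kept; year_built is dropped.
    return engineered


features = engineer(raw, CURRENT_YEAR)
```

Note that the fill value and the min and max would normally be computed on training data only and then reused at prediction time, so the same transformation is applied in both places.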

Good features are:
  • Relevant to the prediction task
  • Available at prediction time (not just in historical data)
  • Not highly correlated with each other (redundant features add little new information)
  • Reasonably clean and complete
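
One of these criteria, correlation between features, is easy to check numerically. The sketch below computes the Pearson correlation coefficient by hand for hypothetical columns; the data, the pearson helper, and the 0.95 threshold are assumptions for illustration, and the right cutoff depends on the task.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature columns.
size_sqft = [1000.0, 1500.0, 2000.0, 2500.0]
size_sqm = [s * 0.0929 for s in size_sqft]  # same information, different unit
bedrooms = [2, 3, 3, 5]

THRESHOLD = 0.95  # assumption: cutoff above which a pair is treated as redundant

r_units = pearson(size_sqft, size_sqm)
r_bedrooms = pearson(size_sqft, bedrooms)

redundant_units = abs(r_units) > THRESHOLD        # True: drop one of the two
redundant_bedrooms = abs(r_bedrooms) > THRESHOLD  # False: both may be useful
```

Size in square feet and size in square meters carry the same information, so keeping both adds nothing. Bedrooms is related to size but not a duplicate of it.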

⚠️ Exam Trap: Questions often test whether you can identify features vs labels. Remember: features go IN, labels come OUT. The label is what you're trying to predict.

Written by Alvin Varughese
Founder, 15 professional certifications