Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
3.2.1. Features and Labels
Recall from Section 1.3 that supervised learning needs labeled data. Let's define these terms precisely:
Features are the input attributes used to make predictions—the descriptive characteristics of your data. Think of features as the "questions" on an exam.
Labels are the output values you're trying to predict—the correct answers. Think of labels as the "answer key."
The following table provides examples:
| Scenario | Features (Inputs) | Label (Output) |
|---|---|---|
| House price prediction | Size, bedrooms, location, age | Price |
| Email spam detection | Word count, sender, subject | Spam/Not Spam |
| Diabetes risk prediction | Age, body fat percentage | Risk probability |
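To make the features-in, label-out relationship concrete, here is a minimal sketch of the house price row from the table as a feature matrix `X` and label vector `y`. The numbers and field order are made up for illustration.

```python
# Hypothetical house-price examples: each feature vector pairs with one label.
# Feature order (assumed): [size_sqft, bedrooms, age_years]
X = [
    [1400, 3, 20],
    [2100, 4, 5],
    [900,  2, 35],
]
# Labels: the prices the model should learn to predict.
y = [250_000, 410_000, 160_000]

# One (features, label) pair forms a single training example.
for features, label in zip(X, y):
    print(features, "->", label)
```

Note that every row of `X` lines up with exactly one entry in `y`; a mismatch in length means your data is mislabeled.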
Feature engineering: The quality of features dramatically impacts model performance. Feature engineering involves:
- Selecting which attributes to include
- Creating new features from existing ones (e.g., "age of house" from "year built")
- Transforming features (e.g., normalizing values to 0-1 range)
- Handling missing values appropriately
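The steps above can be sketched in a few lines of plain Python. The field names (`year_built`, `size_sqft`) and the mean-imputation strategy are illustrative assumptions, not a prescribed recipe:

```python
# Sketch of three feature-engineering steps on toy records (values assumed).
CURRENT_YEAR = 2026

raw = [
    {"year_built": 1990, "size_sqft": 1400},
    {"year_built": 2015, "size_sqft": None},  # missing value
]

# 1. Create a new feature from an existing one ("age of house" from "year built").
for row in raw:
    row["age"] = CURRENT_YEAR - row["year_built"]

# 2. Handle missing values (here: impute with the mean of the known sizes).
known = [r["size_sqft"] for r in raw if r["size_sqft"] is not None]
mean_size = sum(known) / len(known)
for row in raw:
    if row["size_sqft"] is None:
        row["size_sqft"] = mean_size

# 3. Transform a feature: min-max normalize ages into the 0-1 range.
ages = [r["age"] for r in raw]
lo, hi = min(ages), max(ages)
for row in raw:
    row["age_norm"] = (row["age"] - lo) / (hi - lo)
```

In a real project you would typically use a library such as pandas or scikit-learn for these transformations, but the logic is the same.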
Good features are:
- Relevant to the prediction task
- Available at prediction time (not just in historical data)
- Not too correlated with each other
- Reasonably clean and complete
⚠️ Exam Trap: Questions often test whether you can identify features vs labels. Remember: features go IN, labels come OUT. The label is what you're trying to predict.
Written by Alvin Varughese
Founder • 15 professional certifications