Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.1.1. Regression: Predicting Numbers

Regression predicts continuous numeric values. Think of it as drawing a line through data points to predict where future points will fall. The name comes from statistics: Francis Galton coined "regression" after observing that extreme measurements tend to "regress" toward the average.

Key characteristics:
  • Predicts numeric/continuous values
  • Output is a number (price, temperature, probability value)
  • Uses labeled training data (supervised learning)
  • Measures error as distance from prediction (how far off was the guess?)
How regression works:
  1. Training data provides examples: features (inputs) and labels (correct numeric outputs)
  2. The algorithm finds a mathematical function that best fits the data
  3. For new data, the function predicts the numeric output
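The three steps above can be sketched in a few lines of plain Python: training examples go in, the closed-form least-squares formulas find the best-fitting line y = m·x + b, and the resulting function predicts outputs for new inputs. All numbers here are invented for illustration.

```python
def fit_line(xs, ys):
    """Step 2: find slope m and intercept b that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

# Step 1: training data -- feature (square footage) and label (price in $1000s)
sqft  = [1000, 1500, 2000, 2500]
price = [200, 290, 410, 500]

m, b = fit_line(sqft, price)

# Step 3: apply the learned function to a new, unseen input
predicted_price = m * 1800 + b
```

Real projects would use a library such as scikit-learn rather than hand-rolled formulas, but the mechanics are the same: fit on labeled examples, then predict.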
Common scenarios:
  • Predicting house prices based on square footage, bedrooms, location
  • Forecasting sales numbers for next quarter
  • Estimating delivery times based on distance and traffic
  • Predicting patient blood pressure based on age, weight, exercise

The following table shows regression types:

Regression Type            | When to Use                         | Example
Linear Regression          | One feature predicts one label      | Square footage → house price
Multiple Linear Regression | Multiple features predict one label | Square footage + bedrooms + age → house price
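As a sketch of the multiple-feature case, the snippet below fits price = w1·sqft + w2·bedrooms + b with the two-feature normal equations (solved by Cramer's rule on centered data). The house data is invented and deliberately generated from known weights so the fit recovers them exactly.

```python
def fit_two_features(x1s, x2s, ys):
    """Least-squares fit of y = w1*x1 + w2*x2 + b (two features, one label)."""
    n = len(ys)
    m1, m2, my = sum(x1s) / n, sum(x2s) / n, sum(ys) / n
    a = [x - m1 for x in x1s]          # centered feature 1
    c = [x - m2 for x in x2s]          # centered feature 2
    d = [y - my for y in ys]           # centered label
    s11 = sum(u * u for u in a)
    s22 = sum(u * u for u in c)
    s12 = sum(u * v for u, v in zip(a, c))
    s1y = sum(u * v for u, v in zip(a, d))
    s2y = sum(u * v for u, v in zip(c, d))
    det = s11 * s22 - s12 * s12        # zero when features are perfectly correlated
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return w1, w2, my - w1 * m1 - w2 * m2

sqft     = [1000, 1500, 2000, 2500]
bedrooms = [2, 3, 3, 4]
price    = [190, 260, 310, 380]    # generated as 0.1*sqft + 20*bedrooms + 50

w1, w2, b = fit_two_features(sqft, bedrooms, price)
```

Note the `det` term: when the two features are perfectly correlated it becomes zero and the system has no unique solution, which is the multicollinearity problem flagged in the exam trap below.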
Features vs. Labels in Regression:
  • Features (inputs): The data points you know (square footage, number of bedrooms)
  • Label (output): The numeric value you're predicting (price)
  • Training: Model learns the relationship between features and labels
  • Inference: Model predicts labels for new feature combinations

Example: Predicting ice cream sales

  • Features: temperature, day of week, is_holiday
  • Label: number of ice creams sold
  • The model learns: "When temperature rises, sales increase"
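A toy version of the ice cream example, using temperature alone as the feature (all numbers invented): a positive learned slope is exactly how the model encodes "when temperature rises, sales increase."

```python
def fit_line(xs, ys):
    """Simple least-squares fit of y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

temps = [15, 20, 25, 30, 35]     # feature: temperature in °C
sold  = [40, 70, 100, 130, 160]  # label: ice creams sold

slope, intercept = fit_line(temps, sold)

# Inference: forecast sales for a new 28 °C day
forecast = slope * 28 + intercept
```

Here `slope` comes out positive (6 extra sales per degree on this toy data), which is the "temperature up, sales up" relationship stated above.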
Evaluating regression models:
Metric                        | What It Measures           | Interpretation
MAE (Mean Absolute Error)     | Average error magnitude    | Lower is better
RMSE (Root Mean Square Error) | Error with outlier penalty | Lower is better
R² (R-squared)                | Variance explained         | 0-1, higher is better
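The three metrics in the table are simple enough to compute by hand, as this sketch on made-up predictions shows (scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` are the library equivalents):

```python
import math

actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 8.5]

errors = [p - a for p, a in zip(predicted, actual)]

# MAE: average size of the errors, sign ignored
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: squaring before averaging penalizes large errors (outliers) more
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# R²: 1 minus (residual variance / total variance of the labels)
mean_actual = sum(actual) / len(actual)
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot
```

On this data MAE is 0.5 but RMSE is about 0.61: the single 1.0-unit miss pulls RMSE up more than MAE, which is the "outlier penalty" the table refers to.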
When regression fails:
  • Non-linear relationships (try polynomial regression)
  • Missing important features
  • Outliers skewing predictions
  • Correlated features (multicollinearity)
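The first failure mode is easy to demonstrate: data generated from y = x² defeats a straight line, but fitting the same linear model on a squared feature (the simplest form of polynomial regression) recovers the relationship exactly. The data is synthetic.

```python
def fit_line(xs, ys):
    """Simple least-squares fit of y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

xs = [-2, -1, 0, 1, 2]
ys = [4, 1, 0, 1, 4]    # y = x**2: clearly non-linear

m_raw, b_raw = fit_line(xs, ys)                   # slope 0: the line learns nothing
m_sq, b_sq = fit_line([x * x for x in xs], ys)    # fit on x**2 instead
```

The raw fit produces slope 0 (the symmetric data cancels out), while the polynomial feature yields y = 1·x² + 0, a perfect fit. "Try polynomial regression" often just means engineering features like x² and reusing the linear machinery.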
Regression vs. Classification decision:
  • "How much will this cost?" → Regression (numeric)
  • "Will this customer churn?" → Classification (yes/no)
  • "What's the probability of rain?" → Could be either!
    • If you need the probability NUMBER (73%) → Regression
    • If you need the CATEGORY (will rain/won't rain) → Classification

⚠️ Critical Exam Trap: For multiple linear regression, features must be INDEPENDENT of each other. If features are correlated (multicollinearity), the model cannot reliably separate their individual effects, so the coefficients become unstable and misleading. For example, if "square footage" and "number of rooms" are highly correlated, the model may give unreliable results.

⚠️ Exam Tip: If the question asks about predicting a NUMBER (price, count, amount, percentage), the answer is regression. If predicting a CATEGORY (yes/no, type, class), it's classification.

Written by Alvin Varughese, Founder (15 professional certifications)