Copyright (c) 2025 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

4.1. Supervised Learning Algorithms

First Principle: Supervised learning algorithms fundamentally learn mappings from input features to known target labels, enabling models to make predictions or classifications on new, unseen data.

Supervised learning is a type of machine learning where the model learns from a dataset that includes "labeled" examples (i.e., input features paired with the correct output or target variable). The goal is to learn a mapping from inputs to outputs so that the model can accurately predict outputs for new, unseen inputs.

Key Characteristics of Supervised Learning:
  • Labeled Data: Requires a dataset where the desired output (target variable) is known for each input.
  • Training: The model learns patterns and relationships from the labeled training data.
  • Generalization: The ability of the trained model to make accurate predictions on new, unseen data.
  • Problem Types:
    • Regression: Predicting a continuous numerical value (e.g., house price, temperature).
    • Classification: Predicting a discrete category or class (e.g., spam/not-spam, customer churn, image recognition).
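The regression/classification split above can be sketched with a minimal scikit-learn example. The synthetic data here is illustrative only: the same feature matrix can drive either a regression model (continuous target) or a classification model (discrete target).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))          # one input feature

# Regression: continuous target (e.g., a value that scales with the feature)
y_reg = 3.0 * X.ravel() + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(X, y_reg)
print("Regression prediction for x=5:", reg.predict([[5.0]])[0])  # roughly 15

# Classification: discrete target (e.g., 1 if the feature exceeds a threshold)
y_clf = (X.ravel() > 5).astype(int)
clf = LogisticRegression().fit(X, y_clf)
print("Class prediction for x=8:", clf.predict([[8.0]])[0])
```

Note that both models learn from labeled examples; only the type of the target variable differs, which is exactly what determines the problem type.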

Common Supervised Learning Algorithms & AWS Usage (several are available as Amazon SageMaker built-in algorithms, e.g., Linear Learner and XGBoost):
  • Linear Regression: Predicts continuous values based on a linear relationship between features and target.
  • Logistic Regression: Predicts probabilities for binary classification tasks.
  • Decision Trees: Tree-like models that make decisions based on feature values.
  • Random Forest: An ensemble of many decision trees. Reduces overfitting and improves accuracy.
  • XGBoost (Extreme Gradient Boosting): A highly optimized, powerful gradient boosting framework. Often a top performer in Kaggle competitions.
  • Support Vector Machines (SVM): Finds the optimal hyperplane that separates data points into classes.
  • Neural Networks / Deep Learning: For complex patterns, especially with unstructured data (images, text, audio).
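To illustrate why an ensemble such as Random Forest tends to generalize better than a single decision tree, the sketch below compares test accuracy on a synthetic dataset (data and parameters are illustrative, not a benchmark):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# A single deep tree can overfit the training data
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# A forest averages many trees trained on bootstrapped samples,
# which typically reduces variance and improves held-out accuracy
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

print("Single tree test accuracy: ", tree.score(X_te, y_te))
print("Random forest test accuracy:", forest.score(X_te, y_te))
```

On most random splits the forest scores at least as well as the lone tree, which is the overfitting-reduction claim in the bullet above.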

Scenario: You need to build two models: one to predict the exact number of product returns a customer will make next month (a continuous value, i.e., a regression problem), and another to categorize customer support tickets into predefined types such as "Billing," "Technical," or "Feature Request" (a classification problem).
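A minimal sketch of the two models in the scenario, using toy data and hypothetical feature names (number of orders, prior returns, monthly spend for the regressor; raw ticket text for the classifier):

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Model 1 — regression: predict a continuous return count from customer features
# (columns assumed here: orders, prior returns, monthly spend)
X_customers = [[12, 3, 250.0], [2, 0, 40.0], [8, 1, 120.0], [15, 4, 310.0]]
y_returns = [4.0, 0.0, 2.0, 5.0]              # returns made next month
regressor = GradientBoostingRegressor(random_state=0).fit(X_customers, y_returns)

# Model 2 — classification: route ticket text into discrete categories
tickets = ["I was charged twice", "App crashes on login",
           "Please add dark mode", "Refund my last invoice"]
labels = ["Billing", "Technical", "Feature Request", "Billing"]
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(tickets, labels)

print(regressor.predict([[10, 2, 200.0]])[0])               # a continuous estimate
print(classifier.predict(["charged incorrect amount"])[0])  # one of the categories
```

The regressor outputs a number on a continuous scale, while the classifier outputs one of the predefined labels; that difference in target type is what splits the scenario into two distinct supervised problems.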

Reflection Question: How do supervised learning algorithms (e.g., XGBoost for regression, Random Forest for classification) learn mappings from input features to known target labels, and how does the distinction between continuous and discrete outcomes determine whether a problem is framed as regression or classification?