3.1.2. Classification: Predicting Categories
Classification predicts which category something belongs to. Unlike regression's continuous output, classification produces discrete category labels. Think of it like sorting mail into bins—each item goes into exactly one bin (category).
Key characteristics:
- Predicts categories/classes (not numbers)
- Output is a label (yes/no, spam/not spam, cat/dog)
- Uses labeled training data (supervised learning)
- Outputs typically include a confidence score for each category
- Model learns decision boundaries between categories
How classification works:
- Training data provides examples with features AND category labels
- The algorithm learns patterns that distinguish categories
- For new data, the model predicts which category it belongs to
- Returns predicted category + confidence percentage
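The steps above can be sketched with a toy nearest-centroid classifier. The training data, distance-based rule, and confidence formula here are illustrative assumptions, not any specific library's API:

```python
import math

def train(examples):
    """Learn one centroid (mean feature vector) per category label."""
    sums, counts = {}, {}
    for features, label in examples:
        s = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            s[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in s] for label, s in sums.items()}

def predict(centroids, features):
    """Return (predicted_label, confidence) for a new feature vector."""
    # Distance from the new point to each category's centroid
    dists = {label: math.dist(features, c) for label, c in centroids.items()}
    # Rough confidence: a closer centroid gets a higher normalized score
    scores = {label: 1.0 / (1.0 + d) for label, d in dists.items()}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total

# Labeled training data: (features, category label)
training = [
    ([1.0, 1.0], "spam"), ([1.2, 0.9], "spam"),
    ([0.1, 0.2], "not_spam"), ([0.0, 0.1], "not_spam"),
]
centroids = train(training)
label, confidence = predict(centroids, [1.1, 1.0])
```

The model never predicts a number on a continuous scale; it picks a bin and reports how confident it is in that bin.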
Example: Email spam detection
- Features: sender, subject keywords, presence of links, time sent
- Labels: Spam / Not Spam
- The model learns: "Emails with 'FREE MONEY' in subject → Spam"
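A learned rule like the one above can be sketched as a keyword scorer. The keywords, weights, and threshold below are made-up assumptions standing in for what a real model would learn from data:

```python
# Illustrative keyword weights -- in practice these would be learned
SPAM_KEYWORDS = {"free money": 0.8, "winner": 0.5, "click here": 0.4}

def classify_email(subject, threshold=0.5):
    """Score a subject line and return (category label, score)."""
    subject = subject.lower()
    score = sum(w for kw, w in SPAM_KEYWORDS.items() if kw in subject)
    label = "Spam" if score >= threshold else "Not Spam"
    return label, min(score, 1.0)

label, score = classify_email("FREE MONEY inside, click here!")
# label -> "Spam"
```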
The following table shows classification types:
| Classification Type | Output | Example | Use Case |
|---|---|---|---|
| Binary Classification | Two classes | Spam/Not Spam | Fraud detection, medical diagnosis |
| Multiclass Classification | Multiple classes | Cat/Dog/Bird/Fish | Image recognition, document categorization |
| Multilabel Classification | Multiple labels possible | Action + Comedy + Drama | Movie genre tagging |
Binary vs. Multiclass vs. Multilabel:
- Binary: Exactly two possible outcomes (yes/no, true/false, positive/negative)
- Multiclass: Three or more categories, but each item gets exactly ONE label
- Multilabel: Items can have MULTIPLE labels simultaneously
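The multiclass/multilabel distinction shows up in how a model's per-label scores are turned into a final answer. The genre scores below are hypothetical; the two decision rules are the standard argmax vs. per-label threshold pattern:

```python
def multiclass_decision(scores):
    """Multiclass: pick exactly ONE label -- the highest-scoring class."""
    return max(scores, key=scores.get)

def multilabel_decision(scores, threshold=0.5):
    """Multilabel: keep EVERY label whose score clears the threshold."""
    return {label for label, s in scores.items() if s >= threshold}

# Hypothetical per-genre scores from some model
genre_scores = {"action": 0.9, "comedy": 0.7, "drama": 0.2}

single = multiclass_decision(genre_scores)   # exactly one label
several = multilabel_decision(genre_scores)  # possibly several labels
```

Same scores, different decision rule: multiclass forces one winner, multilabel lets a movie be both "action" and "comedy" at once.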
Common classification scenarios:
- Medical diagnosis (disease present / absent)
- Email filtering (spam / not spam)
- Customer churn prediction (will leave / will stay)
- Sentiment analysis (positive / negative / neutral)
- Image recognition (cat / dog / bird)
- Fraud detection (fraudulent / legitimate)
⚠️ CRITICAL Exam Trap: Logistic Regression is a CLASSIFICATION algorithm, NOT regression! Despite its misleading name, logistic regression predicts categories (typically yes/no), not continuous numbers. This is one of the most frequently tested traps.
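The trap is easier to remember once you see the mechanics: logistic regression computes a linear score, squashes it into a probability with a sigmoid, then thresholds that probability into a category. The weights and bias below are assumed for illustration, not a trained model:

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(features, weights, bias, threshold=0.5):
    """Logistic regression sketch: linear score -> sigmoid -> CATEGORY.
    The final output is a discrete label, not a continuous number."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return ("spam" if p >= threshold else "not_spam"), p

# Assumed weights/bias for illustration
label, probability = logistic_predict([1.0, 3.0], weights=[0.8, 0.6], bias=-1.0)
```

The intermediate probability is continuous, but the decision rule maps it to one of two bins, which is why the algorithm counts as classification.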
⚠️ Exam Tip: When a question asks about predicting "yes/no," "true/false," "positive/negative," or any discrete categories—that's classification. Even if probabilities are involved (like "70% chance of spam"), the final output is still a category.