3.1.4. Deep Learning and Neural Networks
💡 First Principle: Deep learning uses neural networks with many layers to automatically discover the features needed for complex pattern recognition. While traditional ML requires humans to specify what features matter, deep learning discovers them from raw data—which is why it excels at images, speech, and text where the relevant features aren't obvious.
What breaks without understanding this: The exam asks about when deep learning is appropriate versus traditional ML. Without understanding that deep learning's power lies in automatic feature discovery, you might think it's always better (it's not—it needs massive data). Questions like "Why use deep learning for image recognition instead of traditional classification?" have a specific answer: because humans can't manually define what makes a face a face, but neural networks can discover those features automatically.
Deep learning builds on the AI hierarchy from Section 1.1.2—it's a subset of machine learning that powers nearly everything modern AI does.
How neural networks learn:
Think of a neural network like a factory assembly line where each station (layer) refines the product:
- Input layer: Receives raw data (pixels, audio samples, text tokens)
- Hidden layers: Each layer learns increasingly abstract patterns
  - Layer 1 might detect edges in an image
  - Layer 2 combines edges into shapes
  - Layer 3 combines shapes into objects
- Output layer: Produces the final prediction
What makes it "deep": More hidden layers = deeper network = ability to learn more complex patterns. Modern face-recognition networks stack dozens to hundreds of layers; a three-layer network lacks the capacity to build up the full edges → shapes → faces hierarchy that task requires.
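The assembly-line idea can be sketched as a forward pass through a tiny fully connected network. This is an illustrative sketch only: the weights are fixed toy values, and in practice a framework such as PyTorch or TensorFlow would learn them from data via backpropagation.

```python
# Minimal forward pass through a small network (toy weights, no training).
# Each layer is a matrix multiply plus bias, followed by a ReLU
# nonlinearity -- composing these simple transforms is what lets
# depth build increasingly abstract features.

def relu(vec):
    """Zero out negative values so layers can compose nonlinearly."""
    return [max(0.0, x) for x in vec]

def dense(inputs, weights, biases):
    """One layer: output[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        for j in range(len(biases))
    ]

def forward(x, layers):
    """Run x through each (weights, biases) pair, ReLU between layers."""
    for k, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if k < len(layers) - 1:  # no activation on the output layer
            x = relu(x)
    return x

# Toy network: 2 inputs -> 3 hidden units -> 1 output
layers = [
    ([[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5]], [0.0, 0.1, 0.0]),  # input -> hidden layer
    ([[1.0], [0.5], [-1.0]], [0.2]),         # hidden -> output layer
]
print(forward([1.0, 2.0], layers))
```

Real deep networks differ only in scale: more layers, thousands of units per layer, and weights learned from data instead of hand-picked.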
Key characteristics:
- Uses artificial neural networks inspired by (but not identical to) biological brains
- "Deep" means many hidden layers (dozens to hundreds)
- Excels at complex patterns: images, speech, text, games
- Requires large amounts of training data (thousands to millions of examples)
- Computationally expensive—needs GPUs for practical training
Common applications:
| Application | What the Network Learns |
|---|---|
| Image recognition | Edges → Shapes → Objects → "This is a cat" |
| Speech recognition | Sound waves → Phonemes → Words → Sentences |
| Language understanding | Characters → Words → Grammar → Meaning |
| Game playing | Board states → Strategies → "Move here to win" |
Deep learning vs traditional ML:
| Aspect | Traditional ML | Deep Learning |
|---|---|---|
| Feature engineering | Manual (humans define features) | Automatic (network learns features) |
| Data requirements | Hundreds to thousands | Thousands to millions |
| Interpretability | Often explainable | Often "black box" |
| Best for | Structured data, clear features | Unstructured data (images, text, audio) |
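The feature-engineering row is the crux of the table. A hedged sketch of what "manual" means in practice, using a hypothetical house-price listing (the field names and values are invented for illustration):

```python
# Traditional ML: a human decides which features matter and computes
# them before any model sees the data. The model only ever sees
# these hand-picked columns.

def engineer_features(listing):
    """Hand-crafted features a person chose for a house-price model."""
    return {
        "sqft": listing["sqft"],
        "sqft_per_room": listing["sqft"] / listing["rooms"],
        "has_garage": 1 if listing["garage"] else 0,
    }

listing = {"sqft": 1200, "rooms": 4, "garage": True}
print(engineer_features(listing))

# Deep learning skips this step: the network receives raw input
# (e.g. every pixel of a photo of the house) and must discover useful
# intermediate features in its hidden layers during training.
```

For structured data like this listing, the hand-crafted route is often simpler, cheaper, and more explainable, which is exactly the exam trap below.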
⚠️ Exam Trap: "Deep learning" doesn't mean "better learning"—it means "many layers." For simple problems with clear features (like predicting house prices from square footage), traditional regression often outperforms deep learning.