3.3.1. Text Analysis Techniques
💡 First Principle: Text analysis turns unstructured language into structured information. Each technique extracts a different kind of structure: what the text is about, who or what it mentions, how it feels, or a shorter version of it. Knowing which technique produces which output lets you match an exam scenario to the right technique quickly.
The exam names four core techniques: keyword extraction (pull out the main terms and phrases), entity detection (identify named things such as people, places, dates, and organizations), sentiment analysis (classify emotional tone as positive, negative, or neutral), and summarization (produce a shorter version that preserves the key points). These are standard language workloads, and modern generative models can perform all of them from a prompt.
| Technique | Input | Output | Scenario Cue |
|---|---|---|---|
| Keyword extraction | Text | Main terms/phrases | "What is this document about?" |
| Entity detection | Text | Named people/places/dates/orgs | "Find all the company names" |
| Sentiment analysis | Text | Positive / negative / neutral | "Are customers happy?" |
| Summarization | Long text | Shorter text | "Give me the gist" |
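To make the input/output shapes in the table concrete, here is a minimal Python sketch of all four techniques using only the standard library. It is a toy illustration, not how production language services or generative models work: the word lists, the `GAZETTEER` lookup, the function names, and the sample text are all hypothetical, and real systems use trained models rather than hand-written rules.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "but", "of", "to",
             "in", "on", "it", "for", "with", "at", "from"}
POSITIVE = {"great", "love", "excellent", "happy", "fast"}
NEGATIVE = {"broken", "terrible", "late", "upset", "slow"}

# Hypothetical lookup table standing in for a trained entity model.
GAZETTEER = {"Contoso": "ORGANIZATION", "Seattle": "PLACE",
             "Maria Lopez": "PERSON"}
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def words(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, top_n=2):
    """Keyword extraction: text in, most frequent non-stopwords out."""
    counts = Counter(w for w in words(text) if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]


def detect_entities(text):
    """Entity detection: text in, (mention, type) pairs out."""
    found = [(name, label) for name, label in GAZETTEER.items() if name in text]
    found += [(match, "DATE") for match in DATE_PATTERN.findall(text)]
    return found


def classify_sentiment(text):
    """Sentiment analysis: text in, one tone label out."""
    tokens = words(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(text, max_sentences=1):
    """Extractive summarization: keep the sentences richest in frequent words."""
    freq = Counter(w for w in words(text) if w not in STOPWORDS)
    sents = split_sentences(text)
    ranked = sorted(range(len(sents)),
                    key=lambda i: sum(freq[w] for w in words(sents[i])),
                    reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sents[i] for i in keep)


sample = ("Maria Lopez ordered a laptop from Contoso on 2024-03-15. "
          "The laptop arrived late and the screen was broken. "
          "Contoso support was slow to reply.")

print(extract_keywords(sample))
print(detect_entities(sample))
print(classify_sentiment(sample))
print(summarize(sample))
```

Each function returns a different kind of structure from the same input: a list of terms, a list of typed mentions, a single label, and a shorter text. That difference in output shape is what the scenario cues in the table point to.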
⚠️ Exam Trap: Sentiment analysis detects tone, not truth. A negative review and a false statement are different things — negative sentiment doesn't mean the content is inaccurate, and positive sentiment doesn't mean it's correct.
Reflection Question: A company wants to automatically route incoming emails by extracting the sender's name, the product mentioned, and whether the customer sounds upset. Which text analysis techniques does each part require?