1.2. How Generative AI Produces Output
💡 First Principle: Foundation models don't have a single "output mode" — they produce fundamentally different types of output depending on how you ask and which model you use. Text generation, embeddings, and multimodal outputs work through different mechanisms and serve different architectural roles.
Understanding output types is foundational because each type integrates differently into your system architecture. Text generation feeds user-facing interfaces. Embeddings feed vector databases. Multimodal outputs handle documents, images, and audio. Mixing them up means calling the wrong API, storing the wrong data type, and failing at retrieval time.
| Output Type | What It Produces | Primary Consumer | Wrong Use |
|---|---|---|---|
| Text generation | Human-readable text tokens | User interfaces, downstream prompts | Passing to vector DB as embedding |
| Embeddings | Dense float vector (e.g. 1536 dims) | Vector database, similarity search | Displaying to users as text |
| Multimodal output | Text describing image/audio content | Document processing, analysis pipelines | Substituting for OCR output |
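As a concrete illustration, the three rows above can be sketched as plain Python values. These are hardcoded stand-ins, not real model responses; a production system would get each value from a model invocation:

```python
# Hardcoded stand-ins for each output type (illustrative values only).

# Text generation: human-readable tokens for a user interface or downstream prompt.
text_output = "The capital of France is Paris."

# Embedding: a dense float vector destined for a vector database.
# Truncated to 3 dims here; a real embedding might have 1536.
embedding_output = [0.013, -0.094, 0.027]

# Multimodal output: text *describing* image or audio content.
multimodal_output = "The image shows an invoice dated 2024-03-01 with three line items."

# Routing check: each output type has a different consumer.
assert isinstance(text_output, str)                         # feeds a UI
assert all(isinstance(x, float) for x in embedding_output)  # feeds a vector DB
assert isinstance(multimodal_output, str)                   # feeds an analysis pipeline
```

The "Wrong Use" column falls out of these shapes: handing `text_output` to a vector database, or rendering `embedding_output` to a user, is a type error at the architectural level even when the language runtime accepts it.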
Common Misconception: Embedding models and text generation models are interchangeable because both accept text and produce output. In reality, an embedding model returns a dense numerical vector representing the text's semantic meaning, not more text. You cannot swap Amazon Titan Text Embeddings into a slot that expects generated text.
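The misconception becomes obvious once you look at what consumes an embedding. A minimal sketch, using made-up three-dimensional vectors in place of real 1536-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors; higher means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding-model output.
query_vec = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]   # embedding of a semantically similar document
doc_far   = [0.0, 0.1, 0.9]   # embedding of an unrelated document

# Embeddings are consumed by similarity math, never displayed to users.
similar = cosine_similarity(query_vec, doc_close)
unrelated = cosine_similarity(query_vec, doc_far)
assert similar > unrelated
```

A slot expecting generated text would try to render `query_vec` as prose and fail; a slot expecting an embedding would try to run `cosine_similarity` on a sentence and fail. The two output types are not substitutable in either direction.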