6.3.3. Audit Trails for Models and Data
💡 First Principle: AI audit trails must answer a question that traditional audit logs can't: why did the AI make this decision? This requires tracking not just who did what (standard audit), but which model version was running, what data it retrieved, what prompt it used, and how those factors combined to produce the output.
AI-Specific Audit Trail Components:
| Component | What to Track | Why It's Needed |
|---|---|---|
| Model version | Which model version produced each output | Reproduce and investigate decisions |
| Training data version | Which dataset version trained the active model | Trace bias or accuracy issues to training data |
| Prompt template version | Which system prompt was active | Explain behavior changes after prompt updates |
| Grounding data accessed | Which documents/records the model retrieved | Verify the AI had access to correct information |
| Decision inputs | User input + retrieved context + model parameters | Full reproducibility of the decision |
| Decision output | The AI's response and any actions taken | Record of what the AI said or did |
| Change log | Who changed model, prompt, data, or configuration — and when | Accountability for configuration changes |
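The components in the table above can be bundled into a single record per AI decision. A minimal sketch follows; the class and field names are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class AIAuditRecord:
    """One audit record per AI output (hypothetical schema)."""
    model_version: str             # which model version produced the output
    training_data_version: str     # dataset version behind the active model
    prompt_template_version: str   # system prompt in effect at decision time
    grounding_sources: list        # IDs of documents/records retrieved
    decision_input: dict           # user input + retrieved context + parameters
    decision_output: dict          # the response and any actions taken
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Example record for a hypothetical loan-underwriting decision
record = AIAuditRecord(
    model_version="credit-risk-v4.2",
    training_data_version="loans-2024Q3",
    prompt_template_version="underwriting-prompt-v7",
    grounding_sources=["doc:policy-811", "record:cust-20943"],
    decision_input={"query": "Assess loan application 20943", "temperature": 0.0},
    decision_output={"recommendation": "deny", "confidence": 0.87},
)
print(json.dumps(asdict(record), indent=2))
```

Serializing the record to JSON keeps it queryable by standard log-search tooling, which matters for answering regulatory inquiries later.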
Audit Trail Architecture:
The architect must design audit trails that are tamper-resistant (immutable log storage), queryable (can answer regulatory inquiries efficiently), and retention-compliant (stored for required periods, deleted when mandated).
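One common way to make a log tamper-evident is hash chaining: each entry embeds the hash of the previous entry, so any retroactive edit breaks the chain. The sketch below is illustrative only; in production this would sit on top of immutable (WORM) storage rather than an in-memory list.

```python
import hashlib
import json

class AuditChain:
    """Append-only, hash-chained audit log (minimal sketch)."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, payload: dict) -> str:
        # Each entry's hash covers both its payload and the previous hash
        body = json.dumps({"prev": self._last_hash, "payload": payload},
                          sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        self.entries.append({"prev": self._last_hash,
                             "payload": payload, "hash": digest})
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        # Recompute every hash; any edited entry breaks the chain
        prev = "0" * 64
        for e in self.entries:
            body = json.dumps({"prev": prev, "payload": e["payload"]},
                              sort_keys=True)
            if e["prev"] != prev or \
                    hashlib.sha256(body.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True

chain = AuditChain()
chain.append({"event": "model_deploy", "version": "v4.2"})
chain.append({"event": "prompt_update", "version": "v7"})
print(chain.verify())   # True: chain intact
chain.entries[0]["payload"]["version"] = "v9.9"   # simulate tampering
print(chain.verify())   # False: tampering detected
```

Queryability and retention compliance are separate concerns layered on top: the hash chain only guarantees that what was written has not been silently altered.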
Decision Lineage:
For high-stakes AI decisions (financial approvals, customer classifications, compliance determinations), the architect designs decision lineage — a complete chain from user input through model processing to output, including all intermediate steps. This goes beyond ordinary operational logging: it is structured recording of every step in the decision, organized so investigators can reconstruct after the fact why a particular output was produced.
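Decision lineage can be captured as an ordered list of structured steps, one per stage of the decision. The stage names and fields below are illustrative assumptions, not a standard.

```python
import json
from datetime import datetime, timezone

def record_lineage_step(lineage: list, stage: str, detail: dict) -> None:
    """Append one timestamped step to a decision's lineage."""
    lineage.append({
        "stage": stage,
        "detail": detail,
        "at": datetime.now(timezone.utc).isoformat(),
    })

# Hypothetical lineage for a single loan decision
lineage: list = []
record_lineage_step(lineage, "input",
                    {"user_query": "Approve loan application 20943?"})
record_lineage_step(lineage, "retrieval",
                    {"sources": ["policy-811", "cust-20943"]})
record_lineage_step(lineage, "inference",
                    {"model": "credit-risk-v4.2", "confidence": 0.87})
record_lineage_step(lineage, "output",
                    {"decision": "deny", "fallback_triggered": False})

# Persisted as one document, the full chain answers "why this output?"
print(json.dumps(lineage, indent=2))
```

Storing the whole chain under one decision ID lets an investigator replay the sequence — which sources were consulted, what confidence was produced, whether fallback logic fired — months after the fact.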
⚠️ Common Misconception: AI audit trails are the same as standard application audit logs. AI audit trails must additionally track model version changes, training data modifications, prompt template updates, decision lineage, and output quality metrics.
Troubleshooting Scenario: A financial services company's AI model makes an investment recommendation that causes significant client losses. During the subsequent investigation, regulators request a complete decision audit trail. The company can show which model version was deployed and when, but cannot demonstrate why the model recommended that specific action. The missing piece is decision lineage: the chain from input data through model reasoning to output recommendation, including which knowledge sources were consulted, what confidence scores were generated, and what fallback logic was triggered.
AI audit trails must capture six categories beyond standard application logs: model version changes, training data modifications, prompt template updates, decision lineage (input → reasoning → output chain), output quality metrics over time, and human override records. Each category serves a different compliance purpose and has different retention requirements.
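Because each category carries its own retention requirement, a simple policy table plus an expiry check can drive automated retention enforcement. The category keys mirror the six categories above; the retention periods are placeholders only — actual durations come from the applicable regulation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention policy per audit category (years are examples,
# not regulatory guidance)
RETENTION_YEARS = {
    "model_version_changes": 7,
    "training_data_modifications": 7,
    "prompt_template_updates": 5,
    "decision_lineage": 7,
    "output_quality_metrics": 3,
    "human_override_records": 7,
}

def is_expired(category: str, recorded_at: datetime) -> bool:
    """True when a record has passed its retention window and must be deleted."""
    years = RETENTION_YEARS[category]
    age = datetime.now(timezone.utc) - recorded_at
    return age > timedelta(days=365 * years)

now = datetime.now(timezone.utc)
print(is_expired("output_quality_metrics", now))                    # False
print(is_expired("decision_lineage", now - timedelta(days=3650)))   # True
```

Driving deletion from a single policy table keeps "stored for required periods, deleted when mandated" auditable in its own right: the policy itself becomes a versioned configuration artifact.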
Reflection Question: A financial services regulator asks your company to explain why an AI agent denied a specific customer's loan application six months ago. Design the audit trail architecture that enables you to answer this question with full decision lineage.