2.1.2. Reliability and Safety
💡 First Principle: Reliability and safety mean the system behaves consistently and as intended, including under unexpected, rare, or adversarial conditions — and that when it does fail, it fails safely. Because models predict from patterns they've seen, novel situations are exactly where they're weakest, so this principle is about engineering for the inputs you didn't train on.
A medical-triage model might be 99% accurate on typical cases yet behave dangerously on a rare condition that was barely represented in its training data. Reliability and safety practices (rigorous testing, defined safe-failure behaviors, production monitoring, and human oversight for high-stakes decisions) exist to catch precisely these gaps. The goal isn't perfect accuracy, which is unattainable; it's predictable, bounded behavior with safe responses when the model is uncertain or out of its depth.
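One way to picture "failing safely" is a confidence gate that hands uncertain cases to a person instead of acting on them. The sketch below is illustrative only: the names `decide`, `Decision`, and `CONFIDENCE_THRESHOLD`, and the 0.90 cutoff, are assumptions made for this example, not part of any particular product or API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical cutoff; a real system would tune this on validation data.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Decision:
    action: str            # "auto" or "human_review"
    label: Optional[str]   # predicted label, or None when deferred
    confidence: float


def decide(probabilities: Mapping[str, float],
           threshold: float = CONFIDENCE_THRESHOLD) -> Decision:
    """Act on the top label only when confidence is high; otherwise defer."""
    if not probabilities:
        # No usable model output: fail safe rather than guess.
        return Decision("human_review", None, 0.0)
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        # Uncertain case: route to a human instead of acting automatically.
        return Decision("human_review", None, confidence)
    return Decision("auto", label, confidence)


print(decide({"urgent": 0.97, "routine": 0.03}).action)  # confident case
print(decide({"urgent": 0.55, "routine": 0.45}).action)  # uncertain case
```

A gate like this only bounds behavior when the confidence score is itself trustworthy. Models can be confidently wrong on inputs unlike anything in their training data, which is why testing and production monitoring remain necessary alongside it.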
⚠️ Exam Trap: "Reliability and safety just means high accuracy." Accuracy on typical inputs is only part of it. A model can be highly accurate on average and still unsafe if it behaves erratically on edge cases or fails in harmful ways. The principle emphasizes consistent, safe behavior across all conditions, including the unexpected.
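A quick calculation shows how an average can hide an unsafe edge case. The counts below are made up purely for illustration (10,000 cases, 100 of which involve a rare condition) and the slice names are hypothetical.

```python
# Illustrative, made-up counts: slice -> (total cases, correctly handled cases).
results = {
    "typical": (9900, 9895),
    "rare_condition": (100, 5),
}


def accuracy(correct: int, total: int) -> float:
    """Fraction of cases handled correctly."""
    return correct / total


total_cases = sum(n for n, _ in results.values())
total_correct = sum(c for _, c in results.values())

overall = accuracy(total_correct, total_cases)
per_slice = {name: accuracy(c, n) for name, (n, c) in results.items()}

print(f"overall: {overall:.1%}")       # overall: 99.0%
for name, acc in per_slice.items():
    print(f"{name}: {acc:.1%}")        # typical: 99.9%, rare_condition: 5.0%
```

Under these assumed numbers the headline figure is 99%, yet the model is wrong on 95 of the 100 rare cases. Reporting results per slice, not just in aggregate, is one way to surface that gap.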
Reflection Question: Why might a model with 99% accuracy still violate the reliability and safety principle? What's hiding in the other 1%?