2.4.2. Machine Learning Value and Security Considerations
💡 First Principle: Machine learning models learn from data, which creates both their value and their risk. The value: models can recognize patterns and make predictions that humans couldn't produce manually. The risk: if the training data or access to the model is compromised, the model's value is compromised too.
Understanding the ML lifecycle helps you identify security touchpoints and make informed decisions about when ML adds value:
The ML lifecycle in detail:
- Problem definition — What business outcome are we predicting or classifying? Without a clear problem, ML produces impressive technology that solves nothing.
- Data collection — Gather relevant, representative training data. Where does it come from? Is it sensitive? Is it sufficient?
- Data preparation — Clean, label, and structure the data. This is often 60-80% of the total effort—messy data produces unreliable models.
- Model training — The algorithm learns patterns from the prepared data. Different algorithms suit different problems.
- Evaluation — Test the model against held-out data. Measure accuracy, fairness, and reliability. If insufficient, iterate.
- Deployment — Put the model into production where it serves real users and decisions.
- Monitoring and retraining — Models degrade over time as real-world data changes ("model drift"). Continuous monitoring detects degradation; retraining corrects it.
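The training, evaluation, and monitoring stages above can be sketched in a few lines of Python. Everything here is an illustrative assumption, not a real ML framework: the "model" is just a learned decision threshold, the data is synthetic, and the drift check is a simple shift in the mean of incoming inputs.

```python
import random
import statistics

def train(samples):
    """'Training': learn a decision threshold halfway between the class centers."""
    pos = [x for x, label in samples if label == 1]
    neg = [x for x, label in samples if label == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def predict(threshold, x):
    return 1 if x >= threshold else 0

def evaluate(threshold, held_out):
    """Evaluation stage: accuracy against held-out data."""
    correct = sum(predict(threshold, x) == label for x, label in held_out)
    return correct / len(held_out)

def drift_detected(training_inputs, live_inputs, tolerance=1.0):
    """Monitoring stage: flag drift when live inputs shift away from training data."""
    return abs(statistics.mean(live_inputs) - statistics.mean(training_inputs)) > tolerance

# Synthetic, deterministic data: two well-separated classes.
random.seed(0)
train_data = [(random.gauss(0, 1), 0) for _ in range(100)] + \
             [(random.gauss(4, 1), 1) for _ in range(100)]
held_out   = [(random.gauss(0, 1), 0) for _ in range(50)] + \
             [(random.gauss(4, 1), 1) for _ in range(50)]

t = train(train_data)
acc = evaluate(t, held_out)
print(f"held-out accuracy: {acc:.2f}")

# Simulate model drift: live inputs have moved, so monitoring flags retraining.
live = [random.gauss(4.0, 1) for _ in range(100)]
print("retraining needed:", drift_detected([x for x, _ in train_data], live))
```

The point of the sketch is the shape of the loop, not the toy model: evaluation happens on data the model never trained on, and monitoring compares live inputs against the training distribution so degradation is detected before accuracy silently erodes.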
Key insight for business leaders: The ML lifecycle is iterative, not linear. Models require ongoing investment in monitoring and retraining—it's not a "build once, done forever" proposition. Budget for ongoing maintenance, not just initial development.
Security considerations at each ML lifecycle stage:
| Stage | Security Concern | Mitigation |
|---|---|---|
| Training data | May contain sensitive personal or business information | Data anonymization, access controls |
| Model weights | Represent significant IP investment | Access restrictions, encryption |
| Prompts | May contain confidential information | Secure transmission, logging policies |
| Outputs | May reveal protected information | Access controls, content filtering |
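The "data anonymization" mitigation for training data can be sketched as pseudonymization: replacing direct identifiers with salted one-way hashes before records enter the training set. The field names, salt, and record layout below are illustrative assumptions, and real anonymization programs involve far more (salt management, re-identification risk analysis, quasi-identifiers):

```python
import hashlib

# Illustrative only: a salt would be managed like any other secret in practice.
SALT = b"rotate-me-per-dataset"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a truncated, salted, one-way hash."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:12]

def prepare_record(record: dict, sensitive_fields=("customer_id", "email")) -> dict:
    """Return a training-safe copy: sensitive fields hashed, others untouched."""
    return {k: pseudonymize(v) if k in sensitive_fields else v
            for k, v in record.items()}

raw = {"customer_id": "C-1001", "email": "ana@example.com",
       "region": "EU", "spend": 420}
safe = prepare_record(raw)
print(safe["region"], safe["spend"])  # non-sensitive fields pass through unchanged
```

Because the hash is deterministic, the model can still learn per-customer patterns (the same customer always maps to the same token), while the raw identifier never reaches the training pipeline.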
⚠️ Exam Trap: The exam may present scenarios where users want to share AI models externally. Consider what the model "knows" from training—sharing a model trained on confidential data could expose that information.
Reflection Question: Your organization trained a custom model on internal sales data. A partner company asks if they can use this model. What concerns should you raise?