AWS-MLS-C01 & AWS CERTIFICATION | Data Access Controls (IAM, Resource Policies) - AWS Certified Machine Learning

2.4.3. Data Access Controls (IAM, Resource Policies)

First Principle: Robust data access controls ensure that only authorized users and services can reach specific data and ML resources, which upholds the principle of least privilege and supports security and compliance.

Implementing proper access controls is a cornerstone of security and governance in any cloud environment, especially for sensitive ML data. This involves defining who (the principal) can perform which actions, on which resources, and under what conditions.

Key AWS Mechanisms for Data Access Controls in ML:

AWS Identity and Access Management (IAM):
- What it is: The service that enables you to securely control access to AWS services and resources. You use IAM to manage users, groups, and roles, and their permissions.
- IAM Users: Individual identities for people or applications.
- IAM Groups: Collections of IAM users, making it easier to manage permissions for multiple users.
- IAM Roles: Identities that a trusted principal can assume to obtain temporary credentials. Used by AWS services (e.g., SageMaker accessing S3), applications on EC2 instances, or users in other accounts.
- IAM Policies: JSON documents that define permissions. Identity-based policies are attached to users, groups, or roles.
- Principle of Least Privilege: Grant only the permissions required to perform a task. This minimizes the blast radius in case of a security breach.
- Use Cases for ML: Granting SageMaker training jobs access to data in S3, allowing data scientists to access SageMaker notebooks, or controlling who can deploy models.
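
To make the role and policy concepts above concrete, here is a minimal sketch of the two JSON documents a SageMaker execution role typically needs: a trust policy (who may assume the role) and a least-privilege permissions policy (what the role may do). The bucket name is a hypothetical placeholder, and the policies are built as plain Python dictionaries rather than sent to AWS.

```python
import json

# Hypothetical bucket name, used only for illustration.
TRAINING_BUCKET = "example-ml-training-data"

# Trust policy: allows the SageMaker service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: read-only access to a single bucket and nothing else.
read_training_data_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListTrainingBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            # ListBucket applies to the bucket itself.
            "Resource": f"arn:aws:s3:::{TRAINING_BUCKET}",
        },
        {
            "Sid": "ReadTrainingObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            # GetObject applies to the objects inside the bucket.
            "Resource": f"arn:aws:s3:::{TRAINING_BUCKET}/*",
        },
    ],
}

print(json.dumps(read_training_data_policy, indent=2))
```

Note the split between the bucket ARN and the object ARN: granting s3:GetObject on the bucket ARN alone would not allow any object to be read.
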
Resource Policies:
- What they are: Policies attached directly to a resource (e.g., an S3 bucket, an SQS queue, an SNS topic, a KMS key). They define which principals can access that specific resource.
- Amazon S3 Bucket Policies: Control access to an S3 bucket and the objects in it. They are evaluated together with IAM policies.
- AWS KMS Key Policies: Define who can use or manage a specific KMS encryption key. Crucial for controlling access to encrypted ML data and models.
- Use Cases for ML: Ensuring only specific SageMaker roles can read/write to a particular S3 bucket containing sensitive training data, or restricting access to a KMS key used to encrypt model artifacts.
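
As an illustration of the bucket-policy use case above, the following sketch builds an S3 bucket policy that denies all S3 actions to every principal except one role, using the `aws:PrincipalArn` condition key. The helper name, bucket name, role name, and account ID are hypothetical placeholders; this is one possible pattern, not the only way to restrict a bucket.

```python
import json


def restrict_bucket_to_role(bucket: str, role_arn: str) -> dict:
    """Build a bucket policy that denies S3 access to all principals
    except the given role (hypothetical helper for illustration)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptTrainingRole",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # The deny applies whenever the caller is NOT the named role.
                "Condition": {"ArnNotEquals": {"aws:PrincipalArn": role_arn}},
            }
        ],
    }


# Placeholder account ID and role name.
bucket_policy = restrict_bucket_to_role(
    "example-ml-training-data",
    "arn:aws:iam::111122223333:role/ExampleSageMakerTrainingRole",
)
print(json.dumps(bucket_policy, indent=2))
```

Two caveats apply. A deny statement grants nothing, so the role still needs an allow (for example, in its identity-based policy). And a blanket deny like this also locks out administrators, so in practice the condition would usually list an administrative role as well.
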
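AWS Lake Formation: (As discussed in 2.4.2) Provides a simplified, centralized way to manage fine-grained access control for data lakes built on S3 and cataloged in AWS Glue Data Catalog. Rather than generating IAM or bucket policies, it adds its own grant/revoke permission model on top of IAM and vends temporary credentials to integrated services (such as Athena and Glue), so users do not need direct S3 permissions on the underlying data.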

Scenario: You need to configure permissions for a new ML project. Data scientists should only be able to read data from a specific S3 bucket for training, and SageMaker training jobs should only be able to write model artifacts to another specific S3 bucket. All data must be encrypted with a KMS key that only authorized personnel can manage.
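
One way to approach the scenario is sketched below: a KMS key policy that separates key administration from key usage, plus a write-only identity-based policy for the training role. All role names, the bucket name, and the account ID are hypothetical placeholders, and the action lists are illustrative rather than exhaustive.

```python
import json

# All names and the account ID below are hypothetical placeholders.
ACCOUNT_ID = "111122223333"
ARTIFACT_BUCKET = "example-ml-model-artifacts"
KEY_ADMIN_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleKeyAdmin"
DATA_SCIENTIST_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleDataScientist"
TRAINING_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleSageMakerTrainingRole"

# Key policy: in a key policy, Resource "*" refers to the key it is attached to.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Administrators manage the key but cannot use it to encrypt or decrypt.
            "Sid": "KeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": KEY_ADMIN_ROLE},
            "Action": [
                "kms:Describe*", "kms:List*", "kms:Get*", "kms:Put*",
                "kms:Update*", "kms:Enable*", "kms:Disable*",
                "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
            ],
            "Resource": "*",
        },
        {
            # Reading SSE-KMS encrypted training data requires kms:Decrypt.
            "Sid": "DecryptTrainingData",
            "Effect": "Allow",
            "Principal": {"AWS": DATA_SCIENTIST_ROLE},
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
        {
            # Writing SSE-KMS encrypted objects requires kms:GenerateDataKey;
            # multipart uploads additionally need kms:Decrypt.
            "Sid": "EncryptModelArtifacts",
            "Effect": "Allow",
            "Principal": {"AWS": TRAINING_ROLE},
            "Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        },
    ],
}

# Identity-based policy for the training role: write-only to the artifact bucket.
artifact_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WriteModelArtifactsOnly",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{ARTIFACT_BUCKET}/*",
        }
    ],
}

print(json.dumps(key_policy, indent=2))
```

The default key policy that AWS creates also includes a statement for the account root principal, which lets IAM policies grant access to the key. This sketch omits it to keep key management confined to one role; the trade-off is a risk of losing control of the key if that role is deleted.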

Reflection Question: How do IAM policies (attached to users/roles) and resource policies (attached to S3 buckets or KMS keys) work together to ensure that only authorized users and services can access specific data and ML resources throughout the ML lifecycle, while upholding the principle of least privilege and maintaining security and compliance?

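💡 Tip: Remember that both IAM policies and resource policies can allow or deny access, and AWS evaluates them together. An explicit deny in any applicable policy always overrides an allow. Within a single account, an allow in either the identity-based policy or the resource policy is generally sufficient; for cross-account access, both sides must allow. KMS is a notable exception: the key policy must permit access, either directly or by delegating to IAM policies.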
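
The precedence rule can be sketched as a toy evaluator. This is a deliberately simplified model, not AWS's actual evaluation engine: it assumes a single account, pools identity and resource policy statements together, and ignores principals, conditions, service control policies, permissions boundaries, and session policies. The function and variable names are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str) -> bool:
    """True if value matches a pattern or any pattern in a list ('*' wildcards)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(statements, action: str, resource: str) -> str:
    """Return 'ExplicitDeny', 'Allow', or 'ImplicitDeny' for a request."""
    decision = "ImplicitDeny"  # Nothing is allowed unless a statement allows it.
    for stmt in statements:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"  # A matching deny wins immediately.
            decision = "Allow"
    return decision


# Identity-based policy statement: allow reads anywhere in the bucket.
identity_statements = [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-ml-training-data/*"},
]
# Bucket policy statement: deny everything under the restricted/ prefix.
bucket_statements = [
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::example-ml-training-data/restricted/*"},
]
combined = identity_statements + bucket_statements

base = "arn:aws:s3:::example-ml-training-data"
print(evaluate(combined, "s3:GetObject", f"{base}/train.csv"))           # Allow
print(evaluate(combined, "s3:GetObject", f"{base}/restricted/pii.csv"))  # ExplicitDeny
print(evaluate(combined, "s3:PutObject", f"{base}/train.csv"))           # ImplicitDeny
```

The second request shows the key point: the identity-based policy allows the read, but the bucket policy's explicit deny overrides it. The third shows the default: a request that no statement allows is implicitly denied.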