5.2.2. Lake Formation Permissions and Fine-Grained Access
💡 First Principle: Lake Formation centralizes data lake permissions in one place. Instead of managing S3 bucket policies, IAM policies, and Glue catalog policies separately, Lake Formation provides a single control point that grants and revokes access to databases, tables, columns, and rows. It's the "permission authority" for your data lake.
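Permission model: Lake Formation uses a grant/revoke model similar to SQL databases. A data lake administrator grants permissions (SELECT, ALTER, DROP, CREATE_TABLE) on databases, tables, or columns to IAM principals. For Lake Formation-governed resources, these grants do not replace IAM; they work alongside it. A principal still needs IAM permission to call the Glue and Lake Formation APIs, but access to the underlying data in registered S3 locations comes from temporary credentials that Lake Formation vends, not from the principal's own S3 policies.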
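The grant/revoke model can be sketched as a request payload. The snippet below is a minimal illustration, not a definitive implementation: it only builds the keyword arguments in the shape that boto3's Lake Formation grant_permissions call accepts. The role ARN, database name, and table name are hypothetical placeholders, and the actual API call is left as a comment because it needs AWS credentials.

```python
def build_table_grant(principal_arn, database, table, permissions):
    """Build keyword arguments shaped for lakeformation.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Hypothetical principal and resource names, for illustration only.
table_grant = build_table_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales_db",
    "orders",
    ["SELECT"],
)

# With credentials configured, the grant could be sent like this:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**table_grant)
# revoke_permissions takes the same Principal/Resource/Permissions shape.
```

Keeping the payload as plain data makes it easy to review or store grants in version control before anything is applied.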
Column-level security: Grant SELECT on specific columns only. A marketing analyst might see customer_name, purchase_amount, and product_category but not ssn or credit_card_number. This is enforced at query time: Athena, Redshift Spectrum, and EMR all respect Lake Formation column filters.
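As a rough sketch of the marketing analyst case, a column-level grant targets a TableWithColumns resource, using either an explicit include list (ColumnNames) or a wildcard with exclusions (ColumnWildcard). The role ARN, database, and table names here are assumptions made up for illustration; only the payloads are built, and the call itself stays in a comment.

```python
def build_column_grant(principal_arn, database, table, include=None, exclude=None):
    """Build a SELECT grant limited to some columns of one table.

    Pass `include` to list allowed columns, or `exclude` to allow
    every column except the listed ones (exactly one of the two).
    """
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    columns = {"DatabaseName": database, "Name": table}
    if include is not None:
        columns["ColumnNames"] = list(include)
    else:
        columns["ColumnWildcard"] = {"ExcludedColumnNames": list(exclude)}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"TableWithColumns": columns},
        "Permissions": ["SELECT"],
    }


# Hypothetical role and table names.
marketing_grant = build_column_grant(
    "arn:aws:iam::111122223333:role/MarketingAnalyst",
    "sales_db",
    "customers",
    include=["customer_name", "purchase_amount", "product_category"],
)

# The same intent expressed as "everything except the sensitive columns".
exclusion_grant = build_column_grant(
    "arn:aws:iam::111122223333:role/MarketingAnalyst",
    "sales_db",
    "customers",
    exclude=["ssn", "credit_card_number"],
)

# import boto3
# boto3.client("lakeformation").grant_permissions(**marketing_grant)
```

The include list is the safer default: a column added to the table later stays hidden until it is explicitly granted, whereas the exclusion form exposes it automatically.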
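Row-level security (data filters): Define filters that restrict which rows a principal can see. Example: region = 'US' restricts an analyst to US data only. Each filter is a named data cells filter on a table, made up of a row filter expression and, optionally, a column list. Access is restricted by granting SELECT on the filter instead of on the whole table.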
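A minimal sketch of the region = 'US' example follows, assuming a hypothetical account ID, database, table, filter name, and role. It builds the TableData payload for boto3's create_data_cells_filter and then a grant on the resulting DataCellsFilter resource; both calls are shown only as comments.

```python
CATALOG_ID = "111122223333"  # hypothetical AWS account ID that owns the catalog


def build_row_filter(database, table, filter_name, expression):
    """Build the TableData payload for lakeformation.create_data_cells_filter."""
    return {
        "TableCatalogId": CATALOG_ID,
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "RowFilter": {"FilterExpression": expression},
        "ColumnWildcard": {},  # all columns; rows are what the filter restricts
    }


def build_filter_grant(principal_arn, table_data):
    """Build a SELECT grant on the named filter rather than on the whole table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": table_data["TableCatalogId"],
                "DatabaseName": table_data["DatabaseName"],
                "TableName": table_data["TableName"],
                "Name": table_data["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


us_filter = build_row_filter("sales_db", "orders", "us_only", "region = 'US'")
us_grant = build_filter_grant(
    "arn:aws:iam::111122223333:role/UsAnalyst", us_filter
)

# import boto3
# lf = boto3.client("lakeformation")
# lf.create_data_cells_filter(TableData=us_filter)
# lf.grant_permissions(**us_grant)
```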
LF-Tags (Tag-based access control): Assign tags to databases, tables, and columns (e.g., classification=pii, department=finance). Grant permissions based on tag expressions. This scales far better than per-table grants because new tables with the right tags automatically inherit the correct permissions.
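To see why tag-based grants scale, here is a simplified local model of tag-expression matching, written as a sketch rather than a copy of Lake Formation's internals. It assumes an expression matches a resource when every listed key is present on the resource with one of the allowed values. The table names and tag assignments are invented for the example; the final dictionary shows the LFTagPolicy resource shape that grant_permissions accepts.

```python
def matches(expression, resource_tags):
    """True when every key in the expression has an allowed value on the resource."""
    return all(
        resource_tags.get(cond["TagKey"]) in cond["TagValues"]
        for cond in expression
    )


def visible_tables(expression, tables):
    """Names of tables whose tags satisfy the expression, sorted for stable output."""
    return sorted(name for name, tags in tables.items() if matches(expression, tags))


# Hypothetical catalog: table name -> LF-Tags assigned to it.
tables = {
    "ledger": {"department": "finance", "classification": "internal"},
    "customers": {"department": "marketing", "classification": "pii"},
}

finance_expression = [{"TagKey": "department", "TagValues": ["finance"]}]
before = visible_tables(finance_expression, tables)

# A new table tagged department=finance is covered with no new grant.
tables["invoices"] = {"department": "finance", "classification": "pii"}
after = visible_tables(finance_expression, tables)

# The single grant that covers every matching table, present or future.
tag_grant = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/FinanceAnalyst"
    },
    "Resource": {
        "LFTagPolicy": {"ResourceType": "TABLE", "Expression": finance_expression}
    },
    "Permissions": ["SELECT"],
}
```

One grant is written once against the tag expression; onboarding a table then becomes a tagging task instead of a permissions task.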
Cross-account sharing: Lake Formation enables sharing data lake resources across AWS accounts without copying data. The owning account grants permissions to the consuming account's principals.
⚠️ Exam Trap: Lake Formation permissions and IAM S3 policies can conflict. When Lake Formation is enabled, you typically register S3 locations with Lake Formation and let it manage access; IAM S3 policies should be simplified to allow Lake Formation's service role. If both Lake Formation and IAM policies are active, a request has to pass both, so the more restrictive one wins, which can create confusing access denials.
Reflection Question: A healthcare data lake has patient records with columns including patient_name, diagnosis, treatment, and ssn. Different teams need different access: clinical researchers need all columns except ssn; billing needs patient_name, treatment, and ssn; data scientists need anonymized data with no PII. How does Lake Formation handle this?