Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
3.4.2.6. Automating Sensitive Data Discovery at Scale (Amazon Macie)
Amazon Macie uses machine learning and pattern matching to automatically discover and classify sensitive data in Amazon S3, and to help you protect it.
Macie capabilities:
- Automated discovery: Continually samples objects across the account's S3 buckets to build an inventory of where sensitive data is likely to reside
- Sensitive data detection: Identifies PII (names, SSNs, credit cards), financial data, credentials, and custom patterns
- Policy findings: Detects S3 bucket misconfigurations (public access, unencrypted, shared externally)
- Custom data identifiers: Regex patterns for organization-specific sensitive data (employee IDs, internal codes)
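As a rough illustration of a custom data identifier, the sketch below checks a regex locally and then wraps the Macie call in a function. The `EMP-` plus six digits format, the identifier name, and the keywords are all hypothetical; substitute your organization's real pattern.

```shell
# Hypothetical employee-ID format (an assumption): "EMP-" followed by six digits.
EMPLOYEE_ID_REGEX='EMP-[0-9]{6}'

# Sanity-check the pattern locally before registering it with Macie.
sample='Badge EMP-004217 was issued to a new hire.'
echo "$sample" | grep -Eo "$EMPLOYEE_ID_REGEX"   # prints: EMP-004217

# Register the pattern as a custom data identifier.
# Keywords must appear within --maximum-match-distance characters of a match.
create_employee_id_identifier() {
  aws macie2 create-custom-data-identifier \
    --name "employee-id" \
    --description "Internal employee IDs (illustrative format)" \
    --regex "$EMPLOYEE_ID_REGEX" \
    --keywords "employee" "badge" \
    --maximum-match-distance 50
}
```

Calling `create_employee_id_identifier` returns the new identifier's ID, which a classification job can then reference. Keywords are optional, but they tend to reduce false positives for short patterns.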
# Create a one-time classification job for specific buckets
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["customer-data-bucket", "logs-bucket"]
    }]
  }' \
  --name "PII-Discovery-Q1"
Macie findings integration:
- Findings published to Security Hub for centralized management (policy findings by default; sensitive data findings can be added in the publication settings)
- EventBridge integration for automated responses
- Example: Macie finds PII in an unencrypted bucket → EventBridge → Lambda → enable SSE-KMS
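The Macie → EventBridge → Lambda flow above could be wired up roughly as follows. This is a sketch under assumptions: the rule name, target ID, and the Lambda ARN and KMS key passed in are hypothetical, and the `SensitiveData:S3Object/Personal` prefix is used here to narrow the rule to PII findings. The second function shows the CLI equivalent of the remediation the Lambda function would perform through the SDK.

```shell
# Match Macie findings whose type starts with the personal-data category.
EVENT_PATTERN='{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"],
  "detail": { "type": [{ "prefix": "SensitiveData:S3Object/Personal" }] }
}'

# Create the rule and point it at a remediation Lambda (ARN supplied by caller).
create_macie_pii_rule() {
  local lambda_arn="$1"
  aws events put-rule \
    --name "macie-pii-finding" \
    --event-pattern "$EVENT_PATTERN"
  aws events put-targets \
    --rule "macie-pii-finding" \
    --targets "Id=remediate-encryption,Arn=${lambda_arn}"
}

# CLI equivalent of the remediation step: set SSE-KMS as default encryption.
enable_sse_kms() {
  local bucket="$1" kms_key_id="$2" config
  config=$(printf '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"%s"}}]}' "$kms_key_id")
  aws s3api put-bucket-encryption \
    --bucket "$bucket" \
    --server-side-encryption-configuration "$config"
}
```

The Lambda function also needs a resource-based permission that lets `events.amazonaws.com` invoke it (for example via `aws lambda add-permission`), which is omitted here for brevity.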
Multi-account Macie: From the AWS Organizations management account, designate a delegated Macie administrator account that manages Macie across the organization's member accounts. Findings from member accounts aggregate to the administrator account.
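A minimal sketch of the multi-account setup, assuming the first function runs with credentials for the Organizations management account and the second with credentials for the delegated administrator account. The function names are illustrative.

```shell
# Run from the Organizations management account:
# designate the delegated Macie administrator.
designate_macie_admin() {
  local admin_account_id="$1"
  aws macie2 enable-organization-admin-account \
    --admin-account-id "$admin_account_id"
}

# Run from the delegated administrator account:
# automatically enable Macie for accounts that join the organization.
auto_enable_new_members() {
  aws macie2 update-organization-configuration --auto-enable
}
```

For example, `designate_macie_admin 123456789012` would make that account the Macie administrator for the organization.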
Exam Trap: Macie charges per GB of data inspected by sensitive data discovery jobs, so a full scan of terabytes of S3 data can be expensive. Use sampling (a job's sampling percentage limits analysis to a share of the eligible objects) for initial discovery, then run targeted full scans on buckets identified as high-risk. Macie also scans only S3; it does not scan DynamoDB, RDS, or EFS. For database-level data classification, use other tools or custom solutions.
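To make the cost trade-off concrete, the sketch below estimates a scan's cost and defines a sampled job. The price of 1.00 USD per GB is a placeholder, not actual Macie pricing, and the job name is hypothetical. Sampling selects a percentage of objects rather than bytes, so the estimate is only approximate.

```shell
# Rough estimate: total GB x sampling percentage x price per GB.
# $1 = total GB in scope, $2 = sampling percentage, $3 = price per GB in USD
estimate_scan_cost() {
  awk -v gb="$1" -v pct="$2" -v price="$3" \
    'BEGIN { printf "%.2f\n", gb * pct / 100 * price }'
}

# 5000 GB at 10% sampling with a placeholder price of 1.00 USD/GB
estimate_scan_cost 5000 10 1.00   # prints: 500.00

# One-time job that analyzes roughly 10% of eligible objects.
create_sampled_job() {
  aws macie2 create-classification-job \
    --job-type ONE_TIME \
    --name "PII-Discovery-Sampled" \
    --sampling-percentage 10 \
    --s3-job-definition '{
      "bucketDefinitions": [{
        "accountId": "123456789012",
        "buckets": ["customer-data-bucket", "logs-bucket"]
      }]
    }'
}
```

Comparing `estimate_scan_cost 5000 100 1.00` with the 10% figure shows why a sampled first pass followed by targeted full scans is usually the cheaper route.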

Written by Alvin Varughese • Founder • 15 professional certifications