3.4.2.6. Automating Sensitive Data Discovery at Scale (Amazon Macie)

Amazon Macie uses machine learning and pattern matching to discover and classify sensitive data in Amazon S3, and it reports findings so that you can act to protect that data.

Macie capabilities:

Automated discovery: Continually samples objects across the account's S3 buckets to build an inventory of where sensitive data resides
Sensitive data detection: Identifies PII (names, SSNs, credit cards), financial data, credentials, and custom patterns
Policy findings: Detects S3 bucket misconfigurations (public access, disabled encryption, sharing with external accounts)
Custom data identifiers: Regular expressions, optionally refined with keywords and a maximum match distance, for organization-specific sensitive data (employee IDs, internal codes)
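
A custom data identifier can be tried out locally before it is registered. The sketch below assumes a hypothetical employee ID format (EMP- followed by six digits); the identifier name, description, keywords, and sample text are illustrative and not taken from any real environment. The regex is checked with grep first, and the Macie call is wrapped in a function so that nothing is created until you invoke it.

```shell
# Hypothetical format: "EMP-" followed by six digits.
EMPLOYEE_ID_REGEX='EMP-[0-9]{6}'

# Local sanity check: print every match found in a line of sample text.
match_employee_ids() {
  printf '%s\n' "$1" | grep -Eo "$EMPLOYEE_ID_REGEX"
}

# Register the pattern with Macie. Keywords restrict matches to text that
# appears within --maximum-match-distance characters of a keyword.
create_employee_id_identifier() {
  aws macie2 create-custom-data-identifier \
    --name "employee-id" \
    --description "Internal employee IDs (illustrative format)" \
    --regex "$EMPLOYEE_ID_REGEX" \
    --keywords "employee" "badge" \
    --maximum-match-distance 50
}
```

Calling match_employee_ids on a few representative lines is a cheap way to catch an overly loose pattern before create_employee_id_identifier is run against the account.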

# Create a classification job for specific buckets
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["customer-data-bucket", "logs-bucket"]
    }]
  }' \
  --name "PII-Discovery-Q1"

Macie findings integration:

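Findings are published to AWS Security Hub for centralized management
Findings are also sent to Amazon EventBridge as events, which enables automated responses
Example: Macie finds PII in an unencrypted bucket → EventBridge rule matches the finding → Lambda function enables default SSE-KMS encryption on the bucket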
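
The EventBridge step of that flow could be wired up roughly as follows. This is a minimal sketch: the rule name, target ID, and Lambda function ARN are hypothetical placeholders, and the pattern assumes you only want sensitive data findings (finding types that start with SensitiveData:S3Object/), not policy findings. The commands sit inside a function so that nothing is created until you call it.

```shell
# Event pattern: Macie sensitive data findings only (not policy findings).
EVENT_PATTERN='{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"],
  "detail": {
    "type": [{ "prefix": "SensitiveData:S3Object/" }]
  }
}'

# Hypothetical names; replace the ARN with your remediation function's ARN.
RULE_NAME="macie-sensitive-data-findings"
LAMBDA_ARN="arn:aws:lambda:us-east-1:123456789012:function:enable-bucket-sse-kms"

# Create the rule, then attach the Lambda function as its target.
create_macie_rule() {
  aws events put-rule --name "$RULE_NAME" --event-pattern "$EVENT_PATTERN"
  aws events put-targets --rule "$RULE_NAME" \
    --targets "Id=remediate,Arn=$LAMBDA_ARN"
}
```

The Lambda function also needs a resource-based permission that allows events.amazonaws.com to invoke it (for example via aws lambda add-permission), and enabling default encryption affects only objects written afterwards.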

Multi-account Macie: From the AWS Organizations management account, designate a delegated Macie administrator account. That account manages Macie for the organization's member accounts, and their findings aggregate to it.
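
As a rough sketch, the delegation can be done in two CLI steps. The account ID below is a hypothetical placeholder, and the function names are only there to keep the two steps apart, since each one runs under different credentials.

```shell
# Hypothetical account ID for the delegated Macie administrator.
ADMIN_ACCOUNT_ID="111122223333"

# Step 1: run with credentials for the Organizations management account.
designate_macie_admin() {
  aws macie2 enable-organization-admin-account \
    --admin-account-id "$ADMIN_ACCOUNT_ID"
}

# Step 2: run with credentials for the delegated administrator account.
# --auto-enable turns Macie on automatically for accounts that join later.
auto_enable_new_members() {
  aws macie2 update-organization-configuration --auto-enable
}
```

Macie is a Regional service, so both steps are repeated in each Region where the organization uses it.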

Exam Trap: Macie bills sensitive data discovery per GB of data inspected, so a full scan of terabytes of S3 data can be expensive. For initial discovery, set a sampling percentage on the job so that Macie analyzes only a share of the eligible objects, then run targeted full scans on the buckets identified as high-risk. Also, Macie inspects only S3; it does not scan DynamoDB, RDS, or EFS. For database-level data classification, use other tools or custom solutions.
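
A sampled first pass might look like the sketch below. The job name and the 10% figure are illustrative assumptions, and the helper that estimates inspected volume is only a rough guide, because sampling selects a percentage of eligible objects rather than a percentage of bytes. As before, the CLI call is wrapped in a function so that no job starts until you invoke it.

```shell
# Rough estimate of the data volume (GB) a sampled job inspects.
# Integer arithmetic; assumes object sizes are spread fairly evenly.
estimate_sampled_gb() {
  total_gb=$1
  sampling_percent=$2
  echo $(( total_gb * sampling_percent / 100 ))
}

# One-time job that analyzes about 10% of eligible objects in one bucket.
create_sampled_job() {
  aws macie2 create-classification-job \
    --job-type ONE_TIME \
    --name "PII-Discovery-Sample-10pct" \
    --sampling-percentage 10 \
    --s3-job-definition '{
      "bucketDefinitions": [{
        "accountId": "123456789012",
        "buckets": ["customer-data-bucket"]
      }]
    }'
}
```

For instance, estimate_sampled_gb 5000 10 suggests that a 10% sample of a 5,000 GB bucket inspects on the order of 500 GB, which can then be multiplied by the current per-GB rate for the Region.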

Written by Alvin Varughese • Founder • 15 professional certifications