5.1.3. Troubleshooting Authentication Issues
First Principle: Authentication failures produce specific, diagnosable error patterns. Systematic troubleshooting starts with identifying the authentication mechanism, checking the error message, and working through the common failure causes for that mechanism.
Common Authentication Failures:
| Symptom | Likely Cause | Diagnostic |
|---|---|---|
| "Access Denied" after federation | Permission set doesn't cover the action | Check Identity Center permission sets |
| MFA token rejected | Clock drift between device and AWS | Resync MFA device in IAM console |
| Cognito sign-in fails | User pool configuration (password policy, required attributes) | Check Cognito user pool settings |
| Cross-account AssumeRole fails | Trust policy doesn't include the calling account/role | Check role trust policy in target account |
| STS token expired | Session duration too short or token not refreshed | Extend max session duration, refresh token |
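The "token not refreshed" cause in the last row can be guarded against by checking the credential expiry before each use. The following is a minimal sketch, assuming the caller already holds the `Expiration` timestamp returned with temporary credentials; the function name `needs_refresh` and the five-minute safety margin are illustrative choices, not AWS-defined values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    expiration: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),  # illustrative safety margin
) -> bool:
    """Return True when temporary credentials are expired or about to expire."""
    now = now or datetime.now(timezone.utc)
    # Refresh slightly early so in-flight requests do not fail mid-call.
    return now >= expiration - margin
```

Refreshing a few minutes early avoids requests that start with valid credentials and fail partway through a long operation.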
CloudTrail for Authentication Troubleshooting:
- CloudTrail logs AWS authentication events, including console sign-ins and STS calls
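- Failed `AssumeRole` calls show the error reason
- Failed console logins appear as `ConsoleLogin` events with `errorMessage`
- Filter by `errorCode: "AccessDenied"` to find denied API calls such as failed `AssumeRole`; failed console logins instead carry `responseElements.ConsoleLogin` set to `"Failure"`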
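The filtering described above can be sketched in plain Python over CloudTrail records that have already been parsed from JSON. This is an illustrative sketch, not an AWS API: the helper names `is_auth_failure` and `summarize_failures` and the set of event names are assumptions chosen for the example, and the event-name set is not exhaustive.

```python
from typing import Any

# Event names treated as authentication-related (illustrative, not exhaustive).
AUTH_EVENT_NAMES = {
    "ConsoleLogin",
    "AssumeRole",
    "AssumeRoleWithSAML",
    "AssumeRoleWithWebIdentity",
    "GetSessionToken",
}


def is_auth_failure(record: dict[str, Any]) -> bool:
    """Return True if a CloudTrail record looks like a failed authentication."""
    if record.get("eventName") not in AUTH_EVENT_NAMES:
        return False
    # Failed API calls such as AssumeRole carry an errorCode.
    if record.get("errorCode"):
        return True
    # Failed console sign-ins are flagged in responseElements instead.
    response = record.get("responseElements") or {}
    return response.get("ConsoleLogin") == "Failure"


def summarize_failures(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect the fields most useful for diagnosis from each failed record."""
    return [
        {
            "eventName": r.get("eventName"),
            "errorCode": r.get("errorCode"),
            "errorMessage": r.get("errorMessage"),
            "principal": (r.get("userIdentity") or {}).get("arn"),
            "sourceIPAddress": r.get("sourceIPAddress"),
        }
        for r in records
        if is_auth_failure(r)
    ]
```

The summary keeps `eventName`, `errorCode`, `errorMessage`, the calling principal, and the source IP, which together usually indicate which mechanism failed, for whom, and why.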
Identity Center Troubleshooting:
- Verify the identity source connection (AD Connector health, SAML metadata)
- Check permission set assignments (user → group → permission set → account)
- Verify the permission set's IAM policy matches the required actions
- Check that `AWSServiceRoleForSSO` exists in target accounts
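The third check above, whether the permission set's policy matches the required actions, can be approximated offline. The sketch below is deliberately simplified: it looks only at `Allow` statements and `Action` patterns and ignores `Deny`, `NotAction`, `Resource`, and `Condition`, so it can show that an action is clearly missing but cannot prove that access will be granted. The function name `uncovered_actions` is hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def uncovered_actions(policy: dict, required_actions: list[str]) -> list[str]:
    """Return the required actions that no Allow statement's Action pattern matches."""
    allowed_patterns = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") == "Allow" and "Action" in stmt:
            # IAM action names are case-insensitive, so compare in lowercase.
            allowed_patterns += [p.lower() for p in _as_list(stmt["Action"])]
    return [
        action
        for action in required_actions
        if not any(fnmatchcase(action.lower(), p) for p in allowed_patterns)
    ]
```

An empty result means every required action is matched by some `Allow` pattern; any action returned is a candidate explanation for an "Access Denied" after federation.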
AWS Directory Service Troubleshooting:
- AD Connector requires network connectivity to on-premises AD (VPN or Direct Connect)
- DNS resolution must work for the AD domain
- Service account credentials must be valid and not expired
- Security group must allow AD-related ports (LDAP 389, LDAPS 636, Kerberos 88)
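The port requirement above can be spot-checked from a host in the same subnet as the connector. The following sketch tests TCP reachability only; it does not exercise the UDP side of Kerberos or DNS, and a successful TCP connect does not prove the service account can bind. The function name `check_tcp_ports` and the three-second timeout are assumptions for illustration.

```python
import socket

# TCP ports named in the checklist above.
AD_TCP_PORTS = {"Kerberos": 88, "LDAP": 389, "LDAPS": 636}


def check_tcp_ports(host: str, ports: dict[str, int], timeout: float = 3.0) -> dict[str, bool]:
    """Try a TCP connection to each port and report which ones are reachable."""
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS resolution failures.
            results[name] = False
    return results
```

Calling `check_tcp_ports("dc1.corp.example.com", AD_TCP_PORTS)` (a hypothetical domain controller name) returns a mapping such as `{"Kerberos": True, "LDAP": True, "LDAPS": False}`, which narrows the problem to a specific security group rule, route, or firewall.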
⚠️ Exam Trap: The most common cause of federation failures is a misconfigured trust policy — the target role doesn't trust the IdP or the calling principal. Always check trust policies first when cross-account or federated access fails.
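A trust policy check of this kind can be sketched offline against the policy JSON. The example below is a simplified approximation, not the real IAM evaluation logic: it ignores `Condition` blocks and `Deny` statements, and it expects the caller's IAM role, user, or identity provider ARN rather than an assumed-role session ARN. The function name `trust_policy_allows` is hypothetical.

```python
def _as_list(value):
    """IAM policy fields may be a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def trust_policy_allows(trust_policy: dict, caller_arn: str, action: str = "sts:AssumeRole") -> bool:
    """Simplified check: does any Allow statement name the caller (or its account) for this action?"""
    account_id = caller_arn.split(":")[4]
    account_root = f"arn:aws:iam::{account_id}:root"
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if action not in _as_list(stmt.get("Action", [])):
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            return True
        if not isinstance(principal, dict):
            continue
        trusted = []
        for key in ("AWS", "Federated"):
            trusted += _as_list(principal.get(key, []))
        # Trusting the account root (or bare account ID) covers principals in that account.
        if {caller_arn, account_root, account_id, "*"} & set(trusted):
            return True
    return False
```

For cross-account access, a passing result covers only the target side: the calling principal still needs an identity-based policy in its own account that allows the STS action on the target role.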
Scenario: Developers report they can't access a specific AWS account through Identity Center. You check CloudTrail and find `AssumeRoleWithSAML` failures with `errorCode: AccessDenied`. The permission set is assigned correctly, but the trust policy of the Identity Center-provisioned IAM role in the target account was manually modified and no longer trusts the SAML identity provider that Identity Center created in that account.
Reflection Question: Why is CloudTrail the first tool to check for authentication failures, and what specific event fields help you diagnose the root cause?