Merged
38 changes: 22 additions & 16 deletions CLAUDE.md
@@ -6,42 +6,48 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Cerberus is an AWS SAM application that automatically removes unwanted default AWS Control Tower IAM Identity Center permission set assignments. It intercepts `CreateAccountAssignment` CloudTrail events and deletes the assignment when it matches configured regex patterns.

## Two-Account Deployment
## Single-Account Deployment in the Management Account

This app spans two AWS accounts:
Cerberus must be deployed in the AWS Organization management account. IAM Identity Center enforces a service-level restriction (invisible to IAM, SCPs, and the delegated-admin configuration): permission sets whose lifecycle is owned by the management account — every Control Tower default — can only have their assignments removed by a principal in the management account itself. A delegated admin returns `AccessDeniedException` regardless of IAM permissions, which is why Cerberus does not run in a delegated-admin account.

- **Management account**: `cft-eventbridge-rule.yaml` is a standalone CloudFormation template (not SAM) that forwards `sso:CreateAccountAssignment` events cross-account to the custom event bus in the delegated admin account.
- **Delegated admin account**: `cerberus/template.yaml` is the SAM app. Deploy here.

Never conflate these two templates. `sam build` / `sam deploy` only touch `cerberus/`.

## Critical Code Quirk

`cerberus/src/cerberus/app.py` around line 120 unconditionally overwrites the real `sso:DeleteAccountAssignment` API response with a hardcoded `{"AccountAssignmentDeletionStatus": {"Status": "SUCCEEDED"}}`. This means the function always reports success regardless of what the API actually returned. Verify intent with the team before modifying this function or adding response-based branching logic.
`cerberus/template.yaml` is the SAM app. Deploy it in the management account. There are no other CloudFormation templates in the repository.

## Primary Tuning Surface

The three Lambda environment variables below are the main way to control what gets deleted. They are regex patterns set in `cerberus/template.yaml`:
Lambda environment variables (set in `cerberus/template.yaml`):

- `PermissionSetNamePattern`
- `PrincipalGroupNamePattern`
- `PrincipalUserNameEmail`
- `PermissionSetNamePattern` — regex matched against the permission set name (case-insensitive).
- `PrincipalGroupNamePattern` — regex matched against the principal name when `principalType=GROUP`.
- `PrincipalUserNameEmail` — exact email match against the principal name when `principalType=USER`.
- `Mode` — `ENFORCE` | `DRY_RUN` | `DISABLED`. `DRY_RUN` logs would-delete decisions without calling the SSO API; `DISABLED` turns off the EventBridge rule and short-circuits the Lambda. Operational kill switch + dry-run capability.
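
The way these four variables could combine into a single decision can be sketched as follows. This is a minimal illustration, not the code in `cerberus/src/cerberus/app.py`: the function name `decide`, the return labels, the use of `re.search`, and the rule that both the permission set and the principal must match are all assumptions.

```python
import re


def decide(mode, permission_set_name, principal_type, principal_name,
           permission_set_pattern, group_pattern, user_email):
    """Return "SKIP", "WOULD_REMOVE", or "REMOVE" for one assignment event.

    Hypothetical helper: names and return labels are illustrative only.
    """
    if mode == "DISABLED":
        # Kill switch: never evaluate or delete anything.
        return "SKIP"

    # Assumption: re.search semantics; the source only states "case-insensitive".
    if not re.search(permission_set_pattern, permission_set_name, re.IGNORECASE):
        return "SKIP"

    if principal_type == "GROUP":
        # Regex match against the group name.
        matched = re.search(group_pattern, principal_name) is not None
    elif principal_type == "USER":
        # Exact email match, no regex involved.
        matched = principal_name == user_email
    else:
        matched = False

    if not matched:
        return "SKIP"

    # DRY_RUN only reports the decision; ENFORCE would call the SSO API.
    return "WOULD_REMOVE" if mode == "DRY_RUN" else "REMOVE"
```

Keeping the decision in a pure function like this, separate from the API call, would make the `DRY_RUN` and `ENFORCE` paths share exactly the same matching logic.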

## Testing

Tests use stdlib `unittest`, not pytest. Do not add pytest dependencies or use pytest-style fixtures. Test file: `cerberus/tests/unit/test_cerberus.py`.
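
As an illustration of the stdlib-only style, the sketch below uses `unittest.mock.patch.dict` where a pytest suite would typically reach for a `monkeypatch` fixture. The helper `current_mode` is hypothetical and is not taken from `app.py`; the `ENFORCE` fallback inside the Lambda is likewise an assumption.

```python
import os
import unittest
from unittest import mock


def current_mode():
    # Hypothetical helper: reads the Mode environment variable.
    # Assumption: an unset variable falls back to ENFORCE.
    return os.environ.get("Mode", "ENFORCE").upper()


class CurrentModeTest(unittest.TestCase):
    def test_reads_mode_from_environment(self):
        # patch.dict restores os.environ when the block exits.
        with mock.patch.dict(os.environ, {"Mode": "DRY_RUN"}):
            self.assertEqual(current_mode(), "DRY_RUN")

    def test_defaults_to_enforce_when_unset(self):
        # clear=True empties the environment for the duration of the block.
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertEqual(current_mode(), "ENFORCE")
```

A test case written this way is picked up by `python -m unittest discover` without any additional dependencies.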

## State Machine & Lambda Authoring Conventions

When editing `cerberus/statemachine/cerberus.asl.json` or `cerberus/src/cerberus/app.py`, apply these defaults at write time — don't wait for a reviewer to add them.

- **Every Task state needs an explicit `Retry`, applied uniformly.** A defensive pattern on one Task and not its siblings is worse than no pattern — it signals "we thought about this" while leaving peers exposed. AWS SDK integration tasks (`arn:aws:states:::aws-sdk:*`) are subject to throttling and transient network errors; one throttle without retry fails the whole execution.
- SDK integration default: 3 attempts on `States.TaskFailed`, 3s interval, 2.0× backoff.
- Lambda invoke default: scope `ErrorEquals` to `Lambda.ServiceException`, `Lambda.AWSLambdaException`, `Lambda.SdkClientException`, `Lambda.TooManyRequestsException`. Do **not** retry blanket `States.TaskFailed` for Lambda — the Cerberus Lambda wraps every business-logic exception into a structured `{"result": "FAILED"}` return, so `States.TaskFailed` only fires on crashes (timeout/OOM/runtime), which retrying with identical input cannot fix and will burn ~5×timeout-seconds of wall time before surfacing.

- **Choice-state validations must match what the Lambda actually reads.** When the ASL feeds data to the Lambda, every `Is X Returned?` Choice must check the exact JSONPath the Lambda consumes — not a sibling field. Trace `event.get(...)` calls in `cerberus/src/cerberus/app.py` and align `Variable` paths in the ASL to match. Sibling-field validation can fail-closed on valid input when the sibling is optional (`DisplayName` on Identity Store users is the canonical case: `UserName` is required, `DisplayName` is not).

- **Prefer `StringEquals` over `StringMatches` for exact strings.** `StringMatches` allows wildcards — use it only when you mean it.
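
The retry defaults above can be written down as data and checked mechanically. The sketch below expresses them as Python dicts using the ASL field names `ErrorEquals`, `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. Several things here are assumptions: "3 attempts" is mapped to `MaxAttempts: 3`, the Lambda retrier reuses the SDK timing values (the text only specifies its `ErrorEquals` list), and the helpers `retry_waits` and `tasks_missing_retry` are hypothetical and only inspect top-level states.

```python
# SDK integration default from the convention above.
SDK_RETRY = [{
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 3,
    "MaxAttempts": 3,  # Assumption: "3 attempts" means MaxAttempts = 3.
    "BackoffRate": 2.0,
}]

# Lambda invoke default: scoped errors only, never blanket States.TaskFailed.
LAMBDA_RETRY = [{
    "ErrorEquals": [
        "Lambda.ServiceException",
        "Lambda.AWSLambdaException",
        "Lambda.SdkClientException",
        "Lambda.TooManyRequestsException",
    ],
    # Assumption: timing values reused from the SDK default.
    "IntervalSeconds": 3,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]


def retry_waits(retrier):
    """Seconds waited before each retry: interval * backoff ** n."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** n
        for n in range(retrier["MaxAttempts"])
    ]


def tasks_missing_retry(definition):
    """Names of top-level Task states that have no Retry block."""
    return sorted(
        name
        for name, state in definition["States"].items()
        if state.get("Type") == "Task" and not state.get("Retry")
    )
```

Loading `cerberus.asl.json` with `json.load` and asserting that `tasks_missing_retry` returns an empty list in a unit test is one way to enforce the "applied uniformly" rule, although nested `Map` or `Parallel` branches would need a recursive walk.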

## MCP Servers

Two MCP servers are configured in `.mcp.json` at the repo root. Use them proactively — don't guess at AWS API shapes or dig through logs manually.

**`awslabs.aws-documentation-mcp-server`** — AWS official docs, resource schemas, IAM policy references, API signatures. Reach for this whenever you're working on `cerberus/template.yaml`, `cerberus/statemachine/cerberus.asl.json`, or `cft-eventbridge-rule.yaml`, or any time you need to verify an AWS API call, IAM action name, or resource attribute.
**`awslabs.aws-documentation-mcp-server`** — AWS official docs, resource schemas, IAM policy references, API signatures. Reach for this whenever you're working on `cerberus/template.yaml` or `cerberus/statemachine/cerberus.asl.json`, or any time you need to verify an AWS API call, IAM action name, or resource attribute.

**`awslabs-cloudwatch-mcp-server`** — Live CloudWatch access to the deployed Cerberus stack. Use this to debug Step Functions execution failures, inspect Lambda errors, or trace an event end-to-end. The default log group is `/cerberus` (parameterized at deploy time). The server is pre-configured with `AWS_PROFILE=cerberus` and `AWS_REGION=ca-central-1`; the `cerberus` profile must exist locally with CloudWatch read-only access (see `cerberus/README.md` for profile setup).

## Plugins

The **`aws-serverless` plugin** (`aws-serverless@claude-plugins-official`) is enabled at project scope in `.claude/settings.json`. It provides SAM-aware skills and serverless-specific context for working with `cerberus/template.yaml`, `cerberus/statemachine/cerberus.asl.json`, and `cft-eventbridge-rule.yaml`.
The **`aws-serverless` plugin** (`aws-serverless@claude-plugins-official`) is enabled at project scope in `.claude/settings.json`. It provides SAM-aware skills and serverless-specific context for working with `cerberus/template.yaml` and `cerberus/statemachine/cerberus.asl.json`.

The **`code-review` plugin** (`code-review@claude-plugins-official`) is also enabled. See PR Requirements below.

72 changes: 72 additions & 0 deletions Makefile
@@ -0,0 +1,72 @@
# Cerberus — wrapper around SAM CLI + unit tests.
# Single entry point for CI/CD so deploys and tests don't depend on remembered
# command sequences. Targets are self-documenting; run `make help`.

SAM_DIR := cerberus
PYTHON ?= python3
VENV := $(SAM_DIR)/.venv
VENV_PY := $(VENV)/bin/python
VENV_MARKER := $(VENV)/.installed
REQUIREMENTS := $(SAM_DIR)/tests/requirements.txt
AWS_REGION ?= ca-central-1

# Required for `make deploy`. No defaults — failing closed is intentional.
NOTIFICATION_EMAIL ?=

# Optional parameter overrides for `make deploy`. Unset => template defaults apply.
MODE ?=
PERMISSION_SET_PATTERN ?=
PRINCIPAL_GROUP_PATTERN ?=
PRINCIPAL_USER_EMAIL ?=
LOG_GROUP_NAME ?=
LOG_GROUP_RETENTION ?=

.DEFAULT_GOAL := help
.PHONY: help install validate test check build deploy clean _check-deploy-params

help: ## Show available targets
	@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " \033[36m%-16s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST)

$(VENV_MARKER): $(REQUIREMENTS)
	$(PYTHON) -m venv $(VENV)
	$(VENV)/bin/pip install --quiet --upgrade pip
	$(VENV)/bin/pip install --quiet -r $(REQUIREMENTS)
	@touch $(VENV_MARKER)

install: $(VENV_MARKER) ## Set up local venv with test dependencies

validate: ## Lint and validate the SAM template
	cd $(SAM_DIR) && sam validate --lint

test: $(VENV_MARKER) ## Run unit tests
	AWS_DEFAULT_REGION=$(AWS_REGION) $(VENV_PY) -m unittest discover -s $(SAM_DIR)/tests/unit -t . -v

check: validate test ## CI gate — validate + test

build: ## Build deployment artifacts (sam build)
	cd $(SAM_DIR) && sam build

deploy: _check-deploy-params build ## Deploy stack (requires NOTIFICATION_EMAIL)
	cd $(SAM_DIR) && sam deploy \
		$(if $(CI),--no-confirm-changeset) \
		--parameter-overrides \
		NotificationEmail=$(NOTIFICATION_EMAIL) \
		$(if $(MODE),Mode=$(MODE)) \
		$(if $(PERMISSION_SET_PATTERN),"PermissionSetNamePattern=$(PERMISSION_SET_PATTERN)") \
		$(if $(PRINCIPAL_GROUP_PATTERN),"PrincipalGroupNamePattern=$(PRINCIPAL_GROUP_PATTERN)") \
		$(if $(PRINCIPAL_USER_EMAIL),PrincipalUserNameEmail=$(PRINCIPAL_USER_EMAIL)) \
		$(if $(LOG_GROUP_NAME),LogGroupName=$(LOG_GROUP_NAME)) \
		$(if $(LOG_GROUP_RETENTION),LogGroupRetentionDays=$(LOG_GROUP_RETENTION))

clean: ## Remove build artifacts and venv
	rm -rf $(SAM_DIR)/.aws-sam $(VENV)

_check-deploy-params:
	@if [ -z "$(NOTIFICATION_EMAIL)" ]; then \
		echo "ERROR: NOTIFICATION_EMAIL is required"; \
		echo ""; \
		echo "Usage: make deploy NOTIFICATION_EMAIL=ops@example.com"; \
		echo "Optional: MODE={ENFORCE|DRY_RUN|DISABLED} PERMISSION_SET_PATTERN='...' PRINCIPAL_GROUP_PATTERN='...' PRINCIPAL_USER_EMAIL='...' LOG_GROUP_NAME='/cerberus' LOG_GROUP_RETENTION=14"; \
		echo "In CI: set CI=true to skip the interactive changeset confirmation."; \
		exit 1; \
	fi
25 changes: 17 additions & 8 deletions README.md
@@ -8,24 +8,33 @@ The default **IAM Identity Center Groups for AWS Control Tower** are rather perm

We have created [Cerberus](https://www.britannica.com/topic/Cerberus) to monitor events from the `sso.amazonaws.com` service. Cerberus, often referred to as the hound of Hades, is a multi-headed dog that guards the gates of the underworld to prevent the dead from leaving, or in this case, prevent `CreateAccountAssignment` of unauthorized (unwanted) default permission sets to AWS Control Tower managed accounts.

# AWS Serverless Application Model (SAM)
## Deployment

Instruction on how to deploy the application, [Cerberus AWS SAM App](cerberus/README.md).
Cerberus is a single [AWS SAM](https://docs.aws.amazon.com/serverless-application-model/) stack that must be deployed in the AWS Organization **management account**. IAM Identity Center enforces a service-level restriction that prevents a delegated administrator from removing assignments owned by the management account — see [cerberus/README.md](cerberus/README.md#why-this-runs-in-the-aws-organization-management-account) for the full explanation, pre-deploy security checklist, parameter reference, and migration path from the older delegated-admin topology.

Deployment steps:
The repository ships a top-level `Makefile` as the single entry point for build, test, and deploy — no remembered SAM CLI command sequences required.

1. Deploy the [Cerberus AWS SAM App](cerberus/template.yaml) in the Management or delegated administrator IAM Identity Center account
2. Deploy the [EventBridge Rule](cft-eventbridge-rule.yaml) in the Management account
- Reference the Output `EventBusArn` from the **Cerberus AWS SAM App** deployed stack for `CerberusEventBusArn` parameter
```bash
make help # List all available targets
make check # Validate template + run unit tests
make deploy \
NOTIFICATION_EMAIL=oncall@example.com \
MODE=DRY_RUN # First-time deploy in DRY_RUN
```

After observing `DRY_RUN: would remove ...` lines in the `/cerberus` log group for real `CreateAccountAssignment` events, re-run with `MODE=ENFORCE` (or omit — `ENFORCE` is the template default).

In CI, set `CI=true` to skip the interactive changeset confirmation that `cerberus/samconfig.toml` enables by default.

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a feature branch.
3. Commit your changes.
4. Submit a pull request.
3. Run `make check` locally — must pass before opening a PR.
4. Commit your changes.
5. Submit a pull request.

## Code Formatting
