Refactor AWS authentication handling to use unified credentials #623
stewartshea merged 12 commits into runwhen-contrib:main
- Updated various runbooks and scripts to use a unified AWS credentials management approach, importing credentials from the aws-auth block instead of individual secret imports.
- Simplified the AWS CLI commands by removing unnecessary role assumption logic.
- Improved documentation within the scripts to clarify the use of AWS credentials and the expected environment setup.
- Streamlined YAML templates for AWS resources to include a common authentication configuration, improving maintainability and consistency across the codebase.
- Unified the authentication process by sourcing a common `auth.sh` script across the AWS-related scripts, improving maintainability and reducing code duplication.
- Updated the `auth.sh` script to improve the handling of AWS credentials, including identity verification and role assumption logic.
- Improved documentation within the scripts to clarify usage and expected environment setup for AWS credentials.
- Introduced ${CONTAINER_RESTART_AGE} and ${CONTAINER_RESTART_THRESHOLD} variables to both DaemonSet and StatefulSet healthcheck runbooks.
- Enhanced documentation for these parameters to clarify their purpose and usage in monitoring container restarts.
- Updated environment variable evaluations to include the new parameters, improving the robustness of health checks.
…esource name
- Modified the `alias` and `asMeasuredBy` fields in the `aws-eks-health-slx.yaml` template to reference the cluster name directly, improving clarity and accuracy in health check reporting.
…proved error handling
- Refactored `check_eks_cluster_health.sh` and `check_eks_fargate_cluster_health_status.sh` to include comprehensive issue tracking in JSON format, allowing for better visibility of cluster health issues.
- Implemented robust error handling for AWS CLI commands, capturing and reporting errors related to cluster listing and description.
- Updated runbook documentation to reflect changes in health check processes and added new tasks for monitoring EKS and Fargate profiles.
- Removed the deprecated `list_eks_fargate_metrics.sh` script to streamline the codebase.
- Updated `check_eks_cluster_health.sh`, `check_eks_fargate_cluster_health_status.sh`, and `check_eks_nodegroup_health.sh` to allow optional specification of the EKS cluster name for targeted health checks.
- Improved error handling and reporting for AWS CLI commands, so that cluster listing failures are reported rather than silently ignored.
- Revised README and runbook documentation to reflect the new functionality and clarify the health check processes for EKS clusters, node groups, and Fargate profiles.
- Adjusted YAML templates to incorporate the cluster name in health metrics and reporting, making monitoring outputs clearer.
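The optional cluster name and the JSON issue tracking described in these commits can be sketched roughly as follows. This is a Python illustration only, not the shell code from the PR: the function name `resolve_clusters`, the injectable `run` parameter, and the issue fields (`title`, `severity`, `details`) are assumptions made for the example. The only AWS CLI call used is `aws eks list-clusters`, whose JSON output contains a `clusters` array.

```python
import json
import subprocess


def resolve_clusters(region, cluster_name=None, run=subprocess.run):
    """Return (clusters, issues) for a health check run.

    If cluster_name is given, only that cluster is targeted and the AWS CLI
    is not called. Otherwise all clusters in the region are listed, and a
    listing failure is recorded as a JSON-serializable issue instead of
    being dropped. The issue field names are hypothetical.
    """
    issues = []
    if cluster_name:
        return [cluster_name], issues

    proc = run(
        ["aws", "eks", "list-clusters", "--region", region, "--output", "json"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        issues.append(
            {
                "title": f"Failed to list EKS clusters in {region}",
                "severity": 2,  # hypothetical severity scale
                "details": proc.stderr.strip(),
            }
        )
        return [], issues

    return json.loads(proc.stdout).get("clusters", []), issues


if __name__ == "__main__":
    # A targeted check needs no AWS access at all.
    clusters, issues = resolve_clusters("us-west-2", cluster_name="demo")
    print(json.dumps({"clusters": clusters, "issues": issues}, indent=2))
```

Passing `run` as a parameter is a design choice for the sketch: it lets the error path be exercised without credentials or a real cluster.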
…entation
- Deleted the `README.md` and `runbook.robot` files for the AWS EKS Nodegroup health check codebundle, as they are no longer needed.
- This cleanup removes obsolete components related to EKS nodegroup health checks.
…ype from 'aws_accounts' to 'aws_ec2_vpcs' for improved resource monitoring.
codebundles/aws-account-cost-health/.runwhen/templates/aws-account-cost-health-sli.yaml
- Introduced a constant `CE_REGION` set to `us-east-1` in `aws_cost_report.sh` and `aws_ri_recommendations.sh` to ensure consistent region usage for Cost Explorer queries.
- Updated AWS CLI commands in both scripts to use the specified region, making cost data retrieval more reliable.
- Improved error handling in `runbook.robot` files for AWS authentication checks, allowing multiple failure conditions to be captured and reported accurately.
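As a rough illustration of pinning Cost Explorer calls to one region, the sketch below builds an `aws ce get-cost-and-usage` command line with `--region` taken from a `CE_REGION` constant. It is written in Python rather than the shell used by the scripts, and the helper name `cost_explorer_command`, the chosen metric, and the default granularity are assumptions for the example; the actual queries in `aws_cost_report.sh` are not shown in this conversation.

```python
# Region constant mirroring the CE_REGION value named in the commit message.
CE_REGION = "us-east-1"


def cost_explorer_command(start, end, granularity="MONTHLY"):
    """Build an argv list for a Cost Explorer query pinned to CE_REGION.

    start and end are ISO dates (YYYY-MM-DD). The metric and granularity
    are illustrative defaults, not values taken from the PR.
    """
    return [
        "aws", "ce", "get-cost-and-usage",
        "--time-period", f"Start={start},End={end}",
        "--granularity", granularity,
        "--metrics", "UnblendedCost",
        "--region", CE_REGION,  # always explicit, never inherited from the caller's env
        "--output", "json",
    ]


if __name__ == "__main__":
    print(" ".join(cost_explorer_command("2024-01-01", "2024-02-01")))
```

Setting the region on the command itself means the query does not depend on whatever `AWS_REGION` or `AWS_DEFAULT_REGION` the runbook happens to be running with.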
… resource name
- Modified the description, alias, and value fields in the YAML templates to reference `account_id` for improved clarity and accuracy in cost monitoring.
- Ensured consistency across all relevant templates for better alignment with AWS account identification practices.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
        "AWS_CONTAINER_CREDENTIALS_FULL_URI",
        "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
        "AWS_STS_REGIONAL_ENDPOINTS",
    ]
Missing ECS container credential passthrough
Medium Severity
execute_command() now forwards selected AWS env vars, but omits AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. In ECS task-role environments, AWS CLI relies on that variable for credentials. Because local command execution builds a minimal env, this omission can make AWS-authenticated runbooks fail even when container credentials are correctly configured.
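A minimal sketch of the allowlist-style passthrough this comment refers to, with the missing variable added. The names `AWS_ENV_PASSTHROUGH` and `build_minimal_env` are hypothetical and this is not the `RW.CLI` implementation; the list holds only the entries visible in the diff excerpt plus `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI`, so the real list is likely longer.

```python
import os

# Hypothetical allowlist: the three names visible in the diff excerpt, plus
# the ECS task-role variable that the review comment says is missing.
AWS_ENV_PASSTHROUGH = [
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
    "AWS_STS_REGIONAL_ENDPOINTS",
]


def build_minimal_env(source=None, passthrough=AWS_ENV_PASSTHROUGH):
    """Build a restricted environment for a subprocess.

    Only PATH and the allowlisted variables that are actually set in
    `source` are copied; everything else is dropped.
    """
    source = os.environ if source is None else source
    env = {"PATH": source.get("PATH", "")}
    for name in passthrough:
        if name in source:
            env[name] = source[name]
    return env


if __name__ == "__main__":
    # Simulated ECS task environment (values are placeholders).
    parent = {
        "PATH": "/usr/bin",
        "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/example",
        "UNRELATED_SECRET": "not forwarded",
    }
    print(build_minimal_env(parent))
```

With the relative URI in the allowlist, a subprocess started with this environment can still resolve ECS task-role credentials; without it, the variable is silently dropped along with everything else not listed.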
        "AWS_CONTAINER_CREDENTIALS_FULL_URI",
        "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
        "AWS_STS_REGIONAL_ENDPOINTS",
    ]
AWS profile variable not propagated
Medium Severity
The new AWS env passthrough list omits AWS_PROFILE (and related profile selectors). Since local command execution builds a restricted env, subprocesses lose profile selection and can authenticate against the wrong profile or fail when credentials are only available through a non-default profile.
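One way to guard against this class of omission is a small check that compares the passthrough list against the variables each credential source depends on. The sketch below is an assumption-laden illustration: `REQUIRED_BY_SOURCE` and `missing_passthrough` are hypothetical names, and the mapping covers only the two sources raised in this review.

```python
# Hypothetical mapping from credential source to the environment variables
# a subprocess needs in order to use it. Only the two cases raised in this
# review are listed.
REQUIRED_BY_SOURCE = {
    "ecs_task_role": ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"],
    "named_profile": ["AWS_PROFILE"],
}


def missing_passthrough(allowlist, required=REQUIRED_BY_SOURCE):
    """Return {source: [missing variable names]} for an allowlist.

    Sources whose variables are all present are left out of the result,
    so an empty dict means the allowlist covers every listed source.
    """
    allowed = set(allowlist)
    gaps = {}
    for source, names in required.items():
        absent = [name for name in names if name not in allowed]
        if absent:
            gaps[source] = absent
    return gaps


if __name__ == "__main__":
    # The entries visible in the diff excerpt under review.
    excerpt = [
        "AWS_CONTAINER_CREDENTIALS_FULL_URI",
        "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
        "AWS_STS_REGIONAL_ENDPOINTS",
    ]
    print(missing_passthrough(excerpt))
```

Run as a unit test against the real list, a check like this would flag a dropped profile or container-credential variable before it reaches a deployment.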


Summary
- `aws_credentials` secret imported from the `aws-auth` block, replacing individual `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_ROLE_ARN` imports
- Removed `AWS_ASSUME_ROLE_CMD` boilerplate from suite initialization across all AWS runbooks
- Scripts source `auth.sh` for consistent credential handling (IRSA, access key, assume role)
- Templates use `{% include "aws-auth.yaml" ignore missing %}` for secrets
- Removed `aws-s3-bucket-storage-report` codebundle (superseded by unified auth approach)
Test plan
Note
Medium Risk
Touches authentication wiring across several AWS runbooks/templates and changes execution env passthrough for AWS credentials, which could break access in some deployment setups if assumptions differ.
Overview
Adds a new `aws-account-cost-health` codebundle that analyzes AWS Cost Explorer spend trends and emits both an SLI (hourly health score) and a runbook that generates a detailed cost-by-service report plus RI/Savings Plans recommendation output.
Refactors multiple AWS codebundles (EKS health, Lambda health, ElastiCache Redis health, CloudWatch overused EC2) to use the unified `aws_credentials`/`aws-auth.yaml` auth flow, removing per-secret key/role wiring and bespoke assume-role boilerplate; updates `RW.CLI` to pass through AWS config/credential environment variables needed for IRSA/pod identity and shared config.
Cleans up/standardizes bundle metadata and qualifiers (adds region/account qualifiers; converts `location` to `locations`) and removes legacy bundles/content (`aws-s3-bucket-storage-report`, `aws-eks-node-reboot`) while fixing a few minor runbook issues (cert-manager report ordering/variable fix; adds restart-related env vars to k8s daemonset/statefulset healthchecks).
Written by Cursor Bugbot for commit 1d03727.