Refactor AWS authentication handling to use unified credentials #623

Merged
stewartshea merged 12 commits into runwhen-contrib:main from stewartshea:updates/021326-02
Feb 17, 2026
@stewartshea stewartshea commented Feb 16, 2026

Summary

  • Refactored AWS authentication across multiple code bundles to use a unified aws_credentials secret imported from the aws-auth block, replacing individual AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ROLE_ARN imports
  • Removed legacy AWS_ASSUME_ROLE_CMD boilerplate from suite initialization across all AWS runbooks
  • Updated shell scripts to source a common auth.sh for consistent credential handling (IRSA, access key, assume role)
  • Streamlined YAML templates to use {% include "aws-auth.yaml" ignore missing %} for secrets
  • Removed the aws-s3-bucket-storage-report codebundle (superseded by unified auth approach)
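
The three credential paths named above (IRSA, access key, assume role) can be told apart from the environment alone. The following is a minimal sketch of that decision, not the contents of `auth.sh`, which this PR page does not show. The function name `detect_auth_mode`, the mode labels, and the precedence (explicit keys before web identity) are all assumptions made for illustration.

```python
from typing import Mapping


def detect_auth_mode(env: Mapping[str, str]) -> str:
    """Guess which credential path applies, based only on environment variables.

    Hypothetical helper: the labels and the precedence order are assumptions,
    not behavior confirmed by the PR.
    """
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"))
    has_role = bool(env.get("AWS_ROLE_ARN"))
    has_web_identity = bool(env.get("AWS_WEB_IDENTITY_TOKEN_FILE"))

    if has_keys and has_role:
        # Static keys plus a role ARN: use the keys to assume the role.
        return "assume_role"
    if has_keys:
        # Static keys only: use them directly.
        return "access_key"
    if has_web_identity and has_role:
        # Token file injected by the cluster plus a role ARN: IRSA.
        return "irsa"
    # Nothing explicit: leave it to the SDK/CLI default credential chain.
    return "default_chain"
```

Keeping the decision in one place is what lets each script drop its own assume-role boilerplate; the scripts only need the resulting credentials, not the logic that produced them.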

Test plan

  • Verify AWS EKS health checks authenticate correctly with new credential flow
  • Verify AWS ElastiCache Redis health checks work with unified auth
  • Verify AWS Lambda health checks work with unified auth
  • Verify AWS CloudWatch EC2 inspection works with unified auth
  • Confirm no references to old individual AWS secret imports remain in modified files

Note

Medium Risk
Touches authentication wiring across several AWS runbooks/templates and changes execution env passthrough for AWS credentials, which could break access in some deployment setups if assumptions differ.

Overview
Adds a new aws-account-cost-health codebundle that analyzes AWS Cost Explorer spend trends and emits both an SLI (hourly health score) and a runbook that generates a detailed cost-by-service report plus RI/Savings Plans recommendation output.

Refactors multiple AWS codebundles (EKS health, Lambda health, ElastiCache Redis health, CloudWatch overused EC2) to use the unified aws_credentials/aws-auth.yaml auth flow, removing per-secret key/role wiring and bespoke assume-role boilerplate. Also updates RW.CLI to pass through the AWS config/credential environment variables needed for IRSA/pod identity and shared config.

Cleans up/standardizes bundle metadata and qualifiers (adds region/account qualifiers; converts location to locations) and removes legacy bundles/content (aws-s3-bucket-storage-report, aws-eks-node-reboot) while fixing a few minor runbook issues (cert-manager report ordering/variable fix; adds restart-related env vars to k8s daemonset/statefulset healthchecks).

Written by Cursor Bugbot for commit 1d03727. This will update automatically on new commits.

- Updated various runbooks and scripts to utilize a unified AWS credentials management approach, importing credentials from the aws-auth block instead of individual secret imports.
- Enhanced the AWS CLI commands by removing unnecessary role assumption logic, simplifying the authentication process.
- Improved documentation within the scripts to clarify the use of AWS credentials and the expected environment setup.
- Streamlined YAML templates for AWS resources to include a common authentication configuration, enhancing maintainability and consistency across the codebase.
- Unified the authentication process by sourcing a common `auth.sh` script across various AWS-related scripts, enhancing maintainability and reducing code duplication.
- Updated the `auth.sh` script to improve the handling of AWS credentials, including verification of identity and role assumption logic.
- Improved documentation within the scripts to clarify usage and expected environment setup for AWS credentials.
@stewartshea stewartshea requested a review from a team as a code owner February 16, 2026 14:27
- Introduced ${CONTAINER_RESTART_AGE} and ${CONTAINER_RESTART_THRESHOLD} variables to both DaemonSet and StatefulSet healthcheck runbooks.
- Enhanced documentation for these parameters to clarify their purpose and usage in monitoring container restarts.
- Updated environment variable evaluations to include the new parameters, improving the robustness of health checks.
…esource name

- Modified the alias and asMeasuredBy fields in the aws-eks-health-slx.yaml template to reference the cluster name directly, improving clarity and accuracy in health check reporting.
…proved error handling

- Refactored `check_eks_cluster_health.sh` and `check_eks_fargate_cluster_health_status.sh` to include comprehensive issue tracking in JSON format, allowing for better visibility of cluster health issues.
- Implemented robust error handling for AWS CLI commands, capturing and reporting errors related to cluster listing and description.
- Updated runbook documentation to reflect changes in health check processes and added new tasks for monitoring EKS and Fargate profiles.
- Removed the deprecated `list_eks_fargate_metrics.sh` script to streamline the codebase.
- Updated `check_eks_cluster_health.sh`, `check_eks_fargate_cluster_health_status.sh`, and `check_eks_nodegroup_health.sh` to allow optional specification of the EKS cluster name for targeted health checks.
- Improved error handling and reporting for AWS CLI commands, ensuring robust feedback on cluster listing failures.
- Revised README and runbook documentation to reflect the new functionality and clarify the health check processes for EKS clusters, node groups, and Fargate profiles.
- Adjusted YAML templates to incorporate the cluster name in health metrics and reporting, enhancing clarity in monitoring outputs.
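
The pattern these commit messages describe (optional cluster name, errors from the AWS CLI captured as issues, results emitted as JSON) might look roughly like the sketch below. It is written in Python rather than the shell used by `check_eks_cluster_health.sh`, and the issue fields (`title`, `cluster`, `details`) and the `runner` injection point are assumptions for illustration. Only `aws eks list-clusters` and `aws eks describe-cluster` are taken as given.

```python
import json
import subprocess
from typing import Callable, List, Optional, Tuple

# A runner takes CLI arguments and returns (exit code, stdout, stderr).
Runner = Callable[[List[str]], Tuple[int, str, str]]


def run_cli(args: List[str]) -> Tuple[int, str, str]:
    """Run a command and capture its output instead of raising on failure."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def check_clusters(region: str, cluster_name: Optional[str] = None,
                   runner: Runner = run_cli) -> List[dict]:
    """Return a list of issue dicts; an empty list means nothing was flagged."""
    if cluster_name:
        # Targeted check: skip listing entirely.
        names = [cluster_name]
    else:
        code, out, err = runner(
            ["aws", "eks", "list-clusters", "--region", region, "--output", "json"])
        if code != 0:
            # A listing failure is itself reported as an issue.
            return [{"title": "Failed to list EKS clusters", "cluster": None,
                     "details": err.strip()}]
        names = json.loads(out).get("clusters", [])

    issues: List[dict] = []
    for name in names:
        code, out, err = runner(
            ["aws", "eks", "describe-cluster", "--name", name,
             "--region", region, "--output", "json"])
        if code != 0:
            issues.append({"title": "Failed to describe EKS cluster",
                           "cluster": name, "details": err.strip()})
            continue
        status = json.loads(out)["cluster"]["status"]
        if status != "ACTIVE":
            issues.append({"title": "EKS cluster is not ACTIVE",
                           "cluster": name, "details": f"status={status}"})
    return issues
```

Calling `json.dumps(check_clusters("us-west-2"), indent=2)` would give a report a runbook task can parse. Passing a fake `runner` makes the error paths testable without AWS access.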
…entation

- Deleted the README.md and runbook.robot files for the AWS EKS Nodegroup health check codebundle, as they are no longer needed.
- This cleanup helps streamline the codebase by removing obsolete components related to EKS nodegroup health checks.
…ype from 'aws_accounts' to 'aws_ec2_vpcs' for improved resource monitoring.
- Introduced a constant `CE_REGION` set to `us-east-1` in `aws_cost_report.sh` and `aws_ri_recommendations.sh` to ensure consistent region usage for Cost Explorer queries.
- Updated AWS CLI commands in both scripts to utilize the specified region, enhancing reliability in cost data retrieval.
- Improved error handling in `runbook.robot` files for AWS authentication checks, allowing for multiple failure conditions to be captured and reported accurately.
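
For the `CE_REGION` change, the idea is that every Cost Explorer call names its region explicitly instead of inheriting whatever region the runbook was configured with. A minimal sketch in Python follows; the scripts in this PR are shell. `CE_REGION` and its value come from the commit message. The function name `cost_by_service_command` and the choice of metric and grouping are assumptions.

```python
from datetime import date
from typing import List

# Constant taken from the commit message; Cost Explorer queries are pinned to it.
CE_REGION = "us-east-1"


def cost_by_service_command(start: date, end: date) -> List[str]:
    """Build (but do not run) an AWS CLI call for cost grouped by service.

    Hypothetical helper: the metric and grouping are illustrative choices.
    """
    return [
        "aws", "ce", "get-cost-and-usage",
        "--region", CE_REGION,  # explicit, so AWS_REGION cannot redirect the query
        "--time-period", f"Start={start.isoformat()},End={end.isoformat()}",
        "--granularity", "MONTHLY",
        "--metrics", "UnblendedCost",
        "--group-by", "Type=DIMENSION,Key=SERVICE",
        "--output", "json",
    ]
```

Passing the returned list to `subprocess.run` would execute the query. Building the argument list separately keeps the region pin easy to check in isolation.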
… resource name

- Modified the description, alias, and value fields in the YAML templates to reference `account_id` for improved clarity and accuracy in cost monitoring.
- Ensured consistency across all relevant templates for better alignment with AWS account identification practices.
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

"AWS_CONTAINER_CREDENTIALS_FULL_URI",
"AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
"AWS_STS_REGIONAL_ENDPOINTS",
]

Missing ECS container credential passthrough

Medium Severity

execute_command() now forwards selected AWS env vars, but omits AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. In ECS task-role environments, AWS CLI relies on that variable for credentials. Because local command execution builds a minimal env, this omission can make AWS-authenticated runbooks fail even when container credentials are correctly configured.
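
To make the failure mode concrete, here is a minimal sketch of an allowlist-based environment builder with the missing variable added. It is not the RW.CLI implementation. The names `AWS_ENV_PASSTHROUGH` and `build_minimal_env` are hypothetical, and the list contains only the three variables visible in the diff context plus the one this comment asks for.

```python
from typing import Iterable, Mapping

# Hypothetical allowlist: the first three names appear in the diff context,
# the last is the addition suggested by this review comment.
AWS_ENV_PASSTHROUGH = [
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
    "AWS_STS_REGIONAL_ENDPOINTS",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
]


def build_minimal_env(source_env: Mapping[str, str],
                      passthrough: Iterable[str] = AWS_ENV_PASSTHROUGH) -> dict:
    """Copy PATH plus any allowlisted variables that are actually set."""
    env = {"PATH": source_env.get("PATH", "")}
    for name in passthrough:
        if name in source_env:
            env[name] = source_env[name]
    return env
```

Anything not on the list is dropped, which is the point of a minimal env, but it also means every credential source the runbooks are expected to support needs its variable listed.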

"AWS_CONTAINER_CREDENTIALS_FULL_URI",
"AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE",
"AWS_STS_REGIONAL_ENDPOINTS",
]

AWS profile variable not propagated

Medium Severity

The new AWS env passthrough list omits AWS_PROFILE (and related profile selectors). Since local command execution builds a restricted env, subprocesses lose profile selection and can authenticate against the wrong profile or fail when credentials are only available through a non-default profile.
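
The effect is easy to reproduce outside the runbooks. The sketch below spawns a child Python process with a restricted environment and reports which `AWS_PROFILE` the child sees. The helper names `restricted_env` and `child_profile` are hypothetical and stand in for whatever `execute_command()` does internally, which this page does not show.

```python
import os
import subprocess
import sys
from typing import Iterable, Mapping


def restricted_env(source: Mapping[str, str], allowed: Iterable[str]) -> dict:
    """Keep PATH and only the explicitly allowed variables."""
    env = {"PATH": source.get("PATH", os.defpath)}
    env.update({name: source[name] for name in allowed if name in source})
    return env


def child_profile(env: Mapping[str, str]) -> str:
    """Return the AWS_PROFILE value a child process sees ('' if unset)."""
    probe = "import os; print(os.environ.get('AWS_PROFILE', ''))"
    result = subprocess.run([sys.executable, "-c", probe], env=dict(env),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With `AWS_PROFILE` set in the parent, `child_profile(restricted_env(os.environ, []))` comes back empty, while adding `"AWS_PROFILE"` to the allowed list restores it. An AWS CLI subprocess in the first case would fall back to the default profile.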

@stewartshea stewartshea merged commit 5e68fa4 into runwhen-contrib:main Feb 17, 2026
2 checks passed