
RFC: AWS Containers plugin #148

@niallthomson

Is this related to an existing feature request or issue?

No response

Summary

Add a plugin that helps developers containerize applications and then deploy, manage, and troubleshoot them on AWS using Amazon ECS, Amazon EKS, and related container services such as Amazon ECR. The plugin provides a unified entry point for container workloads on AWS, guiding developers from Dockerfile creation through production operations.

This plugin will also package skills for open source projects that AWS maintains but that are not tied to a specific AWS service. Examples include:

  1. Karpenter
  2. AWS Controllers for Kubernetes (ACK)
  3. Kube Resource Orchestration (kro)

Use case

The following use cases are illustrative, not exhaustive:

  • Containerization of existing applications: Generating production-ready Dockerfiles with multi-stage builds, selecting appropriate base images, configuring .dockerignore files, building images, pushing them to ECR, and managing ECR lifecycle policies. Covers language-specific patterns (e.g., dependency caching for Node.js/Python/Java), image size optimization, building Graviton-compatible (arm64) images, and security hardening (non-root users, minimal base images, vulnerability scanning).
  • Troubleshooting common container issues: Diagnosing why containers fail to start, crash-loop, or misbehave in ECS/EKS environments. Covers signal handling (catching SIGTERM for graceful shutdown), health check configuration and failure diagnosis, image pull errors (ECR auth, missing tags, cross-account access), OOMKilled pods/tasks from incorrect memory limits, networking issues (security groups, service discovery, load balancer target registration), and reading CloudWatch/Container Insights logs to trace root cause.
  • ECS Express Mode deployments for rapid prototyping: Going from application code to a running ECS service with a public URL in minutes. Express Mode auto-provisions infrastructure (VPC, ALB, ECR, task definitions, service) so developers can validate their containerized app on AWS without writing IaC. Covers the full Express Mode lifecycle: containerization, image build/push, prerequisite validation, deployment, readiness polling, and cleanup.
  • Container cost optimization: Right-sizing tasks and pods using CloudWatch Container Insights metrics (CPU/memory utilization), choosing between Fargate and EC2 launch types, adopting Spot and Graviton to reduce cost, and identifying idle or over-provisioned resources. Covers AWS Split Cost Allocation Data (SCAD) for per-task and per-pod cost visibility (which requires CUR 2.0 exports to S3 and Athena queries rather than the Cost Explorer APIs), plus service-specific optimizations such as Fargate Spot for ECS and Karpenter consolidation policies for EKS.
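The signal-handling and OOMKilled items in the troubleshooting use case reduce to small pieces of logic that a troubleshooting reference could illustrate. The following is a minimal Python sketch. It assumes a Linux container runtime, where an exit code above 128 conventionally means "killed by signal (code - 128)": 137 is SIGKILL (often the OOM killer, or a forced kill after the stop timeout) and 143 is SIGTERM. `describe_exit_code` and `GracefulShutdown` are hypothetical helpers, not part of any AWS SDK or of the MCP servers.

```python
import signal
import threading


def describe_exit_code(code: int) -> str:
    """Map a container exit code to a likely cause (heuristic, not exhaustive)."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Convention on Linux: 128 + N means the process died from signal N.
        sig_num = code - 128
        try:
            name = signal.Signals(sig_num).name
        except ValueError:
            name = f"signal {sig_num}"
        if sig_num == signal.SIGKILL:
            return f"killed by {name}: possible OOM kill, or the stop timeout expired"
        if sig_num == signal.SIGTERM:
            return f"terminated by {name}: e.g. during a task or pod stop"
        return f"killed by {name}"
    return f"application error (exit code {code})"


class GracefulShutdown:
    """Record SIGTERM so the main loop can drain work and exit cleanly."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()
        # Must be installed from the main thread.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        # Keep the handler minimal: only flag the request.
        self.stop_requested.set()
```

A service loop would check `shutdown.stop_requested.is_set()` (or block on `wait(timeout)`) and finish in-flight work before exiting. The handler only helps if the process actually receives the signal: with a shell-form `CMD` in the Dockerfile, `/bin/sh` runs as PID 1 and may not forward SIGTERM, which is one reason exec-form `CMD`/`ENTRYPOINT` is usually recommended.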
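The right-sizing step in the cost optimization use case can be sketched as "peak usage plus headroom, snapped to a valid Fargate size". This minimal Python example assumes peak CPU is given in CPU units (1024 = 1 vCPU) and memory in MiB, for instance as read from Container Insights. The 20% headroom is an arbitrary default. The size table is a subset of the Fargate CPU/memory combinations believed current at the time of writing and should be checked against AWS documentation. `right_size` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Assumed subset of Fargate task sizes: CPU units -> allowed memory values (MiB).
# Verify against current AWS documentation before relying on it.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def right_size(peak_cpu_units: float, peak_mem_mib: float,
               headroom: float = 0.2) -> Optional[Tuple[int, int]]:
    """Return the first (cpu, memory) size, by CPU tier then memory, that covers peak usage plus headroom."""
    need_cpu = peak_cpu_units * (1 + headroom)
    need_mem = peak_mem_mib * (1 + headroom)
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        for mem in FARGATE_SIZES[cpu]:
            if mem >= need_mem:
                return cpu, mem
        # Memory need exceeds this tier's range; fall through to a larger tier.
    return None  # Outside this table: consider a larger size or EC2 capacity.
```

Because tiers are scanned by CPU first, the result is the smallest CPU tier that can also hold the memory requirement. For example, `right_size(100, 5000)` skips the 256 and 512 tiers (their memory tops out below 6000 MiB) and lands on `(1024, 6144)`.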

Proposal

Structure

plugins/aws-containers/
├── .claude-plugin/
│   └── plugin.json
├── .mcp.json
└── skills/
    ├── aws-containers/
    │   ├── SKILL.md
    │   └── references/
    │       ├── service-selection.md    
    │       └── containerization.md
    ├── aws-ecs/
    │   ├── SKILL.md
    │   └── references/
    │       ├── express-mode.md
    │       ├── task-definitions.md
    │       ├── service-configuration.md
    │       ├── networking.md
    │       ├── cost-optimization.md
    │       ├── compute-types.md    
    │       └── troubleshooting.md
    ├── aws-eks/
    │   ├── SKILL.md
    │   └── references/
    │       ├── auto-mode.md
    │       ├── app-deployment-patterns.md
    │       ├── resource-management.md
    │       ├── networking.md
    │       ├── cost-optimization.md
    │       └── troubleshooting.md
    └── karpenter/
        ├── SKILL.md
        └── references/
            ├── getting-started.md
            ├── nodepools.md
            ├── scheduling.md
            ├── disruption.md
            └── troubleshooting.md    
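A layout like the one above lends itself to a simple structural check in CI. The minimal Python sketch below assumes only what the tree shows: a `.claude-plugin/plugin.json` manifest, a `.mcp.json`, and a `SKILL.md` in every skill directory. Requiring a `name` field in the manifest is an assumption about the plugin format, and `check_layout` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List

# Skill directories taken from the proposed structure.
EXPECTED_SKILLS = ["aws-containers", "aws-ecs", "aws-eks", "karpenter"]


def check_layout(root: Path) -> List[str]:
    """Return problems found in a plugin directory (an empty list means OK)."""
    problems = []
    manifest = root / ".claude-plugin" / "plugin.json"
    if not manifest.is_file():
        problems.append(f"missing {manifest.relative_to(root)}")
    else:
        data = json.loads(manifest.read_text())
        if not data.get("name"):
            problems.append("plugin.json has no 'name'")
    if not (root / ".mcp.json").is_file():
        problems.append("missing .mcp.json")
    for skill in EXPECTED_SKILLS:
        if not (root / "skills" / skill / "SKILL.md").is_file():
            problems.append(f"missing skills/{skill}/SKILL.md")
    return problems
```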

Skills

aws-containers

  • Purpose: General guidance for running containers on AWS, including containerization best practices, Dockerfile generation, ECR image management, and service selection (ECS vs EKS)
  • Trigger intent: "container", "Dockerfile", "Docker", "ECR", "container image", "which container service", "ECS or EKS", "Fargate"

aws-ecs

  • Purpose: Build, deploy, and operate containerized applications on Amazon ECS including Express Mode, Fargate, task definitions, services, auto-scaling, and ALB integration
  • Trigger intent: "ECS", "ECS service", "ECS cluster", "task definition", "Fargate", "ECS Express Mode", "ECS deploy", "container service", "ECS troubleshoot"

aws-eks

  • Purpose: Create and manage Amazon EKS clusters, deploy Kubernetes workloads, manage resources, and troubleshoot cluster issues
  • Trigger intent: "EKS", "Kubernetes", "k8s", "kubectl", "EKS cluster", "Kubernetes deployment", "Helm", "EKS node", "pod", "EKS troubleshoot"

karpenter

  • Purpose: Install and configure Karpenter to manage compute for EKS clusters
  • Trigger intent: "karpenter", "nodepool", "karpenter troubleshoot"
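The trigger intents above describe when each skill should load. In practice the agent chooses skills from their descriptions rather than by literal keyword matching, so the following is only a rough approximation. It assumes case-insensitive phrase matching at word starts, which means "containerize" matches "container" and "pods" matches "pod". Ties fall back to the order of `TRIGGERS`, which lets the aws-containers triage skill win an ambiguous prompt. `match_skills` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Trigger phrases copied from the skill list above, lowercased.
TRIGGERS: Dict[str, List[str]] = {
    "aws-containers": ["container", "dockerfile", "docker", "ecr", "container image",
                       "which container service", "ecs or eks", "fargate"],
    "aws-ecs": ["ecs", "ecs service", "ecs cluster", "task definition", "fargate",
                "ecs express mode", "ecs deploy", "container service", "ecs troubleshoot"],
    "aws-eks": ["eks", "kubernetes", "k8s", "kubectl", "eks cluster",
                "kubernetes deployment", "helm", "eks node", "pod", "eks troubleshoot"],
    "karpenter": ["karpenter", "nodepool", "karpenter troubleshoot"],
}


def match_skills(prompt: str) -> List[str]:
    """Rank skills by how many trigger phrases start at a word boundary in the prompt."""
    text = prompt.lower()
    scores: Dict[str, int] = {}
    for skill, phrases in TRIGGERS.items():
        hits = sum(1 for p in phrases if re.search(rf"\b{re.escape(p)}", text))
        if hits:
            scores[skill] = hits
    # sorted() is stable, so equal scores keep the TRIGGERS order.
    return sorted(scores, key=lambda s: -scores[s])
```

For example, "Should I use ECS or EKS?" scores one hit for each of aws-containers, aws-ecs, and aws-eks, and the tie resolves to aws-containers.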

MCP servers

  • ECS MCP Server (type: http): ECS Express Mode deployment, ECS resource management, troubleshooting
  • EKS MCP Server (type: http): Kubernetes resource CRUD, manifest generation, CloudWatch metrics, troubleshooting
  • AWS Knowledge MCP Server (type: http): official AWS documentation search and access
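The plugin's `.mcp.json` wires up the three servers listed above. The sketch below generates that file. It assumes the common `mcpServers` layout with `"type": "http"` entries. The server keys and every URL are hypothetical placeholders: the real ECS and EKS endpoints are account- and region-specific (see Potential challenges).

```python
import json

# Placeholder endpoints (hypothetical): substitute the real URLs for your account/region.
SERVERS = {
    "ecs": "https://ecs-mcp.example.invalid/mcp",
    "eks": "https://eks-mcp.example.invalid/mcp",
    "aws-knowledge": "https://knowledge-mcp.example.invalid/mcp",
}


def build_mcp_config(servers: dict) -> dict:
    """Build an .mcp.json-style document with one HTTP entry per server."""
    return {
        "mcpServers": {
            name: {"type": "http", "url": url} for name, url in servers.items()
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(SERVERS), indent=2))
```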

Skill design

Each SKILL.md follows progressive disclosure per the project's design guidelines:

  • Initial load (~150-250 lines): Prerequisites, configuration, capability overview, reference file index with load conditions, quick-start examples, and best practices
  • On-demand references (2-7 files per skill, <100 lines each): Loaded only when the agent needs deep domain knowledge for a specific workflow
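These line budgets can be enforced mechanically. The following is a minimal sketch that counts raw lines, assuming a cap of 250 lines for SKILL.md (the upper end of the initial-load target) and fewer than 100 lines for each file in `references/`. `check_budgets` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

SKILL_MAX_LINES = 250      # upper end of the ~150-250 line target
REFERENCE_MAX_LINES = 99   # "<100 lines each"


def count_lines(path: Path) -> int:
    return len(path.read_text(encoding="utf-8").splitlines())


def check_budgets(skills_dir: Path) -> List[str]:
    """Report SKILL.md and reference files that exceed their line budgets."""
    problems = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        skill = skill_md.parent.name
        n = count_lines(skill_md)
        if n > SKILL_MAX_LINES:
            problems.append(f"{skill}/SKILL.md: {n} lines > {SKILL_MAX_LINES}")
        # A missing references/ directory simply yields no files.
        for ref in sorted((skill_md.parent / "references").glob("*.md")):
            n = count_lines(ref)
            if n > REFERENCE_MAX_LINES:
                problems.append(f"{skill}/references/{ref.name}: {n} lines > {REFERENCE_MAX_LINES}")
    return problems
```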

User experience

Before: Developers must manually research ECS vs EKS trade-offs, write Dockerfiles, configure task definitions or Kubernetes manifests, set up networking, and troubleshoot deployments across multiple AWS consoles and CLI tools.
After: Developers describe their intent naturally (e.g., "deploy this Node.js app to ECS", "create an EKS cluster", "containerize my Flask app", "troubleshoot my failing ECS tasks") and the agent auto-triggers the appropriate skill, loads relevant references, and uses the MCP servers to execute the workflow.

Prerequisites

  • AWS CLI configured with credentials
  • Docker or Finch installed (for containerization and local testing)
  • Required IAM permissions vary by skill:
    • aws-containers: ECR
    • aws-ecs: ECS, ECR, CloudFormation, ELB, IAM, and CloudWatch
    • aws-eks: EKS, EC2, CloudFormation, IAM, and CloudWatch
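A skill could verify the tooling prerequisites before starting a workflow. The sketch below only checks that the binaries are on `PATH`, namely the AWS CLI plus Docker or Finch. It does not validate credentials or IAM permissions. The `which` parameter is injectable so the check can be tested without the tools installed. `check_prerequisites` is a hypothetical helper.

```python
import shutil
from typing import List


def check_prerequisites(which=shutil.which) -> List[str]:
    """Return missing prerequisites: the AWS CLI plus Docker or Finch."""
    missing = []
    if which("aws") is None:
        missing.append("AWS CLI ('aws') not found on PATH")
    if which("docker") is None and which("finch") is None:
        missing.append("neither 'docker' nor 'finch' found on PATH")
    return missing
```

To confirm that credentials are configured, running `aws sts get-caller-identity` is a common follow-up check.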

Out of scope

  • Provisioning non-container AWS resources (databases, caches, queues): use the deploy-on-aws plugin instead
  • CI/CD pipeline generation: may be addressed by a future plugin or by the deploy-on-aws plugin
  • Non-AWS container orchestration (e.g., Docker Compose)
  • App Runner support: App Runner is closing to new customers on April 30, 2026. For simple web app deployments, use ECS Express Mode.
  • Cross-service AWS cost estimation and Savings Plans management: use the deploy-on-aws plugin's pricing integration

Potential challenges

  • Skill routing between ECS and EKS: Users may not know which service they want. The aws-containers skill acts as a triage layer, guiding users to the right skill based on their requirements. Mitigation: clear trigger phrases and a decision-tree reference in aws-containers.
  • MCP account/region scoping: When configured via the IAM proxy, both the ECS and EKS MCP servers use account- and region-specific endpoints. We need to determine how this model can support cross-account and cross-region workflows.
  • IAM permissions breadth: The combined ECS + EKS permissions surface is large. Users with restricted IAM policies may encounter partial functionality. Mitigation: per-skill prerequisites documentation and graceful error handling guidance.
  • Reference file size: Some reference files (troubleshooting, networking) may push against the 100-line guideline due to the breadth of container-related scenarios. Mitigation: content organized with clear headings for selective loading; SKILL.md files stay under 300 lines.
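The decision-tree reference for the aws-containers triage skill could start from a handful of yes/no questions. The sketch below is a toy decision tree. The criteria (existing Kubernetes use, reliance on Kubernetes ecosystem tooling such as Helm, and a need for a quick prototype) are illustrative assumptions, not the content planned for service-selection.md. `Requirements` and `recommend_service` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical questions the triage skill might ask the user.
    uses_kubernetes_today: bool = False
    needs_k8s_ecosystem: bool = False   # e.g. Helm charts, operators, CRDs
    quick_prototype: bool = False


def recommend_service(req: Requirements) -> str:
    """Toy decision tree that routes to the aws-eks or aws-ecs skill."""
    if req.uses_kubernetes_today or req.needs_k8s_ecosystem:
        return "aws-eks"
    if req.quick_prototype:
        return "aws-ecs (Express Mode)"
    return "aws-ecs"
```

Keeping the questions as explicit fields makes it easy for the skill to ask for whichever answer is still missing before recommending a service.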

Dependencies and Integrations

Integration with existing plugins:

  • Complements deploy-on-aws by providing deep container-specific workflows beyond the general deployment recommendations
  • Complements aws-serverless for users choosing between serverless and container-based architectures
  • Complements aws-observability (if available) by providing container-specific monitoring and troubleshooting that pairs with broader observability workflows

Alternative solutions

  • Individual skills without a unified plugin: Users could use the ECS and EKS MCP servers separately and write their own prompts. The plugin adds value through curated skill descriptions, progressive-disclosure reference files, and the aws-containers triage skill that helps users choose the right service.
  • Separate plugins per service (aws-ecs, aws-eks): Could split into independent plugins. However, container workflows frequently span services (e.g., building an image in ECR and deploying to either ECS or EKS, or migrating between them), making a unified plugin more effective. The shared aws-containers skill also avoids duplication of containerization and ECR guidance.
  • Extend deploy-on-aws: Could add container skills to the existing deploy-on-aws plugin. However, deploy-on-aws focuses on general deployment with service selection, while aws-containers provides deep operational workflows (troubleshooting, scaling, resource management) that go well beyond initial deployment.
  • Avoid ECS/EKS MCP servers: Skills could use the AWS CLI and related tools such as kubectl directly instead of the MCP servers. This would be more flexible with regard to cross-region and cross-account access.
  • Package open source project skills separately: Skills for Karpenter, ACK, and kro could be built and packaged separately, for example in their respective open source project repositories. However, this would likely make them harder to discover.
