Support Per-Stack GitHub Actions Jobs for Parallel Deployment within Stages #162

@ahammond

Problem Statement

Currently, projen-pipelines generates a single GitHub Actions job per stage that deploys all CDK stacks using cdk deploy "Prefix-stage/*". While CDK's --concurrency flag enables parallel stack deployment, this approach has significant limitations for complex multi-stack applications:

  • Limited visibility: All stack deployments share a single job log, making it difficult to identify which specific stack failed or is taking longer
  • Monolithic failures: If one stack fails, the entire stage deployment fails without clear indication of which stack caused the issue
  • No dependency control: Cannot express explicit dependencies between stacks (e.g., Network → Database → Application)
  • Reduced observability: GitHub Actions UI shows only one job running, hiding the parallel execution happening inside CDK
  • Difficult troubleshooting: Stack-specific logs are buried in the combined output

Use Case

Large CDK applications often consist of multiple related stacks that need to be deployed:

  • With parallelism: Database, Cache, Queue stacks can deploy simultaneously after Network completes
  • Across environments: dev → staging → prod promotion pipeline
  • With visibility: DevOps teams need to quickly identify which stack is blocked or failing

Example application structure:

MyApp-dev/
  ├─ Network      (foundational)
  ├─ Database     (depends on Network)
  ├─ Cache        (depends on Network)
  ├─ Queue        (depends on Network)
  ├─ API          (depends on Database, Queue)
  └─ WebApp       (depends on API, Cache)

Desired deployment flow:

Sequential across environments: dev → staging → prod
Parallel within environment: Network first, then Database+Cache+Queue in parallel,
                            then API, then WebApp (WebApp depends on API)
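
The ordering implied by the example dependency list can be derived mechanically. The following is a minimal sketch, not part of projen-pipelines: `deploymentWaves` and `DependencyMap` are hypothetical names, and the input is simply the example structure from this issue.

```typescript
// Hypothetical helper: groups stacks into "waves" whose members can deploy together.
type DependencyMap = Record<string, string[]>;

function deploymentWaves(deps: DependencyMap): string[][] {
  const remaining = new Set(Object.keys(deps));
  const done = new Set<string>();
  const waves: string[][] = [];

  while (remaining.size > 0) {
    // A stack is ready once every stack it depends on has been deployed.
    const ready = Array.from(remaining).filter((s) =>
      deps[s].every((d) => done.has(d)),
    );
    if (ready.length === 0) {
      // Nothing can make progress: a cycle or a reference to an unlisted stack.
      throw new Error(
        `Circular or unknown dependency among: ${Array.from(remaining).join(', ')}`,
      );
    }
    for (const s of ready) {
      remaining.delete(s);
      done.add(s);
    }
    waves.push(ready);
  }
  return waves;
}

// The example application structure from this issue.
const exampleDeps: DependencyMap = {
  Network: [],
  Database: ['Network'],
  Cache: ['Network'],
  Queue: ['Network'],
  API: ['Database', 'Queue'],
  WebApp: ['API', 'Cache'],
};

const waves = deploymentWaves(exampleDeps);
console.log(waves);
// [["Network"], ["Database", "Cache", "Queue"], ["API"], ["WebApp"]]
```

Waves are only a way to reason about the order. With per-job `needs:` arrays, GitHub Actions would start each job as soon as its own dependencies finish, without waiting for the rest of a wave.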

Current Behavior

# Generated workflow
jobs:
  deploy-dev:
    runs-on: ubuntu-latest
    needs: [assetUpload]
    concurrency:
      group: deploy-dev
      cancel-in-progress: false
    steps:
      - run: cdk deploy "MyApp-dev/*" --concurrency 10
      # All 6 stacks deploy in single job
      # Limited visibility into per-stack progress

Desired Behavior

# Generated workflow with per-stack jobs
jobs:
  deploy-dev-Network:
    runs-on: ubuntu-latest
    needs: [assetUpload]
    concurrency:
      group: deploy-dev-Network
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/Network

  deploy-dev-Database:
    runs-on: ubuntu-latest
    needs: [assetUpload, deploy-dev-Network]
    concurrency:
      group: deploy-dev-Database
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/Database

  deploy-dev-Cache:
    runs-on: ubuntu-latest
    needs: [assetUpload, deploy-dev-Network]
    concurrency:
      group: deploy-dev-Cache
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/Cache

  deploy-dev-Queue:
    runs-on: ubuntu-latest
    needs: [assetUpload, deploy-dev-Network]
    concurrency:
      group: deploy-dev-Queue
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/Queue

  deploy-dev-API:
    runs-on: ubuntu-latest
    needs: [deploy-dev-Database, deploy-dev-Queue]
    concurrency:
      group: deploy-dev-API
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/API

  deploy-dev-WebApp:
    runs-on: ubuntu-latest
    needs: [deploy-dev-API, deploy-dev-Cache]
    concurrency:
      group: deploy-dev-WebApp
      cancel-in-progress: false
    steps:
      - run: cdk deploy MyApp-dev/WebApp

  # Next stage waits for all previous stage stacks
  deploy-staging-Network:
    needs:
      [deploy-dev-Network, deploy-dev-Database, deploy-dev-Cache, deploy-dev-Queue, deploy-dev-API, deploy-dev-WebApp]
    # ... same pattern for staging

Proposed API Design

Option 1: Explicit Stack Configuration

new GithubCDKPipeline(project, {
  stackPrefix: 'MyApp',
  branchName: 'main',
  iamRoleArns: { default: 'arn:aws:iam::...' },

  stages: [
    {
      name: 'dev',
      env: { account: '111111111111', region: 'us-east-1' },

      // Explicit stack configuration with dependencies
      stacks: [
        {
          name: 'Network',
          dependsOn: [], // Deploys first
        },
        {
          name: 'Database',
          dependsOn: ['Network'], // Waits for Network
        },
        {
          name: 'Cache',
          dependsOn: ['Network'],
        },
        {
          name: 'Queue',
          dependsOn: ['Network'],
        },
        {
          name: 'API',
          dependsOn: ['Database', 'Queue'], // Waits for Database and Queue (which deploy in parallel with Cache)
        },
        {
          name: 'WebApp',
          dependsOn: ['API', 'Cache'],
        },
      ],

      // Optional: fallback behavior if stacks not specified
      // stackDeploymentMode: 'single-job' | 'parallel-jobs'
    },
    {
      name: 'staging',
      env: { account: '222222222222', region: 'us-east-1' },
      stacks: [
        /* same list */
      ],
    },
    {
      name: 'prod',
      env: { account: '333333333333', region: 'us-east-1' },
      stacks: [
        /* same list */
      ],
      manualApproval: true,
    },
  ],
});

Option 2: Auto-Discovery Mode

new GithubCDKPipeline(project, {
  stackPrefix: 'MyApp',
  branchName: 'main',
  iamRoleArns: { default: 'arn:aws:iam::...' },

  // Auto-discover stacks via `cdk list`
  stackDeploymentMode: 'parallel-jobs', // or 'single-job' (current behavior)

  stages: [
    {
      name: 'dev',
      env: { account: '111111111111', region: 'us-east-1' },
      // Auto-discovers: MyApp-dev/Network, MyApp-dev/Database, etc.
      // Deploys all in parallel (no dependency management)
    },
  ],
});

Option 3: Hybrid Approach (Recommended)

new GithubCDKPipeline(project, {
  stackPrefix: 'MyApp',

  stages: [
    {
      name: 'dev',
      env: { account: '111111111111', region: 'us-east-1' },

      // If stacks specified: use explicit config with dependencies
      stacks: [
        { name: 'Network', dependsOn: [] },
        { name: 'Database', dependsOn: ['Network'] },
        // ... etc
      ],
    },
    {
      name: 'staging',
      env: { account: '222222222222', region: 'us-east-1' },

      // If no stacks specified: auto-discover and deploy all in parallel
      stackDeploymentMode: 'parallel-jobs',
    },
    {
      name: 'prod',
      env: { account: '333333333333', region: 'us-east-1' },

      // Or keep current single-job behavior
      stackDeploymentMode: 'single-job', // default
    },
  ],
});
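
To make the hybrid rules concrete, here is one possible way the per-stage decision could be resolved. This is a sketch under assumptions: `StackConfig`, `StageStackOptions`, `ResolvedStrategy`, and `resolveStrategy` are hypothetical names that do not exist in projen-pipelines today, and the precedence shown (explicit `stacks` wins over `stackDeploymentMode`) is only one reading of the proposal.

```typescript
// Hypothetical option shapes mirroring the proposal above.
type StackDeploymentMode = 'single-job' | 'parallel-jobs';

interface StackConfig {
  name: string;
  dependsOn?: string[];
}

interface StageStackOptions {
  name: string;
  stacks?: StackConfig[];
  stackDeploymentMode?: StackDeploymentMode;
}

type ResolvedStrategy =
  | { kind: 'explicit'; stacks: StackConfig[] }
  | { kind: 'auto-discover' }
  | { kind: 'single-job' };

function resolveStrategy(stage: StageStackOptions): ResolvedStrategy {
  // Assumption: an explicit stack list takes precedence over the mode flag.
  if (stage.stacks && stage.stacks.length > 0) {
    return { kind: 'explicit', stacks: stage.stacks };
  }
  // No stacks listed: per-stack jobs only when explicitly requested.
  if (stage.stackDeploymentMode === 'parallel-jobs') {
    return { kind: 'auto-discover' };
  }
  // Default keeps the current behavior: one job per stage.
  return { kind: 'single-job' };
}

console.log(resolveStrategy({ name: 'prod' }).kind); // "single-job"
```

What should happen when a stage sets both `stacks` and `stackDeploymentMode: 'single-job'` is left open here. Rejecting that combination at synth time may be less surprising than silently preferring one of them.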

Benefits

  1. Enhanced Visibility: GitHub Actions UI shows each stack as a separate job with individual status, logs, and timing
  2. Faster Troubleshooting: Immediately identify which stack failed without parsing combined logs
  3. True Parallelism: GitHub Actions orchestrates parallel execution with proper dependency management
  4. Granular Control: Different timeouts, retry strategies, or approval gates per stack
  5. Better Monitoring: Integration with GitHub Actions monitoring tools shows per-stack metrics
  6. Dependency Management: Explicitly express stack dependencies (e.g., network layer before application layer)
  7. Selective Retries: Re-run only failed stacks without redeploying successful ones
  8. Cost Optimization: For large applications, better visibility helps identify slow-deploying stacks for optimization

Implementation Considerations

1. Stack Discovery

  • Auto-discovery: Run cdk list during synthesis to discover all stacks
  • Naming convention: Parse stack names to extract stage and stack identifier
  • Caching: Cache stack list to avoid repeated CDK synth operations
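
The naming-convention step could look roughly like the sketch below. It only parses text that is assumed to come from `cdk list` (one stack per line, in the `Prefix-stage/Stack` form used throughout this issue, optionally followed by a parenthesized physical stack name). `parseStackList` and `DiscoveredStack` are hypothetical names, and running the CLI itself is out of scope.

```typescript
// Hypothetical result shape for one discovered stack.
interface DiscoveredStack {
  stage: string;
  stack: string;
  selector: string; // value that would be passed to `cdk deploy`
}

function parseStackList(output: string, stackPrefix: string): DiscoveredStack[] {
  const result: DiscoveredStack[] = [];
  for (const rawLine of output.split('\n')) {
    // Drop an optional " (physical-name)" suffix and surrounding whitespace.
    const selector = rawLine.replace(/\s+\(.*\)\s*$/, '').trim();
    const slash = selector.indexOf('/');
    // Ignore blank lines and stacks that do not follow the Prefix-stage/Stack form.
    if (slash < 0 || !selector.startsWith(`${stackPrefix}-`)) continue;
    const stage = selector.slice(stackPrefix.length + 1, slash);
    const stack = selector.slice(slash + 1);
    if (stage && stack) result.push({ stage, stack, selector });
  }
  return result;
}

const sampleOutput = [
  'MyApp-dev/Network',
  'MyApp-dev/Database (MyApp-dev-Database)',
  'Unrelated-dev/Other',
  '',
  'MyApp-prod/Network',
].join('\n');

const discovered = parseStackList(sampleOutput, 'MyApp');
console.log(discovered.map((d) => `${d.stage}:${d.stack}`));
// ["dev:Network", "dev:Database", "prod:Network"]
```

Stage names that themselves contain a slash, or prefixes that contain the stage separator, would need a stricter convention than this sketch assumes.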

2. Dependency Graph

  • Validation: Detect circular dependencies in dependsOn configuration
  • Topological sort: Generate correct deployment order
  • GitHub Actions translation: Convert to needs: arrays in workflow YAML
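
A minimal sketch of the validation and translation steps follows. It assumes the `stacks` / `dependsOn` shape from Option 1 and the `deploy-{stage}-{stack}` job naming used in the examples. `findCycle` and `buildNeeds` are hypothetical helpers, not existing projen-pipelines functions.

```typescript
interface StackConfig {
  name: string;
  dependsOn?: string[];
}

// Depth-first search that returns the first cycle found, e.g. ["A", "B", "A"].
function findCycle(stacks: StackConfig[]): string[] | undefined {
  const deps = new Map<string, string[]>();
  for (const s of stacks) deps.set(s.name, s.dependsOn ?? []);

  const state = new Map<string, 'visiting' | 'done'>();
  const path: string[] = [];

  const visit = (name: string): string[] | undefined => {
    if (state.get(name) === 'done') return undefined;
    if (state.get(name) === 'visiting') {
      // Reached a stack that is already on the current path: that is a cycle.
      return [...path.slice(path.indexOf(name)), name];
    }
    state.set(name, 'visiting');
    path.push(name);
    for (const dep of deps.get(name) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(name, 'done');
    return undefined;
  };

  for (const s of stacks) {
    const cycle = visit(s.name);
    if (cycle) return cycle;
  }
  return undefined;
}

// Validates the configuration and returns a map of job id -> `needs` entries.
function buildNeeds(stage: string, stacks: StackConfig[]): Record<string, string[]> {
  const known = new Set(stacks.map((s) => s.name));
  for (const s of stacks) {
    for (const dep of s.dependsOn ?? []) {
      if (!known.has(dep)) {
        throw new Error(`Stack "${s.name}" depends on unknown stack "${dep}"`);
      }
    }
  }
  const cycle = findCycle(stacks);
  if (cycle) throw new Error(`Circular dependency: ${cycle.join(' -> ')}`);

  const jobId = (stack: string) => `deploy-${stage}-${stack}`;
  const needs: Record<string, string[]> = {};
  for (const s of stacks) {
    // Every stack job waits for asset publishing plus its declared dependencies.
    needs[jobId(s.name)] = ['assetUpload', ...(s.dependsOn ?? []).map(jobId)];
  }
  return needs;
}
```

Because GitHub Actions derives execution order from the `needs:` arrays, an explicit topological sort may only be required for presentation, for example to emit jobs in a readable order in the generated YAML.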

3. Asset Publishing

  • Asset upload must complete before any stack deployment begins
  • All stack jobs should depend on the assetUpload job, either directly or transitively through another stack job
  • Existing asset publishing logic remains unchanged

4. Concurrency Groups

  • Each stack needs unique concurrency group: deploy-{stage}-{stack}
  • Prevents simultaneous deployments of same stack
  • cancel-in-progress: false maintains queuing behavior
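
As an illustration, a per-stack job with its own concurrency group could be rendered as below. This is a sketch only: `toJobId`, `renderStackJob`, and the `StackJob` shape are hypothetical, and the job body is reduced to the fields shown in the examples in this issue.

```typescript
// Hypothetical, trimmed-down job shape covering only the fields used in this issue.
interface StackJob {
  'runs-on': string;
  needs: string[];
  concurrency: { group: string; 'cancel-in-progress': boolean };
  steps: Array<{ run: string }>;
}

// GitHub Actions job IDs may contain only alphanumerics, '-' and '_', and must
// start with a letter or '_'. The "deploy-" prefix guarantees the first rule.
function toJobId(stage: string, stack: string): string {
  return `deploy-${stage}-${stack}`.replace(/[^A-Za-z0-9_-]/g, '-');
}

function renderStackJob(
  stackPrefix: string,
  stage: string,
  stack: string,
  needs: string[],
): [string, StackJob] {
  const id = toJobId(stage, stack);
  return [
    id,
    {
      'runs-on': 'ubuntu-latest',
      needs,
      // One group per stage/stack pair, so runs for the same stack queue up.
      concurrency: { group: id, 'cancel-in-progress': false },
      steps: [{ run: `npx cdk deploy ${stackPrefix}-${stage}/${stack}` }],
    },
  ];
}

const [jobId, job] = renderStackJob('MyApp', 'dev', 'Network', ['assetUpload']);
console.log(jobId, job.concurrency.group); // deploy-dev-Network deploy-dev-Network
```

One detail worth deciding early: `cdk deploy` also deploys the stacks a target depends on unless `--exclusively` is passed. Whether the generated step should pass that flag is left open in this sketch.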

5. Stage Transitions

  • Next stage starts only after ALL stacks in previous stage complete
  • Generate needs: array containing all stack jobs from previous stage
  • Preserve existing sequential stage behavior
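
The stage gate can be expressed by adding the previous stage's jobs to the `needs:` of the next stage's root jobs, as in the `deploy-staging-Network` example above. The sketch below assumes a job-id-to-needs map per stage. `gateOnPreviousStage` and `NeedsMap` are hypothetical names.

```typescript
type NeedsMap = Record<string, string[]>;

// Returns a copy of `nextStage` in which every root job (one with no
// dependency inside its own stage) also waits for all previous-stage jobs.
function gateOnPreviousStage(previousStageJobs: string[], nextStage: NeedsMap): NeedsMap {
  const sameStage = new Set(Object.keys(nextStage));
  const gated: NeedsMap = {};
  for (const [job, needs] of Object.entries(nextStage)) {
    const isRoot = !needs.some((n) => sameStage.has(n));
    gated[job] = isRoot
      ? Array.from(new Set([...needs, ...previousStageJobs])) // de-duplicated
      : [...needs];
  }
  return gated;
}

const devJobs = ['deploy-dev-Network', 'deploy-dev-Database'];
const staging: NeedsMap = {
  'deploy-staging-Network': ['assetUpload'],
  'deploy-staging-Database': ['assetUpload', 'deploy-staging-Network'],
};

console.log(gateOnPreviousStage(devJobs, staging)['deploy-staging-Network']);
// ["assetUpload", "deploy-dev-Network", "deploy-dev-Database"]
```

Gating only the root jobs keeps the `needs:` arrays short. Non-root jobs wait transitively, since a job is skipped by default when any job it needs fails or is skipped.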

6. Backward Compatibility

  • Default behavior: current single-job mode (no breaking changes)
  • Opt-in: users explicitly configure stacks or stackDeploymentMode
  • Migration path: users can gradually adopt per-stack jobs

7. Independent Stages

  • Apply same stack parallelism to independent stages
  • Each independent stage's stacks can run in parallel
  • No cross-stage dependencies (by definition)

Example Generated Workflow Structure

name: deploy
on:
  push:
    branches: [main]

jobs:
  synth:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Synth CDK
        run: npx projen synth

  assetUpload:
    needs: [synth]
    runs-on: ubuntu-latest
    steps:
      - name: Publish assets
        run: npx projen publish:assets

  # Dev stage - parallel stacks with dependencies
  deploy-dev-Network:
    needs: [assetUpload]
    concurrency:
      group: deploy-dev-Network
      cancel-in-progress: false
    runs-on: ubuntu-latest
    steps:
      - run: npx cdk deploy MyApp-dev/Network

  deploy-dev-Database:
    needs: [assetUpload, deploy-dev-Network]
    concurrency:
      group: deploy-dev-Database
      cancel-in-progress: false
    runs-on: ubuntu-latest
    steps:
      - run: npx cdk deploy MyApp-dev/Database

  deploy-dev-Cache:
    needs: [assetUpload, deploy-dev-Network]
    concurrency:
      group: deploy-dev-Cache
      cancel-in-progress: false
    runs-on: ubuntu-latest
    steps:
      - run: npx cdk deploy MyApp-dev/Cache

  # ... more dev stacks ...

  # Prod stage - waits for ALL dev stacks
  deploy-prod-Network:
    needs: [deploy-dev-Network, deploy-dev-Database, deploy-dev-Cache] # plus the remaining dev stack jobs elided above
    concurrency:
      group: deploy-prod-Network
      cancel-in-progress: false
    runs-on: ubuntu-latest
    steps:
      - run: npx cdk deploy MyApp-prod/Network

  # ... more prod stacks ...

Phased Implementation Plan

Phase 1: Basic Parallel Deployment (MVP)

  • Add stackDeploymentMode: 'parallel-jobs' option
  • Auto-discover stacks via cdk list
  • Generate separate GitHub Actions job per stack
  • All stacks deploy in parallel (no dependency management)
  • Maintain backward compatibility (default: single-job mode)

Phase 2: Dependency Management

  • Add stacks configuration with dependsOn arrays
  • Implement dependency graph validation (detect cycles)
  • Generate needs: arrays based on dependencies
  • Support mixing explicit config with auto-discovery

Phase 3: Advanced Features

  • Stack-specific configuration (timeouts, retries, approval gates)
  • Selective stack deployment (deploy only changed stacks)
  • Integration with CDK stack dependencies (stack.addDependency())
  • Cost estimation per stack

Related Features

This enhancement complements existing projen-pipelines features:

  • Works with sequential stages (dev → staging → prod)
  • Works with independent stages (parallel to main pipeline)
  • Compatible with GitHub Environments and manual approval gates
  • Preserves asset publishing strategy (global or per-stage)
  • Maintains feature branch deployment capabilities

Questions for Maintainers

  1. Preferred API design: Explicit stacks config, auto-discovery, or hybrid approach?
  2. Breaking changes: Any concerns with adding new configuration options?
  3. Platform support: Should this be GitHub Actions only initially, or all platforms?
  4. Stack discovery: Preference for cdk list parsing vs explicit configuration?
  5. Dependency sources: Should we try to read CDK's stack.addDependency() automatically?

Example Use Case: Real-World Application

A typical enterprise CDK application might have:

  • 10-20 stacks per environment
  • 3-4 environments (dev, staging, prod, dr)
  • Complex dependencies (network → data → compute → application layers)
  • 30-45 minute deployment time with current single-job approach
  • Potential reduction to 15-20 minutes with proper parallelization
  • Significantly improved visibility and troubleshooting experience

This enhancement would provide substantial value for teams deploying complex CDK applications at scale.
