Glue Workers and Worker Type model configuration #652

@soham-dasgupta

Description

Describe the feature

Currently, AWS Glue worker configuration (number of workers, worker type) can only be specified at the profile level in profiles.yml, which applies the same settings to all models in a project. This feature request proposes adding model-level configuration for Glue workers, allowing users to specify worker counts and types per model using dbt's model configuration syntax.

This would enable configuration like:

# dbt_project.yml
models:
  my_project:
    staging:
      +workers: 2
      +worker_type: G.1X
    marts:
      large_aggregations:
        +workers: 10
        +worker_type: G.2X

Describe alternatives you've considered

  1. Multiple profiles - Creating separate profiles for different worker configurations and manually switching between them, but this is cumbersome and doesn't scale well with many models requiring different settings.

  2. Separate dbt projects - Splitting models into different projects based on compute needs, but this breaks dependency management and creates maintenance overhead.

  3. Pre/post-hooks with AWS API calls - Manually adjusting Glue job parameters via hooks, but this is hacky, harder to maintain, and bypasses dbt's configuration system.

  4. Manual job configuration - Running models separately with different profile configurations, which defeats the purpose of orchestrated dbt runs.

None of these alternatives provide the flexibility and maintainability that native model-level configuration would offer.

Additional context

Use Cases:

  1. Staging models: Lightweight transformations that need only 2-3 workers
  2. Incremental models: Medium-sized operations requiring 5-7 workers
  3. Heavy aggregations: Complex transformations or large datasets requiring 10+ workers
  4. Development vs Production: Different worker counts for different targets

Expected Behavior:

  1. Model-level configuration should override profile-level defaults
  2. Should support both workers and worker_type parameters
  3. Should respect dbt's configuration hierarchy (model config > project config > profile config)
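The precedence described above can be sketched as a simple first-match lookup. This is only an illustration of the requested behavior, not the adapter's implementation: `resolve_glue_setting` and the three dictionaries are hypothetical stand-ins for the model, project, and profile configuration.

```python
from typing import Any, Mapping, Optional


def resolve_glue_setting(
    key: str,
    model_config: Mapping[str, Any],
    project_config: Mapping[str, Any],
    profile_config: Mapping[str, Any],
) -> Optional[Any]:
    """Return the first value found for `key`, searching
    model config, then project config, then profile config."""
    for source in (model_config, project_config, profile_config):
        value = source.get(key)
        if value is not None:
            return value
    return None


# Hypothetical settings for illustration only.
profile = {"workers": 5, "worker_type": "G.1X"}   # profiles.yml defaults
project = {"workers": 2}                          # dbt_project.yml folder-level config
heavy_model = {"workers": 10, "worker_type": "G.2X"}  # model-level override

keys = ("workers", "worker_type")

# A model with its own config wins over project and profile values.
heavy = {k: resolve_glue_setting(k, heavy_model, project, profile) for k in keys}

# A model without its own config falls back to project, then profile.
staging = {k: resolve_glue_setting(k, {}, project, profile) for k in keys}

print(heavy)
print(staging)
```

With this ordering, a model that sets nothing still inherits the profile-level defaults, so existing projects would keep their current behavior.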

Example scenario:
A project with 50 models where 40 are simple transformations (2 workers), 8 are moderate (5 workers), and 2 are heavy aggregations (15 workers). Currently, users must either over-provision all models or under-provision the heavy ones.

Cost Impact:
Right-sizing workers per model could reduce costs by an estimated 40-60% by avoiding over-provisioning while maintaining performance where needed.
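As a rough sanity check, the example scenario can be turned into a worker-count comparison. This sketch assumes every model runs for the same length of time and ignores worker type and actual Glue pricing, so it gives an upper-bound style figure rather than a cost forecast; the 40-60% estimate above is not derived from it.

```python
# (number of models, workers per model) taken from the example scenario
scenario = [(40, 2), (8, 5), (2, 15)]


def total_worker_slots(groups):
    """Sum of workers allocated across all model runs."""
    return sum(count * workers for count, workers in groups)


# Per-model sizing: each model gets only the workers it needs.
right_sized = total_worker_slots(scenario)

# Profile-level sizing: every model gets the worker count of the heaviest one.
total_models = sum(count for count, _ in scenario)
peak_workers = max(workers for _, workers in scenario)
over_provisioned = total_models * peak_workers

# Fraction of allocated workers saved, assuming equal runtime per model.
reduction = 1 - right_sized / over_provisioned

print(right_sized, over_provisioned, f"{reduction:.0%}")
```

Under the equal-runtime assumption the allocation drops from 750 to 150 worker slots. In practice the heavy models are likely to run longer than the simple ones, which would make the real saving smaller than this figure.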

Who will this benefit?

  1. Data engineers managing diverse workloads with varying compute requirements
  2. Organizations looking to optimize AWS Glue costs without sacrificing performance
  3. Teams running dbt at scale with models of varying complexity

This would particularly benefit medium to large organizations running hundreds of dbt models on AWS Glue with heterogeneous workload characteristics.

Are you interested in contributing this feature?

Yes. I am ready to contribute.
