Describe the feature
Currently, AWS Glue worker configuration (number of workers, worker type) can only be specified at the profile level in profiles.yml, which applies the same settings to all models in a project. This feature request proposes adding model-level configuration for Glue workers, allowing users to specify worker counts and types per model using dbt's model configuration syntax.
This would enable configuration like:
```yaml
# dbt_project.yml
models:
  my_project:
    staging:
      +workers: 2
      +worker_type: G.1X
    marts:
      large_aggregations:
        +workers: 10
        +worker_type: G.2X
```
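The same settings could also live in the model file itself, like any other model config. A hedged sketch of what that might look like; the `workers` and `worker_type` config keys are this proposal's suggestion, not an existing dbt-glue API, and the model and ref names are placeholders:

```sql
-- models/marts/large_aggregations.sql (hypothetical model)
{{ config(
    materialized='incremental',
    workers=15,
    worker_type='G.2X'
) }}

select * from {{ ref('stg_events') }}
```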
Describe alternatives you've considered
- Multiple profiles - Creating separate profiles for different worker configurations and manually switching between them, but this is cumbersome and doesn't scale well with many models requiring different settings.
- Separate dbt projects - Splitting models into different projects based on compute needs, but this breaks dependency management and creates maintenance overhead.
- Pre/post-hooks with AWS API calls - Manually adjusting Glue job parameters via hooks, but this is hacky, harder to maintain, and bypasses dbt's configuration system.
- Manual job configuration - Running models separately with different profile configurations, which defeats the purpose of orchestrated dbt runs.
None of these alternatives provide the flexibility and maintainability that native model-level configuration would offer.
Additional context
Use Cases:
- Staging models: Lightweight transformations that need only 2-3 workers
- Incremental models: Medium-sized operations requiring 5-7 workers
- Heavy aggregations: Complex transformations or large datasets requiring 10+ workers
- Development vs Production: Different worker counts for different targets
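The development-vs-production case could build on the `target` context that dbt already exposes inside `{{ config() }}`. A sketch under the assumption that the proposed `workers` key behaves like any other model config (model and ref names are placeholders):

```sql
-- models/marts/large_aggregations.sql (hypothetical model)
{{ config(
    workers=(2 if target.name == 'dev' else 15),
    worker_type='G.2X'
) }}

select * from {{ ref('stg_events') }}
```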
Expected Behavior:
- Model-level configuration should override profile-level defaults
- Should support both workers and worker_type parameters
- Should respect dbt's configuration hierarchy (model config > project config > profile config)
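Internally, the adapter would need to resolve the effective worker settings in that order. A minimal sketch of the resolution logic; the function and argument names are illustrative, not taken from the dbt-glue codebase:

```python
def resolve_workers(model_config: dict, project_config: dict, profile_config: dict) -> dict:
    """Pick worker settings with precedence: model > project > profile."""
    resolved = {}
    for key in ("workers", "worker_type"):
        # First source that defines the key wins.
        for source in (model_config, project_config, profile_config):
            if key in source:
                resolved[key] = source[key]
                break
    return resolved

# Model-level value wins; missing keys fall through to lower levels:
print(resolve_workers(
    {"workers": 15},                       # model config
    {"worker_type": "G.2X"},               # dbt_project.yml
    {"workers": 5, "worker_type": "G.1X"}  # profiles.yml
))
# → {'workers': 15, 'worker_type': 'G.2X'}
```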
Example scenario:
A project with 50 models where 40 are simple transformations (2 workers), 8 are moderate (5 workers), and 2 are heavy aggregations (15 workers). Currently, users must either over-provision all models or under-provision the heavy ones.
Cost Impact:
Proper worker allocation could reduce costs by 40-60% by avoiding over-provisioning while maintaining performance where needed.
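To make the estimate concrete, here is a back-of-the-envelope calculation for the scenario above, assuming every model runs for the same duration and cost scales linearly with worker count (real savings depend on per-model run times and Glue DPU pricing, so treat the numbers as illustrative):

```python
# Scenario from above: 40 light, 8 moderate, 2 heavy models.
fleet = [(40, 2), (8, 5), (2, 15)]  # (model count, workers actually needed)

over_provisioned = sum(count for count, _ in fleet) * 15  # every model gets 15 workers
right_sized = sum(count * workers for count, workers in fleet)

savings = 1 - right_sized / over_provisioned
print(f"worker-units: {over_provisioned} -> {right_sized} ({savings:.0%} saved)")
# → worker-units: 750 -> 150 (80% saved)
```

Under the equal-runtime assumption this is an upper bound; since heavy models typically run longer than light ones, realized savings would land lower, consistent with the 40-60% range above.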
Who will this benefit?
- Data engineers managing diverse workloads with varying compute requirements
- Organizations looking to optimize AWS Glue costs without sacrificing performance
- Teams running dbt at scale with models of varying complexity
This would particularly benefit medium to large organizations running hundreds of dbt models on AWS Glue with heterogeneous workload characteristics.
Are you interested in contributing this feature?
Yes. I am ready to contribute.