Add StatefulSet workload support to CRUD benchmarking framework #1132
Open
diamondpowell wants to merge 20 commits into
Add create_statefulset() to NodePoolCRUD that deploys K8s StatefulSets onto node pools after provisioning. Follows the same pattern as create_deployment: multi-doc YAML manifest parsing, configurable replica count, and per-statefulset readiness validation via wait_for_condition.
- Add 'statefulset' subcommand to handle_workload_operations() in main.py with --number-of-statefulsets and --replicas args
- Add statefulset.yml workload template with configurable replicas and node affinity via label_selector
- Add _is_statefulset_ready and _check_statefulset_condition to kubernetes_client.py for readiness polling
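The readiness polling described in this commit can be illustrated with a small, self-contained sketch. It is not the code from kubernetes_client.py, which this page does not show. The function names `is_statefulset_ready` and `wait_until_ready`, the plain-dict object shape, and the `fetch` callback are all assumptions made for illustration. The only Kubernetes facts relied on are that a StatefulSet exposes `spec.replicas` and `status.readyReplicas`.

```python
import time
from typing import Callable, Optional


def is_statefulset_ready(statefulset: dict) -> bool:
    """Return True when every desired replica reports ready.

    `statefulset` is a plain dict shaped like the Kubernetes API object
    (spec.replicas, status.readyReplicas). Using a dict is an assumption
    made to keep the sketch free of client-library dependencies.
    """
    desired = statefulset.get("spec", {}).get("replicas", 1)
    # readyReplicas is absent from the status while no pod is ready yet.
    ready = statefulset.get("status", {}).get("readyReplicas") or 0
    return ready >= desired


def wait_until_ready(
    fetch: Callable[[], Optional[dict]],
    timeout_seconds: float,
    interval_seconds: float = 1.0,
) -> bool:
    """Poll `fetch` until the StatefulSet is ready or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        statefulset = fetch()  # None models "not found (yet)"
        if statefulset is not None and is_statefulset_ready(statefulset):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)
```

Passing the lookup in as a callback keeps the polling loop testable without a cluster: a unit test can feed it a sequence of canned states.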
Add statefulset execution step to the k8s CRUD engine pipeline between deployment and scale-down. Parameters (number_of_statefulsets, replicas) flow from pipeline matrix → topology → engine step → main.py.
- Add statefulset script block to steps/engine/crud/k8s/execute.yml
- Pass number_of_statefulsets through topology execute-crud.yml
Add test coverage for create_statefulset and statefulset wait_for_condition:
- test_create_statefulset_success: single statefulset with readiness check
- test_create_statefulset_failure: statefulset fails to become ready
- test_create_statefulset_partial_success: continues on individual failures
- test_create_statefulset_no_client: returns early when k8s client unavailable
- test_statefulset_wait_for_condition: validates _is_statefulset_ready and _check_statefulset_condition polling logic
- Extract _apply_statefulset helper (matches _apply_deployment pattern)
- Use os.path for default template path instead of hardcoded string
- Use per-statefulset labels to avoid selector collision
- Remove redundant outer try/except
- Use workload_common_parser for shared args (--count, --replicas, etc.)
- Add hasattr guard for cloud provider compatibility
- Use args.count instead of args.number_of_statefulsets
- Update pipeline YAML to use count parameter
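One way to share arguments between the deployment and statefulset subcommands is argparse's parent-parser mechanism. The sketch below is a guess at the shape, not the code in main.py. The flag names come from this PR, while the `build_parser` function, the defaults, and the types are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Parent parser holding the flags shared by every workload subcommand.
    # add_help=False avoids a duplicate -h/--help when it is used as a parent.
    workload_common_parser = argparse.ArgumentParser(add_help=False)
    workload_common_parser.add_argument("--count", type=int, default=1)
    workload_common_parser.add_argument("--replicas", type=int, default=1)
    workload_common_parser.add_argument("--manifest-dir", default=None)
    workload_common_parser.add_argument("--label-selector", default=None)

    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Each workload subcommand inherits the shared flags from the parent.
    for workload in ("deployment", "statefulset"):
        subparsers.add_parser(workload, parents=[workload_common_parser])
    return parser
```

With this layout, `main.py statefulset --count 3 --replicas 2` and `main.py deployment --count 3 --replicas 2` parse into the same attribute names, so a single dispatcher can handle both.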
…-dir
- Wrap statefulset pipeline step inside Azure cloud gate (matches deployment)
- Use ${MANIFEST_DIR:+--manifest-dir} conditional (matches deployment pattern)
The k8s-gpu-cluster-crud scenario already has terraform inputs. Custom scenario dirs will be reverted before merge.
- Add gpu_node_pool: '' to pipeline test matrix to prevent GPU driver
installation on non-GPU VM (Standard_D4s_v3)
- Include workload type in pod labels to prevent collision when both
deployment and statefulset run in the same pipeline:
- deployment: nginx-container-deployment-{index}
- statefulset: nginx-container-statefulset-{index}
Previously both used nginx-container-{index}, causing
wait_for_pods_ready to see 20 pods when expecting 10
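The collision described in this commit can be reproduced without a cluster. The sketch below models pods as label dicts and counts matches for a selector value. The `app` label key and the helper names are assumptions, and only the label formats come from the commit message.

```python
from typing import Dict, List

REPLICAS = 10


def old_label(index: int) -> str:
    # Label format used by both workload types before the fix.
    return f"nginx-container-{index}"


def workload_label(workload_type: str, index: int) -> str:
    # Label format after the fix: the workload type is part of the value.
    return f"nginx-container-{workload_type}-{index}"


def count_matching(pods: List[Dict[str, str]], value: str, key: str = "app") -> int:
    """Count pods whose label `key` equals `value` (an equality selector)."""
    return sum(1 for labels in pods if labels.get(key) == value)


# Before: deployment and statefulset pods carry the same label value.
shared = [{"app": old_label(1)} for _ in range(2 * REPLICAS)]

# After: each workload type gets its own label value.
isolated = [{"app": workload_label("deployment", 1)} for _ in range(REPLICAS)] + [
    {"app": workload_label("statefulset", 1)} for _ in range(REPLICAS)
]

print(count_matching(shared, old_label(1)))                      # 20
print(count_matching(isolated, workload_label("statefulset", 1)))  # 10
```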
Pull request overview
Adds Kubernetes StatefulSet workload support to the CRUD benchmarking framework so Telescope can measure StatefulSet create/verify latency (AKS-focused), alongside the existing Deployment workload path.
Changes:
- Introduces a parameterized StatefulSet + headless Service manifest template and a new `statefulset` CLI subcommand/dispatcher path.
- Adds AKS NodePoolCRUD orchestration for creating/verifying multiple StatefulSets (including per-workload label isolation to avoid selector collisions).
- Extends the Kubernetes client’s `wait_for_condition` to support StatefulSet readiness checks and adds unit tests for the new behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| steps/engine/crud/k8s/execute.yml | Adds an Azure-gated pipeline step to run StatefulSet workload operations via main.py statefulset. |
| modules/python/crud/workload_templates/statefulset.yml | New templated StatefulSet + headless Service manifest used by the CRUD workload runner. |
| modules/python/crud/main.py | Adds statefulset subcommand and routes it through the shared workload dispatcher and args. |
| modules/python/crud/azure/node_pool_crud.py | Implements StatefulSet apply/verify loop (create_statefulset + helper) and updates deployment labels to avoid cross-workload collisions. |
| modules/python/clients/kubernetes_client.py | Adds StatefulSet readiness polling support to wait_for_condition. |
| modules/python/tests/crud/test_azure_node_pool_crud.py | Adds unit tests covering StatefulSet creation success/failure/partial success and no-client behavior. |
| modules/python/tests/clients/test_kubernetes_client.py | Adds unit tests for StatefulSet readiness wait success/timeout/not-found cases. |
xinWeiWei24 reviewed May 18, 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Summary
Adds StatefulSet workload support to the CRUD benchmarking framework, the second of three planned workload methods (`deployment`, `statefulset`, `jobs`). Measures K8s StatefulSet create/verify latency on AKS node pools.
Branch cleanup note: Rebased and squashed for reviewability. All commits are logically grouped. Also includes a fix for the `gpu_profile` driver setting that caused node pool creation failure on non-GPU pools.
Changes
`modules/python/crud/workload_templates/statefulset.yml`
- Headless Service (`clusterIP: None`) for stable pod DNS
- `STATEFULSET_REPLICAS` placeholder, configurable node affinity via label_selector
`modules/python/crud/azure/node_pool_crud.py`
- `create_statefulset()`: same loop pattern as `create_deployment`
- `ready` condition (not `available`), since StatefulSets don't support the `available` condition type
`modules/python/crud/main.py`
- `statefulset` subparser with `--node-pool-name`, `--number-of-statefulsets`, `--replicas`, `--manifest-dir`
- `elif command == "statefulset"` routing in `handle_workload_operations`
`modules/python/clients/kubernetes_client.py`
- `_is_statefulset_ready` and `_check_statefulset_condition` for readiness polling
- Extend `wait_for_condition` to support the StatefulSet resource type
`steps/engine/crud/k8s/execute.yml`
- `statefulset` script block calling `python3 main.py statefulset`
`steps/topology/k8s-crud-gpu/execute-crud.yml`
- Pass `number_of_statefulsets` through to the engine step
`modules/python/clients/aks_client.py`
- Set the `gpu_profile` driver to `"None"` for non-GPU node pools (was incorrectly set to `"Install"`, causing creation failures)
Changes since initial PR
`modules/python/crud/azure/node_pool_crud.py`
- `WORKLOAD_CONFIG` dict: centralizes workload-specific settings (template path, resource kind, replicas placeholder, condition type) for deployment/statefulset
- `_apply_workload()` unified helper: single implementation handles both deployment and statefulset (template loading, placeholder replacement, apply, verify)
- `_create_workloads()` unified helper: orchestration loop with error handling, calls `_apply_workload()` for each instance
- `create_deployment()` and `create_statefulset()` are now thin wrappers that call `_create_workloads()` with the workload type
- Method order: wrappers (`create_deployment`, `create_statefulset`), then helpers (`_create_workloads`, `_apply_workload`)
- Workload type included in pod labels: `nginx-container-deployment-1` vs `nginx-container-statefulset-1`
`modules/python/crud/main.py`
- Shared workload dispatcher (deployment, statefulset)
- `workload_common_parser` for shared args (`--count`, `--replicas`, `--manifest-dir`, `--label-selector`)
`steps/engine/crud/k8s/execute.yml`
- Step wrapped in `${{ if eq(parameters.cloud, 'azure') }}:` because the AWS `node_pool_crud.py` doesn't have workload methods yet
- `--manifest-dir` passed conditionally: `${MANIFEST_DIR:+--manifest-dir "$MANIFEST_DIR"}`, which avoids passing an empty string when not set
Tests
`test_azure_node_pool_crud.py`:
- `test_create_statefulset_success`: happy path
- `test_create_statefulset_failure`: all fail to become ready
- `test_create_statefulset_no_client`: returns early when k8s client unavailable
- `test_create_statefulset_partial_success`: continues on failures, returns False
`test_kubernetes_client.py`:
- `test_wait_for_condition_statefulset_success`
- `test_wait_for_condition_statefulset_timeout`
- `test_wait_for_condition_statefulset_not_found`
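
The unified-helper refactor and the partial-success behaviour covered by these tests could look roughly like the sketch below. It is an illustration, not the code in node_pool_crud.py. `WORKLOAD_CONFIG` and `STATEFULSET_REPLICAS` are named in this PR, but the dict keys, the `DEPLOYMENT_REPLICAS` placeholder, the template file names, and the `render_manifest`, `create_workloads`, and `apply_and_verify` names are hypothetical.

```python
from typing import Callable, Dict

# Per-workload settings in one place. Keys and the deployment values are
# assumptions; the statefulset placeholder and condition come from the PR text.
WORKLOAD_CONFIG: Dict[str, Dict[str, str]] = {
    "deployment": {
        "template_file": "deployment.yml",
        "resource_kind": "Deployment",
        "replicas_placeholder": "DEPLOYMENT_REPLICAS",
        "condition_type": "available",
    },
    "statefulset": {
        "template_file": "statefulset.yml",
        "resource_kind": "StatefulSet",
        "replicas_placeholder": "STATEFULSET_REPLICAS",
        "condition_type": "ready",
    },
}


def render_manifest(template: str, workload_type: str, replicas: int) -> str:
    """Substitute the workload's replicas placeholder in the template text."""
    placeholder = WORKLOAD_CONFIG[workload_type]["replicas_placeholder"]
    return template.replace(placeholder, str(replicas))


def create_workloads(
    workload_type: str,
    count: int,
    replicas: int,
    template: str,
    apply_and_verify: Callable[[str, str, str, int], bool],
) -> bool:
    """Apply `count` workloads; keep going on failure, report overall success."""
    config = WORKLOAD_CONFIG[workload_type]
    succeeded = 0
    for index in range(1, count + 1):
        manifest = render_manifest(template, workload_type, replicas)
        try:
            ok = apply_and_verify(
                manifest, config["resource_kind"], config["condition_type"], index
            )
        except Exception as exc:  # broad on purpose: one failure must not stop the loop
            print(f"{workload_type} {index} failed: {exc}")
            ok = False
        succeeded += bool(ok)
    # True only when every instance was applied and verified.
    return succeeded == count
```

Injecting `apply_and_verify` keeps the loop independent of the Kubernetes client, which is what makes the partial-success case easy to exercise with a fake that fails for one index.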