feat(crud-machine-ci): wire create-machine/scale-machine into ADO pipeline #1194
Closed
karenychen wants to merge 3 commits into
Pull request overview
This PR wires the AKS Machine API CLI path (create-machine / scale-machine) into Telescope’s Azure DevOps perf-eval CI by adding a dedicated scenario, topology templates, a new CRUD engine step, and a scheduled pipeline definition.
Changes:
- Adds a new perf-eval scenario (`k8s-cluster-crud-machine`) that provisions an AKS cluster shell via Terraform with `default_node_pool = null`, relying on runtime-injected system pool config.
- Introduces a new topology (`k8s-crud-machine`) and a new CRUD engine step (`execute-machine.yml`) that runs create and scale as two discrete pipeline steps.
- Adds a scheduled ADO pipeline with small/medium/large matrix entries and batch API gating.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scenarios/perf-eval/k8s-cluster-crud-machine/terraform-inputs/azure.tfvars | Defines the new scenario’s Terraform inputs, including default_node_pool = null rationale. |
| scenarios/perf-eval/k8s-cluster-crud-machine/terraform-test-inputs/azure.json | Provides example JSON inputs for Terraform tests (system pool injection). |
| steps/engine/crud/k8s/execute-machine.yml | New engine step to run create-machine and scale-machine as separate observable steps. |
| steps/topology/k8s-crud-machine/validate-resources.yml | Topology validation step wiring kubeconfig for the scenario. |
| steps/topology/k8s-crud-machine/execute-crud.yml | Topology execution step that calls the new execute-machine engine template. |
| steps/topology/k8s-crud-machine/collect-crud.yml | Topology collection step wiring into the existing CRUD collect template. |
| pipelines/perf-eval/AKS Machine API Benchmark/k8s-cluster-crud-machine.yml | New scheduled pipeline and matrix for the Machine API CRUD benchmark. |

    --machine-workers "$MACHINE_WORKERS" \
    --readiness-wait-timeout "$READINESS_WAIT_TIMEOUT" \
    --step-timeout "$STEP_TIME_OUT" \
    $([ "${USE_BATCH_API,,}" = "true" ] && echo "--use-batch-api")
b67130d to 8f5a5ae
…eline

Add scenario, topology, engine step, and pipeline file to exercise the AKS Machine API CLI (`create-machine` and `scale-machine`, shipped in PR #1187) end-to-end in the Telescope CI.

Design choices (Option 2 hybrid, intentional divergence from ado-telescope):

* Reuse the existing `crud` engine instead of introducing a new `raw` engine path; mirror PR #1187's separate subcommands so create and scale are two discrete observable pipeline steps.
* New scenario `scenarios/perf-eval/k8s-cluster-crud-machine/` with `default_node_pool = null` (the Machines-mode pool is born via the ARM PUT performed by `create-machine`).
* System pool is injected at job time via the existing `SYSTEM_NODE_POOL` plumbing in `steps/terraform/set-input-variables-azure.yml`.
* New topology folder `steps/topology/k8s-crud-machine/` mirroring the `k8s-crud-gpu` shape (validate-resources / execute-crud / collect-crud).
* New sibling engine step `steps/engine/crud/k8s/execute-machine.yml` that does NOT modify the existing `execute.yml`.
* Teardown via terraform destroy only (no CLI delete step today).
* Region: westus2.
* `use_batch_api` is gated at the bash level (matching the ado-telescope idiom) because the matrix value is a runtime variable and ADO template conditionals run at compile time.
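The bash-level gating described in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's actual step: the flag names `--step-timeout` and `--use-batch-api` are taken from the reviewed snippet, while the `build_machine_args` helper and its use of a bash array are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the PR): builds the argument tail for the
# machine CLI call. Only the flag names come from the reviewed snippet.
build_machine_args() {
  local step_timeout="$1" use_batch_api="$2"
  local -a args=(--step-timeout "$step_timeout")
  # ${var,,} lowercases the value (bash 4+), so "True" and "TRUE" also match.
  # The decision is made when the script runs, which is why a runtime matrix
  # variable works here where a compile-time template conditional would not.
  if [[ "${use_batch_api,,}" == "true" ]]; then
    args+=(--use-batch-api)
  fi
  printf '%s\n' "${args[*]}"
}

# Step timeout and batch flag values as listed in the PR's pipeline matrix.
build_machine_args 600 false
build_machine_args 900 True
```

Appending to an array keeps the optional flag out of an unquoted command substitution, which is one way to avoid word-splitting surprises; the inline `$([ ... ] && echo ...)` form from the snippet achieves the same gating.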
8f5a5ae to c148bbf
Closing this iteration so we can rework the pipeline approach and try again.
Summary
Wires the AKS Machine API CLI (`create-machine` and `scale-machine`, shipped in PR #1187) into Telescope CI end-to-end: a new scenario, topology, engine step, and pipeline file.

Builds directly on current `main`, with no Python or Terraform module changes.

Design (Option 2 hybrid)

Intentional divergence from the internal `ado-telescope` mirror to improve pipeline observability:

* Reuses the existing `crud` engine instead of the `raw` engine. Mirrors the separate `create-machine` / `scale-machine` subcommands from PR #1187 (feat(crud-machine): AKS Machine API client + CRUD layer + CLI) so create and scale are two discrete observable pipeline steps rather than one Python process branching on `--resource_type`.
* New scenario `scenarios/perf-eval/k8s-cluster-crud-machine/` with `default_node_pool = null`. The Machines-mode pool is born via the ARM PUT performed inside `create-machine`; Terraform doesn't pretend to own it.
* System pool injected at job time via the existing `SYSTEM_NODE_POOL` plumbing in `steps/terraform/set-input-variables-azure.yml`, with no Terraform module changes.
* New topology folder `steps/topology/k8s-crud-machine/` mirroring the `k8s-crud-gpu` shape (`validate-resources` / `execute-crud` / `collect-crud`).
* New sibling engine step `steps/engine/crud/k8s/execute-machine.yml`. Does not modify the existing `execute.yml`.
* Teardown via `terraform destroy` only (no CLI delete subcommand exists today).
* Region: `canadacentral`.
* `use_batch_api` gated at bash level (matching the ado-telescope idiom) because the matrix value is a runtime variable and ADO template conditionals run at compile time.

Files
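Added (7 files):

* `scenarios/perf-eval/k8s-cluster-crud-machine/terraform-inputs/azure.tfvars`
* `scenarios/perf-eval/k8s-cluster-crud-machine/terraform-test-inputs/azure.json`
* `steps/engine/crud/k8s/execute-machine.yml`
* `steps/topology/k8s-crud-machine/validate-resources.yml`
* `steps/topology/k8s-crud-machine/execute-crud.yml`
* `steps/topology/k8s-crud-machine/collect-crud.yml`
* `pipelines/perf-eval/AKS Machine API Benchmark/k8s-cluster-crud-machine.yml`

Not modified: `modules/python/crud/`, `modules/terraform/azure/aks-cli/`, `jobs/competitive-test.yml`, `steps/engine/crud/k8s/execute.yml`, `steps/engine/crud/k8s/collect.yml`, `steps/topology/k8s-crud-gpu/`.

Pipeline matrix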
Three entries (small/medium/large) varying pool size (1/50/100 machines), step timeout (600/600/900 s), and `use_batch_api` (false/false/true). Daily cron at 12:00 UTC in `canadacentral`. `max_parallel: 1`.

Verification

* `yamllint -c .yamllint . --no-warnings`
* `terraform fmt --check --diff scenarios/perf-eval/k8s-cluster-crud-machine/terraform-inputs/azure.tfvars`
* `python3` YAML parse for all new YAML files and JSON parse for `terraform-test-inputs/azure.json`
* `git diff --check`

CI on this PR will validate the same.
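The local checks listed under Verification could be bundled into one script, roughly as below. This is a sketch under assumptions: the function names are hypothetical, the YAML parse assumes PyYAML is installed for `python3`, and only one of the new YAML files is shown. The `yamllint`, `terraform fmt`, and `git diff` invocations are the ones quoted in the PR description.

```shell
#!/usr/bin/env bash
set -euo pipefail

SCENARIO_DIR="scenarios/perf-eval/k8s-cluster-crud-machine"

# Parse one JSON file with the Python standard library.
# Exits non-zero if the file is not valid JSON.
check_json() {
  python3 -c 'import json, sys; json.load(open(sys.argv[1]))' "$1"
}

# Parse one YAML file. Assumption: PyYAML is available to python3.
check_yaml() {
  python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$1"
}

# Hypothetical wrapper: runs the checks in order and stops at the first
# failure because of `set -e`. Intended to be called from the repo root.
run_local_checks() {
  yamllint -c .yamllint . --no-warnings
  terraform fmt --check --diff "$SCENARIO_DIR/terraform-inputs/azure.tfvars"
  check_json "$SCENARIO_DIR/terraform-test-inputs/azure.json"
  check_yaml "steps/engine/crud/k8s/execute-machine.yml"
  git diff --check
}
```

Calling `run_local_checks` from the repository root would run everything in sequence; the remaining new YAML files could be added as further `check_yaml` lines.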
Follow-ups
* … (`k8s-cluster-crud-machine-5k.yml`)
* `delete-machine` CLI subcommand if the platform team wants explicit teardown observability