Goal
Every overlay shipped by AICR must carry a performance-phase constraint anchored in empirically grounded thresholds from a real testbed, so aicr validate ships a meaningful runtime gate instead of a placeholder. Each overlay without such a constraint either gets one or is explicitly exempted in recipes/overlays_validation_floor_test.go with a reference to the issue tracking its testbed.
Parent initiative: #1041.
Success criteria
- Every entry in recipes/overlays_validation_floor_test.go that is flagged by AICR_VALIDATION_FLOOR_STRICT=1 either has the required constraint or carries an exemption pointing at the issue tracking the testbed.
- Each added constraint uses thresholds derived from a real run on the corresponding service (no placeholder values).
- The validator the constraint invokes (e.g., inference-perf, nccl-all-reduce-bw) actually executes against the workload variant the overlay deploys. If it doesn't, the gap is filed under the validator capability Epic instead.
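The first criterion boils down to a simple rule: an overlay passes the floor if it declares at least one performance-phase constraint, or if it carries an exemption that names a tracking issue. The sketch below illustrates that rule only. The Overlay type, the floorViolations function, the exemption map, and the way strict mode gates the exit code are hypothetical and are not the actual contents of recipes/overlays_validation_floor_test.go.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// Overlay is a hypothetical, simplified view of an AICR overlay: its name and
// the validators its performance-phase constraints invoke (e.g. "inference-perf").
type Overlay struct {
	Name        string
	Constraints []string
}

// floorViolations returns, sorted by name, the overlays that declare no
// performance-phase constraint and have no exemption. exemptions maps an
// overlay name to the issue tracking its testbed; an empty value does not count.
func floorViolations(overlays []Overlay, exemptions map[string]string) []string {
	var missing []string
	for _, o := range overlays {
		if len(o.Constraints) > 0 {
			continue // the overlay already carries a performance gate
		}
		if issue, ok := exemptions[o.Name]; ok && issue != "" {
			continue // explicitly exempted and pointing at a tracking issue
		}
		missing = append(missing, o.Name)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical overlays and a placeholder issue reference, for illustration only.
	overlays := []Overlay{
		{Name: "overlay-a", Constraints: []string{"nccl-all-reduce-bw"}},
		{Name: "overlay-b"},
		{Name: "overlay-c"},
	}
	exemptions := map[string]string{"overlay-c": "#NNNN"}

	missing := floorViolations(overlays, exemptions)
	for _, name := range missing {
		fmt.Printf("overlay %q has no performance constraint and no exemption\n", name)
	}

	// Assumption: gaps are only reported as failures when strict mode is requested.
	if os.Getenv("AICR_VALIDATION_FLOOR_STRICT") == "1" && len(missing) > 0 {
		os.Exit(1)
	}
}
```

Keeping the exemption as an issue reference rather than a boolean makes every skipped overlay traceable back to the testbed work that will eventually remove it.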
Scope
In scope:
- Adding nccl-*, inference-perf, and future performance constraints to existing strict-floor-flagged overlays once their testbeds land.
- Re-running the threshold derivation when a service or accelerator generation changes.
Out of scope (other epics under #1041):
- New overlays themselves (covered under the overlay-coverage Epic).
- Validator changes that would make an existing skip path actually run on a NIM or other new workload variant (validator capability Epic).
New testbed-blocked performance gaps should be filed as standalone issues and attached here.