balloons: add cpuClasses#667
Conversation
f4f281e to
dce58c5
Compare
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2695589 to
e919069
Compare
There was a problem hiding this comment.
Pull request overview
Adds a top-level cpuClasses configuration model for the balloons policy, including human-readable/symbolic frequency parsing, turbo-priority allocation, CRD/docs updates, and e2e coverage for turbo behavior and legacy CPU class syntax.
Changes:
- Introduces
CPUClass/FrequencyAPI types, CRD schema updates, docs, and config template migration tocpuClasses. - Adds a balloons CPU class turbo allocator that resolves symbolic frequencies and coordinates with the CPU controller.
- Updates CPU controller/sysfs/test support for dynamic classes, cpufreq overrides, write deduplication, and turbo-priority e2e validation.
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
cmd/plugins/balloons/policy/balloons-policy.go |
Wires turbo allocator into balloons setup, reset, assignment, validation, and reconfiguration. |
cmd/plugins/balloons/policy/cpuclass.go |
Adds turbo-aware CPU class allocator and symbolic frequency resolution. |
cmd/plugins/balloons/policy/flags.go |
Adds aliases for new CPU class/frequency config types. |
config/crd/bases/config.nri_balloonspolicies.yaml |
Adds cpuClasses to generated CRD schema. |
deployment/helm/balloons/crds/config.nri_balloonspolicies.yaml |
Adds Helm-packaged CRD schema for cpuClasses. |
docs/resource-policy/policy/balloons.md |
Documents preferred cpuClasses, symbolic units, turbo priority, and legacy syntax. |
pkg/apis/config/v1alpha1/balloons-policy.go |
Injects top-level cpuClasses into common CPU controller config. |
pkg/apis/config/v1alpha1/resmgr/policy/balloons/config.go |
Adds CPUClasses to balloons policy config. |
pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go |
Adds deepcopy support for balloons CPUClasses. |
pkg/apis/config/v1alpha1/resmgr/policy/cpuclass.go |
Defines user-facing CPU class fields. |
pkg/apis/config/v1alpha1/resmgr/policy/frequency.go |
Adds frequency parsing, JSON marshal/unmarshal, symbolic values, and resolution helpers. |
pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go |
Adds deepcopy support for CPUClass. |
pkg/resmgr/control/cpu/api.go |
Adds dynamic SetClass and defers enforcement logging before controller start. |
pkg/resmgr/control/cpu/cache.go |
Downgrades missing assignment cache log on fresh startup. |
pkg/resmgr/control/cpu/cpu.go |
Adds per-CPU cpufreq write cache and merges dynamic/static CPU class definitions. |
pkg/sysfs/system.go |
Adds test-oriented cpufreq sysfs override support. |
test/e2e/policies.test-suite/balloons/balloons-config.yaml.in |
Migrates default balloons test config to top-level cpuClasses. |
test/e2e/policies.test-suite/balloons/n4c16/test17-cstates-scheduling/balloons-cstates.cfg |
Converts C-state class config to cpuClasses. |
test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo.cfg |
Adds turbo-priority e2e config. |
test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo-oldsyntax.cfg |
Adds legacy control.cpu.classes compatibility e2e config. |
test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/code.var.sh |
Adds turbo-priority and cpufreq write-minimality e2e flow. |
Files not reviewed (2)
- pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
- pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (3)
cmd/plugins/balloons/policy/balloons-policy.go:1445
- When a live update changes only
idleCPUClass, this branch detects a CPU-class-only change but never copiesnewBalloonsOptions.IdleCpuClassintop.bpoptions. The allocator is reconfigured with the old idle class andresetCpuClass()continues to apply the old value, so idle class changes are ignored until a full policy reconfiguration occurs.
// Update CPUClasses definitions.
p.bpoptions.CPUClasses = newBalloonsOptions.CPUClasses
if p.turboAllocator != nil {
if err := p.turboAllocator.Reconfigure(p.bpoptions.CPUClasses, p.bpoptions.IdleCpuClass); err != nil {
cmd/plugins/balloons/policy/balloons-policy.go:1713
- The turbo allocator is created/reconfigured before
fillBuiltinBalloonDefs()andvalidateConfig()run. If validation fails, this has already mutated policy/controller state viap.turboAllocatorandcpucontrol.SetClass, so an invalid configuration update can leave partially applied CPU class definitions behind despitesetConfig()returning an error.
if p.turboAllocator == nil {
ta, err := NewCPUClassTurboAllocator(
WithSystem(p.options.System),
WithCache(p.cch),
WithCPUClasses(bpoptions.CPUClasses),
WithIdleClass(bpoptions.IdleCpuClass),
)
if err != nil {
return balloonsError("failed to create CPU class turbo allocator: %w", err)
}
p.turboAllocator = ta
} else {
if err := p.turboAllocator.Reconfigure(bpoptions.CPUClasses, bpoptions.IdleCpuClass); err != nil {
return balloonsError("failed to reconfigure CPU class turbo allocator: %w", err)
}
}
cmd/plugins/balloons/policy/cpuclass.go:199
- Idle CPUs are assigned once but are not tracked or reassigned when the turbo winner changes. If
idleCPUClassuses symbolicturbo, idle CPUs keep the effective value from the last reset/release (for example turbo from startup) even after a higher-priority active class should cap non-winners to base.
// ResetIdle assigns the given CPU set to the idle class via the CPU
// controller. Used at policy startup to bring all allowed CPUs to a
// known baseline before any container-driven UseClass call. Does not
// affect the active-class tracking.
func (a *CPUClassTurboAllocator) ResetIdle(cpus cpuset.CPUSet) error {
if cpus.IsEmpty() {
return nil
}
if err := cpucontrol.Assign(a.cch, a.idleClassName, cpus.UnsortedList()...); err != nil {
return fmt.Errorf("failed to assign CPUs %s to idle class %q: %w", cpus, a.idleClassName, err)
}
return nil
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // kernel sysfs cpufreq interface). Symbolic values ("min", "base", | ||
| // "turbo") are stored as sentinel constants and must be resolved with | ||
| // Resolve() before being passed to the CPU controller. | ||
| // +kubebuilder:validation:Type=string |
| // Update the cache even on failure: the desired value | ||
| // is unchanged so retrying on every Assign would just | ||
| // spam logs without ever succeeding. A subsequent | ||
| // configure() resets lastFreq so a real configuration | ||
| // change still triggers a fresh attempt. | ||
| state.min = min | ||
| state.hasMin = true |
| // Detect changes in CPUClasses definitions (turbo attributes, frequencies, etc.) | ||
| if len(opts0.CPUClasses) != len(opts1.CPUClasses) { | ||
| return true | ||
| } | ||
| if utils.DumpJSON(opts0.CPUClasses) != utils.DumpJSON(opts1.CPUClasses) { | ||
| return true |
| a.classByName = make(map[string]*CPUClass, len(classes)) | ||
| for _, cc := range classes { | ||
| a.classByName[cc.Name] = cc | ||
| } |
| } | ||
| freq := cpu.FrequencyRange() | ||
| baseFreq := cpu.BaseFrequency() | ||
| if baseFreq == 0 || freq.Max == 0 { |
|
There is some technical and architectural debt that I wish to pay still in this PR. That is, the CPU controller should not directly modify frequencies, but this should be via cache and aligned with the spirit of applying "pending updates". Unfortunately controller hooks are container-specific and possibly called multiple times while handling single NRI event, whereas all CPU properties should be written once per NRI event. I'll add yet another hook to the Controller interface to commit whatever changes a controller has stored since the previous Commit(). |
e919069 to
d61da00
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 22 out of 24 changed files in this pull request and generated 5 comments.
Files not reviewed (2)
- pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
- pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported
| // Just print an error. A config update later on may be valid. | ||
| log.Errorf("failed apply /cpuinitial configuration: %v", err) | ||
| } | ||
|
|
||
| ctl.started = true | ||
|
|
||
| return true, nil |
| // controller. The recalculation runs first so that the controller's | ||
| // in-memory class definition reflects the correct effective turbo | ||
| // frequency at the time of Assign. | ||
| func (a *CPUClassTurboAllocator) UseClass(className string, cpus cpuset.CPUSet) error { | ||
| if cpus.IsEmpty() { | ||
| return nil | ||
| } | ||
| a.removeCpusFromAllClasses(cpus) | ||
| if className != "" { | ||
| a.activeCpus[className] = a.activeCpus[className].Union(cpus) | ||
| } | ||
| a.recalculateTurbo() | ||
| if err := cpucontrol.Assign(a.cch, className, cpus.UnsortedList()...); err != nil { | ||
| return fmt.Errorf("failed to assign CPUs %s to class %q: %w", cpus, className, err) |
| // Verify that cpuClass references in balloon types are defined | ||
| // in either cpuClasses or existing control.cpu.classes. | ||
| existingControlClasses := cpucontrol.GetClasses() | ||
| for _, blnDef := range bpoptions.BalloonDefs { |
| // Set bpoptions early so the turbo allocator construction below | ||
| // has access to CPUClasses. | ||
| p.bpoptions = bpoptions | ||
|
|
| - `minFreq` (string or number): Minimum CPU frequency. Accepts values | ||
| with units: `"3.2GHz"`, `"2900MHz"`, `"2900000kHz"`, or a plain | ||
| number in kHz. Also accepts symbolic names: `"min"` (platform | ||
| minimum), `"base"` (CPU base frequency), `"turbo"` (maximum turbo | ||
| frequency), which are resolved at runtime from sysfs. | ||
| - `maxFreq` (string or number): Maximum CPU frequency (same format). | ||
| - `uncoreMinFreq` / `uncoreMaxFreq` (string or number): Uncore | ||
| frequency limits (same format). |
d61da00 to
15fdc7d
Compare
…class definitions Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Introduce CPUClassTurboAllocator that owns CPU-class state and all cpucontrol.Assign / cpucontrol.SetClass calls, keeping CPU-class concerns out of the rest of the policy. This change introduces very simple allocator that is unaware of zones (sockets, dies) or CPU core counts affected by turbo on different platforms. Smarter allocator is future work. Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
15fdc7d to
22769db
Compare
Add new configuration section
cpuClassesto offer a user-friendly descriptive front-end on top of direct sysfs cpufreq controls incontrol.cpu.classes.Reasons:
cpuClassessection better aligned withschedulingClassesandloadClassesalready available on the top level. It also uses the same class notation as these and balloonTypes, that is, a list of objects with "name" attribute specifying a class name, rather than using a key as a name like incontrol.cpu.classes.turboPriorityvalue get to share all turbo budget.The purpose is to lay down the framework for dynamic turbo (and possibly other feature) management, and add only a very simple turbo allocator at this point. The allocator can be made smarter in the future, for instance, by making it aware of topology zones affected by some CPUs running on turbo frequencies, different turbo frequency levels, and the number of CPUs that can hold turbo frequencies on different platforms, and heterogeneous cores (P/E/LPE).