
balloons: add cpuClasses#667

Draft
askervin wants to merge 6 commits into
containers:mainfrom
askervin:5h1-balloons-cpuclass

Conversation

@askervin
Collaborator

@askervin askervin commented May 13, 2026

Add a new configuration section, cpuClasses, that offers a user-friendly, descriptive front end on top of the direct sysfs cpufreq controls in control.cpu.classes.

Reasons:

  • Improved coherence. A top-level cpuClasses section is better aligned with schedulingClasses and loadClasses, which are already available on the top level. It also uses the same class notation as these and balloonTypes: a list of objects whose "name" attribute specifies the class name, rather than using a key as the name as in control.cpu.classes.
  • Human-readable units. Allow configuring CPU frequencies in formats like 3900MHz or 3.9GHz, in addition to integers [kHz] that are written directly to sysfs cpufreq files.
  • Platform-independent, runtime-resolved symbolic frequencies. Support the symbols "min" (minimum frequency), "base" (base frequency) and "turbo" (max turbo frequency).
  • Architectural support for dynamic CPU attributes with a system-wide perspective. So far, CPU class adjustments were static and not affected by which other CPU classes were in use. This change introduces a new "turbo allocator" layer that sits between balloons (cpusets) and the CPU controller. It has information on all CPU classes in use and their symbolic configurations, enabling it to control frequencies based on CPU priorities and platform properties, for instance.
  • Dynamic frequency adjustments. The initial version of the "CPU class turbo allocator" controls which CPU classes are allowed to use the "turbo budget" on the host at any given time. Example: if there are no performance-critical real-time containers running, any CPU used by any container gets turbo frequencies. But if such critical containers are running, turbo frequencies are reserved for their CPUs only, by capping the maximum frequency of other CPUs to the base frequency. Capping is effective only on symbolic frequencies. Explicit frequency values are respected as is. CPU classes with containers that have the highest turboPriority value share the whole turbo budget.

The purpose is to lay down the framework for dynamic turbo (and possibly other feature) management, and to add only a very simple turbo allocator at this point. The allocator can be made smarter in the future, for instance by making it aware of topology zones affected by some CPUs running at turbo frequencies, different turbo frequency levels, the number of CPUs that can hold turbo frequencies on different platforms, and heterogeneous cores (P/E/LPE).
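
The turbo-budget rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the `cpuClass` struct, its field names, and `effectiveMaxFreq` are hypothetical, and the sketch assumes that only a symbolic `"turbo"` maximum is capped to `"base"` for classes below the highest active turboPriority.

```go
package main

import "fmt"

// cpuClass is a hypothetical, simplified stand-in for an active CPU class.
type cpuClass struct {
	name          string
	turboPriority int
	maxFreq       string // symbolic ("min", "base", "turbo") or explicit, e.g. "3.2GHz"
}

// effectiveMaxFreq returns the maximum-frequency setting to apply to each
// active class. Classes with the highest turboPriority keep "turbo"; other
// classes that asked for symbolic "turbo" are capped to "base". Explicit
// values and the other symbols are passed through unchanged.
func effectiveMaxFreq(active []cpuClass) map[string]string {
	top := 0
	for i, c := range active {
		if i == 0 || c.turboPriority > top {
			top = c.turboPriority
		}
	}
	out := make(map[string]string, len(active))
	for _, c := range active {
		if c.maxFreq == "turbo" && c.turboPriority < top {
			out[c.name] = "base" // not a turbo winner: cap to base frequency
		} else {
			out[c.name] = c.maxFreq
		}
	}
	return out
}

func main() {
	active := []cpuClass{
		{name: "realtime", turboPriority: 10, maxFreq: "turbo"},
		{name: "normal", turboPriority: 0, maxFreq: "turbo"},
		{name: "pinned", turboPriority: 0, maxFreq: "3.2GHz"},
	}
	fmt.Println(effectiveMaxFreq(active))
}
```

When all active classes have the same priority, every class keeps its requested `"turbo"` maximum, which matches the "no critical containers running" case in the description.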

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from f4f281e to dce58c5 Compare May 15, 2026 07:06
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch 2 times, most recently from 2695589 to e919069 Compare May 15, 2026 11:06
@askervin askervin marked this pull request as ready for review May 15, 2026 11:31
@askervin askervin changed the title WIP: balloons: add cpuClasses balloons: add cpuClasses May 15, 2026
@askervin askervin requested a review from Copilot May 15, 2026 11:32

Copilot AI left a comment

Pull request overview

Adds a top-level cpuClasses configuration model for the balloons policy, including human-readable/symbolic frequency parsing, turbo-priority allocation, CRD/docs updates, and e2e coverage for turbo behavior and legacy CPU class syntax.

Changes:

  • Introduces CPUClass/Frequency API types, CRD schema updates, docs, and config template migration to cpuClasses.
  • Adds a balloons CPU class turbo allocator that resolves symbolic frequencies and coordinates with the CPU controller.
  • Updates CPU controller/sysfs/test support for dynamic classes, cpufreq overrides, write deduplication, and turbo-priority e2e validation.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| cmd/plugins/balloons/policy/balloons-policy.go | Wires turbo allocator into balloons setup, reset, assignment, validation, and reconfiguration. |
| cmd/plugins/balloons/policy/cpuclass.go | Adds turbo-aware CPU class allocator and symbolic frequency resolution. |
| cmd/plugins/balloons/policy/flags.go | Adds aliases for new CPU class/frequency config types. |
| config/crd/bases/config.nri_balloonspolicies.yaml | Adds cpuClasses to generated CRD schema. |
| deployment/helm/balloons/crds/config.nri_balloonspolicies.yaml | Adds Helm-packaged CRD schema for cpuClasses. |
| docs/resource-policy/policy/balloons.md | Documents preferred cpuClasses, symbolic units, turbo priority, and legacy syntax. |
| pkg/apis/config/v1alpha1/balloons-policy.go | Injects top-level cpuClasses into common CPU controller config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/config.go | Adds CPUClasses to balloons policy config. |
| pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go | Adds deepcopy support for balloons CPUClasses. |
| pkg/apis/config/v1alpha1/resmgr/policy/cpuclass.go | Defines user-facing CPU class fields. |
| pkg/apis/config/v1alpha1/resmgr/policy/frequency.go | Adds frequency parsing, JSON marshal/unmarshal, symbolic values, and resolution helpers. |
| pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go | Adds deepcopy support for CPUClass. |
| pkg/resmgr/control/cpu/api.go | Adds dynamic SetClass and defers enforcement logging before controller start. |
| pkg/resmgr/control/cpu/cache.go | Downgrades missing assignment cache log on fresh startup. |
| pkg/resmgr/control/cpu/cpu.go | Adds per-CPU cpufreq write cache and merges dynamic/static CPU class definitions. |
| pkg/sysfs/system.go | Adds test-oriented cpufreq sysfs override support. |
| test/e2e/policies.test-suite/balloons/balloons-config.yaml.in | Migrates default balloons test config to top-level cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test17-cstates-scheduling/balloons-cstates.cfg | Converts C-state class config to cpuClasses. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo.cfg | Adds turbo-priority e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/balloons-turbo-oldsyntax.cfg | Adds legacy control.cpu.classes compatibility e2e config. |
| test/e2e/policies.test-suite/balloons/n4c16/test18-turbo-priority/code.var.sh | Adds turbo-priority and cpufreq write-minimality e2e flow. |
Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported
Comments suppressed due to low confidence (3)

cmd/plugins/balloons/policy/balloons-policy.go:1445

  • When a live update changes only idleCPUClass, this branch detects a CPU-class-only change but never copies newBalloonsOptions.IdleCpuClass into p.bpoptions. The allocator is reconfigured with the old idle class and resetCpuClass() continues to apply the old value, so idle class changes are ignored until a full policy reconfiguration occurs.
			// Update CPUClasses definitions.
			p.bpoptions.CPUClasses = newBalloonsOptions.CPUClasses
			if p.turboAllocator != nil {
				if err := p.turboAllocator.Reconfigure(p.bpoptions.CPUClasses, p.bpoptions.IdleCpuClass); err != nil {

cmd/plugins/balloons/policy/balloons-policy.go:1713

  • The turbo allocator is created/reconfigured before fillBuiltinBalloonDefs() and validateConfig() run. If validation fails, this has already mutated policy/controller state via p.turboAllocator and cpucontrol.SetClass, so an invalid configuration update can leave partially applied CPU class definitions behind despite setConfig() returning an error.
	if p.turboAllocator == nil {
		ta, err := NewCPUClassTurboAllocator(
			WithSystem(p.options.System),
			WithCache(p.cch),
			WithCPUClasses(bpoptions.CPUClasses),
			WithIdleClass(bpoptions.IdleCpuClass),
		)
		if err != nil {
			return balloonsError("failed to create CPU class turbo allocator: %w", err)
		}
		p.turboAllocator = ta
	} else {
		if err := p.turboAllocator.Reconfigure(bpoptions.CPUClasses, bpoptions.IdleCpuClass); err != nil {
			return balloonsError("failed to reconfigure CPU class turbo allocator: %w", err)
		}
	}
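
One way to avoid the partially applied state described in this comment is to validate the new configuration before touching any live state. The following is only a sketch of that ordering under simplified assumptions: `options`, `policy`, `validate`, and this `setConfig` are hypothetical stand-ins, not the types or functions in balloons-policy.go.

```go
package main

import "fmt"

// options is a hypothetical, trimmed-down configuration.
type options struct {
	cpuClasses   []string // names of defined CPU classes
	idleCPUClass string   // class applied to idle CPUs; "" means none
}

// policy holds the state that a successful update replaces.
type policy struct {
	active options
}

// validate rejects a configuration whose idle class is not defined.
func validate(o options) error {
	if o.idleCPUClass == "" {
		return nil
	}
	for _, name := range o.cpuClasses {
		if name == o.idleCPUClass {
			return nil
		}
	}
	return fmt.Errorf("idle CPU class %q is not defined", o.idleCPUClass)
}

// setConfig validates first and mutates state only after validation
// succeeds, so a rejected update leaves the previous configuration intact.
func (p *policy) setConfig(o options) error {
	if err := validate(o); err != nil {
		return err // p.active is untouched
	}
	p.active = o
	return nil
}

func main() {
	p := &policy{active: options{cpuClasses: []string{"default"}, idleCPUClass: "default"}}
	err := p.setConfig(options{cpuClasses: []string{"fast"}, idleCPUClass: "missing"})
	fmt.Println(err, p.active.idleCPUClass)
}
```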

cmd/plugins/balloons/policy/cpuclass.go:199

  • Idle CPUs are assigned once but are not tracked or reassigned when the turbo winner changes. If idleCPUClass uses symbolic turbo, idle CPUs keep the effective value from the last reset/release (for example turbo from startup) even after a higher-priority active class should cap non-winners to base.
// ResetIdle assigns the given CPU set to the idle class via the CPU
// controller. Used at policy startup to bring all allowed CPUs to a
// known baseline before any container-driven UseClass call. Does not
// affect the active-class tracking.
func (a *CPUClassTurboAllocator) ResetIdle(cpus cpuset.CPUSet) error {
	if cpus.IsEmpty() {
		return nil
	}
	if err := cpucontrol.Assign(a.cch, a.idleClassName, cpus.UnsortedList()...); err != nil {
		return fmt.Errorf("failed to assign CPUs %s to idle class %q: %w", cpus, a.idleClassName, err)
	}
	return nil

// kernel sysfs cpufreq interface). Symbolic values ("min", "base",
// "turbo") are stored as sentinel constants and must be resolved with
// Resolve() before being passed to the CPU controller.
// +kubebuilder:validation:Type=string
Comment on lines +170 to +176
// Update the cache even on failure: the desired value
// is unchanged so retrying on every Assign would just
// spam logs without ever succeeding. A subsequent
// configure() resets lastFreq so a real configuration
// change still triggers a fresh attempt.
state.min = min
state.hasMin = true
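
The caching behavior in the excerpt above can be illustrated in isolation. This is a hedged sketch with hypothetical names (`freqCache`, `setMin`, `reset`, and the injected `write` function); only `state.min`, `state.hasMin`, and the idea of clearing `lastFreq` on reconfiguration come from the quoted code.

```go
package main

import (
	"errors"
	"fmt"
)

// freqState remembers the last minimum frequency requested for one CPU.
type freqState struct {
	min    uint64
	hasMin bool
}

// freqCache deduplicates cpufreq writes. The write function is injected so
// the sketch does not touch sysfs.
type freqCache struct {
	lastFreq map[int]*freqState
	write    func(cpu int, kHz uint64) error
}

func newFreqCache(write func(int, uint64) error) *freqCache {
	return &freqCache{lastFreq: map[int]*freqState{}, write: write}
}

// setMin writes kHz for the CPU unless the same value was already requested.
// The cache is updated even when the write fails, so an unchanged request is
// not retried (and logged) on every call.
func (c *freqCache) setMin(cpu int, kHz uint64) error {
	state, ok := c.lastFreq[cpu]
	if !ok {
		state = &freqState{}
		c.lastFreq[cpu] = state
	}
	if state.hasMin && state.min == kHz {
		return nil
	}
	err := c.write(cpu, kHz)
	state.min = kHz
	state.hasMin = true
	return err
}

// reset drops all cached values, as a reconfiguration would, so the next
// request triggers a fresh write attempt.
func (c *freqCache) reset() { c.lastFreq = map[int]*freqState{} }

func main() {
	attempts := 0
	c := newFreqCache(func(cpu int, kHz uint64) error {
		attempts++
		return errors.New("simulated write failure")
	})
	_ = c.setMin(0, 800000) // first attempt, fails
	_ = c.setMin(0, 800000) // skipped: same desired value
	c.reset()
	_ = c.setMin(0, 800000) // attempted again after reset
	fmt.Println("write attempts:", attempts) // prints "write attempts: 2"
}
```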
Comment on lines +1413 to +1418
// Detect changes in CPUClasses definitions (turbo attributes, frequencies, etc.)
if len(opts0.CPUClasses) != len(opts1.CPUClasses) {
return true
}
if utils.DumpJSON(opts0.CPUClasses) != utils.DumpJSON(opts1.CPUClasses) {
return true
a.classByName = make(map[string]*CPUClass, len(classes))
for _, cc := range classes {
a.classByName[cc.Name] = cc
}
}
freq := cpu.FrequencyRange()
baseFreq := cpu.BaseFrequency()
if baseFreq == 0 || freq.Max == 0 {
@askervin askervin requested review from kad and marquiz May 17, 2026 06:48
@askervin
Collaborator Author

@kad, @marquiz, do you think we could approach turbo budget sharing with this kind of architecture in the balloons policy?

I'm adding cpuClasses under resmgr, similarly to schedulingClasses, to pave the way for using them with the topology-aware policy's guaranteed containers later on, too.

@askervin askervin marked this pull request as draft May 18, 2026 09:55
@askervin
Collaborator Author

There is still some technical and architectural debt that I wish to pay off in this PR: the CPU controller should not modify frequencies directly. Instead, changes should go through the cache, in the spirit of applying "pending updates".

Unfortunately, controller hooks are container-specific and may be called multiple times while handling a single NRI event, whereas all CPU properties should be written once per NRI event. I'll add yet another hook to the Controller interface that commits whatever changes a controller has stored since the previous Commit().
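
The "store now, commit once" idea could look something like the sketch below. Everything here is hypothetical (`batchingController`, `Assign`, the return value of `Commit`); it does not reflect the actual Controller interface, only the pattern of collecting changes across several hook calls and writing each CPU once per event.

```go
package main

import (
	"fmt"
	"sort"
)

// batchingController is a hypothetical controller that records requested
// CPU class assignments and applies them only in Commit.
type batchingController struct {
	pending map[int]string // CPU ID -> class requested since the last Commit
	applied map[int]string // CPU ID -> class currently in effect
}

func newBatchingController() *batchingController {
	return &batchingController{
		pending: map[int]string{},
		applied: map[int]string{},
	}
}

// Assign may be called many times while one event is handled. It only
// records intent; a later call for the same CPU overrides an earlier one.
func (c *batchingController) Assign(class string, cpus ...int) {
	for _, cpu := range cpus {
		c.pending[cpu] = class
	}
}

// Commit applies every pending assignment once, clears the pending set,
// and returns the sorted IDs of the CPUs whose class actually changed.
func (c *batchingController) Commit() []int {
	written := make([]int, 0, len(c.pending))
	for cpu, class := range c.pending {
		if c.applied[cpu] != class {
			c.applied[cpu] = class
			written = append(written, cpu)
		}
	}
	c.pending = map[int]string{}
	sort.Ints(written)
	return written
}

func main() {
	ctl := newBatchingController()
	ctl.Assign("normal", 0, 1) // hook call for one container
	ctl.Assign("turbo", 1)     // hook call for another container, same event
	fmt.Println(ctl.Commit(), ctl.applied[1])
}
```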

@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from e919069 to d61da00 Compare May 18, 2026 10:52
@kad kad requested a review from Copilot May 19, 2026 07:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 24 changed files in this pull request and generated 5 comments.

Files not reviewed (2)
  • pkg/apis/config/v1alpha1/resmgr/policy/balloons/zz_generated.deepcopy.go: Language not supported
  • pkg/apis/config/v1alpha1/resmgr/policy/zz_generated.deepcopy.go: Language not supported

Comment on lines 108 to 112
// Just print an error. A config update later on may be valid.
log.Errorf("failed apply /cpuinitial configuration: %v", err)
}

ctl.started = true

return true, nil
Comment on lines +154 to +167
// controller. The recalculation runs first so that the controller's
// in-memory class definition reflects the correct effective turbo
// frequency at the time of Assign.
func (a *CPUClassTurboAllocator) UseClass(className string, cpus cpuset.CPUSet) error {
if cpus.IsEmpty() {
return nil
}
a.removeCpusFromAllClasses(cpus)
if className != "" {
a.activeCpus[className] = a.activeCpus[className].Union(cpus)
}
a.recalculateTurbo()
if err := cpucontrol.Assign(a.cch, className, cpus.UnsortedList()...); err != nil {
return fmt.Errorf("failed to assign CPUs %s to class %q: %w", cpus, className, err)
Comment on lines +1607 to +1610
// Verify that cpuClass references in balloon types are defined
// in either cpuClasses or existing control.cpu.classes.
existingControlClasses := cpucontrol.GetClasses()
for _, blnDef := range bpoptions.BalloonDefs {
Comment on lines +1692 to +1695
// Set bpoptions early so the turbo allocator construction below
// has access to CPUClasses.
p.bpoptions = bpoptions

Comment on lines +868 to +875
- `minFreq` (string or number): Minimum CPU frequency. Accepts values
with units: `"3.2GHz"`, `"2900MHz"`, `"2900000kHz"`, or a plain
number in kHz. Also accepts symbolic names: `"min"` (platform
minimum), `"base"` (CPU base frequency), `"turbo"` (maximum turbo
frequency), which are resolved at runtime from sysfs.
- `maxFreq` (string or number): Maximum CPU frequency (same format).
- `uncoreMinFreq` / `uncoreMaxFreq` (string or number): Uncore
frequency limits (same format).
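
As an illustration of the formats listed in this documentation excerpt, the sketch below converts such values to kHz. It is not the PR's frequency.go: `platform`, `parseKHz`, and `resolve` are hypothetical names, the platform frequencies are passed in rather than read from sysfs, and accepting unit suffixes case-insensitively is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// platform holds hypothetical, already-discovered frequencies in kHz.
type platform struct{ min, base, turbo uint64 }

// parseKHz converts "3.2GHz", "2900MHz", "2900000kHz", or a plain number
// in kHz to an integer kHz value.
func parseKHz(s string) (uint64, error) {
	lower := strings.ToLower(strings.TrimSpace(s))
	mult := 1.0
	for _, u := range []struct {
		suffix string
		mult   float64
	}{{"ghz", 1e6}, {"mhz", 1e3}, {"khz", 1}} {
		if strings.HasSuffix(lower, u.suffix) {
			lower = strings.TrimSpace(strings.TrimSuffix(lower, u.suffix))
			mult = u.mult
			break
		}
	}
	v, err := strconv.ParseFloat(lower, 64)
	if err != nil || math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
		return 0, fmt.Errorf("invalid frequency %q", s)
	}
	// Round to the nearest kHz to absorb floating-point error in, e.g., 3.9 * 1e6.
	return uint64(math.Round(v * mult)), nil
}

// resolve maps the symbolic names to platform values and parses anything
// else as a frequency with an optional unit.
func resolve(s string, p platform) (uint64, error) {
	switch strings.TrimSpace(s) {
	case "min":
		return p.min, nil
	case "base":
		return p.base, nil
	case "turbo":
		return p.turbo, nil
	}
	return parseKHz(s)
}

func main() {
	p := platform{min: 800000, base: 2100000, turbo: 3900000}
	for _, in := range []string{"3.2GHz", "2900MHz", "2900000kHz", "2900000", "base"} {
		kHz, err := resolve(in, p)
		fmt.Println(in, kHz, err)
	}
}
```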
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from d61da00 to 15fdc7d Compare May 19, 2026 10:11
askervin added 5 commits May 19, 2026 14:35
…class definitions

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Introduce CPUClassTurboAllocator that owns CPU-class state and all
cpucontrol.Assign / cpucontrol.SetClass calls, keeping CPU-class
concerns out of the rest of the policy.

This change introduces very simple allocator that is unaware of
zones (sockets, dies) or CPU core counts affected by turbo on
different platforms. Smarter allocator is future work.

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
@askervin askervin force-pushed the 5h1-balloons-cpuclass branch from 15fdc7d to 22769db Compare May 19, 2026 11:40
2 participants