CA: Persist and expose NodeInfos computed by TemplateNodeInfoProvider

**Which component are you using?**:

/area cluster-autoscaler
/area core-autoscaler

**Is your feature request designed to solve a problem? If so describe the problem this feature should solve.**:

The [TemplateNodeInfoProvider](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodeinfosprovider/node_info_provider_processor.go) interface is responsible for computing template NodeInfos for every autoscaled NodeGroup for the purpose of scale-up simulations. The main implementation for the interface is [MixedTemplateNodeInfoProvider](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go), which tries to create the templates based on sanitized real Nodes, and falls back to `NodeGroup.TemplateNodeInfo()` if there aren't any healthy Nodes to sanitize in a given NodeGroup.
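For readers less familiar with this code path, the fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual `MixedTemplateNodeInfoProvider` code: the `Node`, `NodeInfo`, `staticGroup`, `sanitize`, and `templateFor` names are hypothetical, the `NodeGroup` interface is reduced to two methods, and the real health check and sanitization are more involved.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a real cluster Node.
type Node struct {
	Name    string
	Healthy bool // assumption: the real check looks at readiness and more
}

// NodeInfo is a hypothetical stand-in for CA's NodeInfo type.
type NodeInfo struct {
	NodeName string
	Source   string // records where the template came from (illustration only)
}

// NodeGroup is a reduced, hypothetical version of the cloud provider interface.
type NodeGroup interface {
	Id() string
	TemplateNodeInfo() (*NodeInfo, error)
}

// sanitize is a hypothetical helper that turns a real Node into a template.
func sanitize(n Node, groupID string) *NodeInfo {
	return &NodeInfo{NodeName: "template-node-for-" + groupID, Source: "sanitized:" + n.Name}
}

// templateFor prefers a sanitized healthy Node and only falls back to
// NodeGroup.TemplateNodeInfo() when the group has no healthy Node.
func templateFor(ng NodeGroup, nodes []Node) (*NodeInfo, error) {
	for _, n := range nodes {
		if n.Healthy {
			return sanitize(n, ng.Id()), nil
		}
	}
	return ng.TemplateNodeInfo()
}

// staticGroup is a fake NodeGroup used only for this example.
type staticGroup struct{ id string }

func (g staticGroup) Id() string { return g.id }
func (g staticGroup) TemplateNodeInfo() (*NodeInfo, error) {
	return &NodeInfo{NodeName: "template-node-for-" + g.id, Source: "node-group-template"}, nil
}

func main() {
	g := staticGroup{id: "ng-1"}
	withNode, _ := templateFor(g, []Node{{Name: "node-a", Healthy: true}})
	empty, _ := templateFor(g, nil)
	fmt.Println(withNode.Source) // template built from the real Node
	fmt.Println(empty.Source)    // template built by the NodeGroup fallback
}
```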

The processor is called near the beginning of [`StaticAutoscaler.RunOnce()`](https://github.com/kubernetes/autoscaler/blob/fb2899a594ed99489add5c46e3680a460776154b/cluster-autoscaler/core/static_autoscaler.go#L358). The map of NodeInfo templates it produces is just a local variable that gets passed to various pieces of `RunOnce()` logic, most notably scale-up.

This has the following problems:
* Most CA processors don't have access to the NodeInfo map. Some of the processor implementations need NodeInfo templates for their logic. One example is the [DRA readiness processor](https://github.com/kubernetes/autoscaler/blob/fb2899a594ed99489add5c46e3680a460776154b/cluster-autoscaler/processors/customresources/dra_processor.go#L60), which needs the template to know what DRA Devices the readiness logic should wait for. Right now, this and similar processors have to resort to using `NodeGroup.TemplateNodeInfo()` as the template. Using `NodeGroup.TemplateNodeInfo()` is much less reliable than sanitizing a real Node - each part of the template has to be crafted from scratch based on hardcoded logic in CA. If there's a part of the Node (e.g. a new label, or a DRA Device) that isn't correctly predicted by `NodeGroup.TemplateNodeInfo()`, such processor implementations stop working correctly - even if there's at least 1 Node in the NodeGroup.
* Some of the processors that need the template map (e.g. the DRA processor mentioned above) are actually executed before `TemplateNodeInfoProvider` computes the map within a single CA loop. This is intentional and necessary - `MixedTemplateNodeInfoProvider` uses Node readiness as part of the logic to determine if a Node is a good candidate for being sanitized into a template, so the DRA processor needs to be executed earlier to adjust the readiness before the templates are computed.

**Describe the solution you'd like.**:

We should introduce a new component responsible for storing, updating, and exposing template NodeInfos computed by `TemplateNodeInfoProvider`. Such a component should:

* Embed `TemplateNodeInfoProvider`, use it for computing template NodeInfos, and cache the results internally until the next recomputation.
* Recompute the cached templates every CA loop, in the same place where they are computed now.
* Expose both the full map of computed templates, and a way to get a template for a single NodeGroup.
* Be accessible from CA processors. This is probably best achieved by placing it in [`AutoscalingContext`](https://github.com/kubernetes/autoscaler/blob/fb2899a594ed99489add5c46e3680a460776154b/cluster-autoscaler/context/autoscaling_context.go#L41).
* Be usable in any part of CA logic - including in the parts of the main CA loop before templates are recomputed, and in fully separate goroutines. If the templates are accessed before the recomputation, the component should return the previously computed ones. The component should be thread-safe.
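
To make the requirements above more concrete, here is a minimal sketch of what such a component could look like. It is only an illustration under stated assumptions: the `TemplateNodeInfoRegistry` name and its methods are hypothetical, `NodeInfo` stands in for the real CA type, and the provider interface is reduced to a no-argument `Process()` (the real one takes more parameters). Keeping the previous templates when recomputation fails is also an assumption, not something decided in this issue.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeInfo is a hypothetical stand-in for CA's NodeInfo type.
type NodeInfo struct {
	NodeName string
}

// TemplateNodeInfoProvider is a reduced, hypothetical version of the real interface.
type TemplateNodeInfoProvider interface {
	Process() (map[string]*NodeInfo, error)
}

// TemplateNodeInfoRegistry (hypothetical name) caches the templates from the
// last successful recomputation and serves them to any goroutine.
type TemplateNodeInfoRegistry struct {
	provider  TemplateNodeInfoProvider
	mu        sync.RWMutex
	templates map[string]*NodeInfo // keyed by NodeGroup id
}

func NewTemplateNodeInfoRegistry(p TemplateNodeInfoProvider) *TemplateNodeInfoRegistry {
	return &TemplateNodeInfoRegistry{provider: p, templates: map[string]*NodeInfo{}}
}

// Recompute would be called once per CA loop, where the provider runs today.
// Assumption: on error the previously cached templates are kept.
func (r *TemplateNodeInfoRegistry) Recompute() error {
	fresh, err := r.provider.Process()
	if err != nil {
		return err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.templates = fresh
	return nil
}

// GetNodeInfo returns the cached template for a single NodeGroup.
// Callers must treat the returned NodeInfo as read-only.
func (r *TemplateNodeInfoRegistry) GetNodeInfo(nodeGroupID string) (*NodeInfo, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ni, ok := r.templates[nodeGroupID]
	return ni, ok
}

// GetNodeInfos returns a shallow copy of the full template map.
func (r *TemplateNodeInfoRegistry) GetNodeInfos() map[string]*NodeInfo {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]*NodeInfo, len(r.templates))
	for id, ni := range r.templates {
		out[id] = ni
	}
	return out
}

// fakeProvider is a test double that names templates after the loop number.
type fakeProvider struct{ calls int }

func (f *fakeProvider) Process() (map[string]*NodeInfo, error) {
	f.calls++
	return map[string]*NodeInfo{
		"ng-1": {NodeName: fmt.Sprintf("template-ng-1-loop-%d", f.calls)},
	}, nil
}

func main() {
	reg := NewTemplateNodeInfoRegistry(&fakeProvider{})
	_ = reg.Recompute() // loop 1

	// Early in loop 2, before recomputation: the loop-1 template is served.
	early, _ := reg.GetNodeInfo("ng-1")
	fmt.Println(early.NodeName)

	_ = reg.Recompute() // loop 2
	late, _ := reg.GetNodeInfo("ng-1")
	fmt.Println(late.NodeName)
}
```

In this sketch a processor that runs before the recomputation point (such as the DRA readiness processor) simply sees the templates from the previous loop, and an empty result on the very first loop, in which case it could still fall back to `NodeGroup.TemplateNodeInfo()`.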

**Additional context.**:

#8881 can be trivially solved after this is completed.

CA: Persist and expose NodeInfos computed by TemplateNodeInfoProvider #8882
