Skip to content

[SPARK-57543] Make ConfigOption dynamic override opt-in and filter dynamic config refresh#713

Open
jiangzho wants to merge 1 commit into
apache:mainfrom
jiangzho:conf_wl
Open

[SPARK-57543] Make ConfigOption dynamic override opt-in and filter dynamic config refresh#713
jiangzho wants to merge 1 commit into
apache:mainfrom
jiangzho:conf_wl

Conversation

@jiangzho

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR makes runtime dynamic config override opt-in and enforces it centrally when dynamic config is refreshed:

  1. ConfigOption
    • enableDynamicOverride now defaults to false — an option must explicitly opt in via .enableDynamicOverride(true) to be overridable at runtime.
    • Each option registers itself (keyed by its config key) into a static registry on construction, exposing dynamicOverrideEnabledKeys() as the allow-list of overridable keys. enableDynamicOverride on each option remains the single source of truth.
  2. SparkOperatorConfManager.refresh(Map) now filters the incoming ConfigMap data against ConfigOption.dynamicOverrideEnabledKeys(), applying only overridable keys and logging any dropped (unknown or non-overridable) keys at WARN. getValue(String) and getAll() therefore no longer surface override values for keys that are never honored.
  3. SparkOperatorConf options now set enableDynamicOverrideexplicitly; this also fixes docs for LOG_CONF and PERIODIC_GC_INTERVAL_SECONDS, which are only read at startup and are non-overridable.

The existing per-option check in ConfigOption.resolveValue() is kept as defense-in-depth.

Why are the changes needed?

Previously refresh() stored the entire ConfigMap into the override map, and enableDynamicOverride defaulted to true. As a result, an override value could be stored (and surfaced via getValue/getAll/the LOG_CONF dump) for keys that are not meant to be hot-reloaded, and any new option was overridable by default. Defaulting to opt-in and dropping non-allow-listed keys at refresh time makes the override surface explicit, safer, and self-documenting, and avoids silently ignored config updates.

Does this PR introduce any user-facing change?

Yes. Dynamic config updates for keys that are not explicitly marked enableDynamicOverride(true) are now ignored (and logged at WARN) instead of being stored. In particular spark.logConf and spark.kubernetes.operator.periodicGC.intervalSeconds are no longer dynamically overridable (both are only read at startup). The generated config documentation's "Allow Hot Reloading" column reflects each option's flag.

How was this patch tested?

Updated ConfigOptionTest and SparkOperatorConfManagerTest, and added SparkOperatorConfManagerTest#testRefreshDropsUnknownAndNonOverridableKeys covering the allow-list filtering. Existing SparkOperatorConfigMapReconcilerTest continues to exercise the dynamic-config refresh path with an overridable key.

Was this patch authored or co-authored using generative AI tooling?

Co-Authored-By: Claude Opus 4.8

…namic config refresh

### What changes were proposed in this pull request?

This PR makes runtime dynamic config override opt-in and enforces
it centrally when dynamic config is refreshed:

1. `ConfigOption`
    - `enableDynamicOverride` now defaults to `false` — an option must explicitly opt in via `.enableDynamicOverride(true)` to be overridable at runtime.
    - Each option registers itself (keyed by its config key) into a static registry on construction, exposing `dynamicOverrideEnabledKeys()` as the allow-list of overridable keys. `enableDynamicOverride` on each option remains the single source of truth.
2. `SparkOperatorConfManager.refresh(Map)` now filters the incoming ConfigMap data against `ConfigOption.dynamicOverrideEnabledKeys()`, applying only overridable keys and logging any dropped (unknown or non-overridable) keys at WARN. `getValue(String)` and `getAll()` therefore no longer surface override values for keys that are never honored.
3. `SparkOperatorConf` options now set `enableDynamicOverride`explicitly; this also fixes docs for `LOG_CONF` and `PERIODIC_GC_INTERVAL_SECONDS`, which are only read at startup and are non-overridable.

The existing per-option check in `ConfigOption.resolveValue()` is kept as defense-in-depth.

### Why are the changes needed?

Previously `refresh()` stored the entire ConfigMap into the override map, and `enableDynamicOverride` defaulted to `true`. As a result, an override value could be stored (and surfaced via `getValue`/`getAll`/the `LOG_CONF` dump) for keys that are not meant to be hot-reloaded, and any new option was overridable by default. Defaulting to opt-in and dropping non-allow-listed keys at refresh time makes the override surface explicit, safer, and self-documenting, and avoids silently ignored config updates.

### Does this PR introduce _any_ user-facing change?

Yes. Dynamic config updates for keys that are not explicitly marked `enableDynamicOverride(true)` are now ignored (and logged at WARN) instead of being stored. In particular `spark.logConf` and `spark.kubernetes.operator.periodicGC.intervalSeconds` are no longer dynamically overridable (both are only read at startup). The generated config documentation's "Allow Hot Reloading" column reflects each option's flag.

### How was this patch tested?

Updated `ConfigOptionTest` and `SparkOperatorConfManagerTest`, and added `SparkOperatorConfManagerTest#testRefreshDropsUnknownAndNonOverridableKeys` covering the allow-list filtering. Existing `SparkOperatorConfigMapReconcilerTest` continues to exercise the dynamic-config refresh path with an overridable key.

### Was this patch authored or co-authored using generative AI tooling?

Co-Authored-By: Claude Opus 4.8
*
* @return An immutable set of dynamically overridable config keys.
*/
public static Set<String> dynamicOverrideEnabledKeys() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allow-list reflects only ConfigOptions that have already been constructed. Because the real options are static final fields on SparkOperatorConf, they register themselves only when that class is first initialized — so refresh()'s correctness rests on an implicit invariant: SparkOperatorConf must be class-loaded before any refresh() runs.

Today that holds, but only incidentally: SparkOperator reads SparkOperatorConf.LOG_CONF and SparkOperatorConf.DYNAMIC_CONFIG_ENABLED during boot (SparkOperator.java:104 and :109), which forces class-init before registerSparkOperatorConfMonitor() wires up the ConfigMap reconciler that calls refresh(). If that ordering ever changes — a new entrypoint, a reordering of the boot sequence, or a test that drives refresh() without touching SparkOperatorConfdynamicOverrideEnabledKeys() would return an empty (or partial) set and every incoming override would be silently dropped, with only a WARN to show for it.

Suggest making the invariant explicit rather than relying on boot order. Minimally, a Javadoc note here stating that the returned set only includes options constructed so far. Better, have the conf manager force initialization before its first refresh, e.g. reference SparkOperatorConf (a class touch is enough to trigger static init) when the manager starts, so the allow-list is guaranteed complete independent of who calls refresh() first.

@peter-toth peter-toth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, my inline comment is non-blocking nit, but pending CI.

@dongjoon-hyun dongjoon-hyun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jiangzho . I'd like to recommend to rebase this PR once more in order to trigger the CI properly.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Gentle ping, @jiangzho .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants