[SPARK-57543] Make ConfigOption dynamic override opt-in and filter dynamic config refresh#713
[SPARK-57543] Make ConfigOption dynamic override opt-in and filter dynamic config refresh#713jiangzho wants to merge 1 commit into
Conversation
…namic config refresh
### What changes were proposed in this pull request?
This PR makes runtime dynamic config override opt-in and enforces
it centrally when dynamic config is refreshed:
1. `ConfigOption`
- `enableDynamicOverride` now defaults to `false` — an option must explicitly opt in via `.enableDynamicOverride(true)` to be overridable at runtime.
- Each option registers itself (keyed by its config key) into a static registry on construction, exposing `dynamicOverrideEnabledKeys()` as the allow-list of overridable keys. `enableDynamicOverride` on each option remains the single source of truth.
2. `SparkOperatorConfManager.refresh(Map)` now filters the incoming ConfigMap data against `ConfigOption.dynamicOverrideEnabledKeys()`, applying only overridable keys and logging any dropped (unknown or non-overridable) keys at WARN. `getValue(String)` and `getAll()` therefore no longer surface override values for keys that are never honored.
3. `SparkOperatorConf` options now set `enableDynamicOverride`explicitly; this also fixes docs for `LOG_CONF` and `PERIODIC_GC_INTERVAL_SECONDS`, which are only read at startup and are non-overridable.
The existing per-option check in `ConfigOption.resolveValue()` is kept as defense-in-depth.
### Why are the changes needed?
Previously `refresh()` stored the entire ConfigMap into the override map, and `enableDynamicOverride` defaulted to `true`. As a result, an override value could be stored (and surfaced via `getValue`/`getAll`/the `LOG_CONF` dump) for keys that are not meant to be hot-reloaded, and any new option was overridable by default. Defaulting to opt-in and dropping non-allow-listed keys at refresh time makes the override surface explicit, safer, and self-documenting, and avoids silently ignored config updates.
### Does this PR introduce _any_ user-facing change?
Yes. Dynamic config updates for keys that are not explicitly marked `enableDynamicOverride(true)` are now ignored (and logged at WARN) instead of being stored. In particular `spark.logConf` and `spark.kubernetes.operator.periodicGC.intervalSeconds` are no longer dynamically overridable (both are only read at startup). The generated config documentation's "Allow Hot Reloading" column reflects each option's flag.
### How was this patch tested?
Updated `ConfigOptionTest` and `SparkOperatorConfManagerTest`, and added `SparkOperatorConfManagerTest#testRefreshDropsUnknownAndNonOverridableKeys` covering the allow-list filtering. Existing `SparkOperatorConfigMapReconcilerTest` continues to exercise the dynamic-config refresh path with an overridable key.
### Was this patch authored or co-authored using generative AI tooling?
Co-Authored-By: Claude Opus 4.8
| * | ||
| * @return An immutable set of dynamically overridable config keys. | ||
| */ | ||
| public static Set<String> dynamicOverrideEnabledKeys() { |
There was a problem hiding this comment.
This allow-list reflects only ConfigOptions that have already been constructed. Because the real options are static final fields on SparkOperatorConf, they register themselves only when that class is first initialized — so refresh()'s correctness rests on an implicit invariant: SparkOperatorConf must be class-loaded before any refresh() runs.
Today that holds, but only incidentally: SparkOperator reads SparkOperatorConf.LOG_CONF and SparkOperatorConf.DYNAMIC_CONFIG_ENABLED during boot (SparkOperator.java:104 and :109), which forces class-init before registerSparkOperatorConfMonitor() wires up the ConfigMap reconciler that calls refresh(). If that ordering ever changes — a new entrypoint, a reordering of the boot sequence, or a test that drives refresh() without touching SparkOperatorConf — dynamicOverrideEnabledKeys() would return an empty (or partial) set and every incoming override would be silently dropped, with only a WARN to show for it.
Suggest making the invariant explicit rather than relying on boot order. Minimally, a Javadoc note here stating that the returned set only includes options constructed so far. Better, have the conf manager force initialization before its first refresh, e.g. reference SparkOperatorConf (a class touch is enough to trigger static init) when the manager starts, so the allow-list is guaranteed complete independent of who calls refresh() first.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Thank you, @jiangzho . I'd like to recommend to rebase this PR once more in order to trigger the CI properly.
|
Gentle ping, @jiangzho . |
What changes were proposed in this pull request?
This PR makes runtime dynamic config override opt-in and enforces it centrally when dynamic config is refreshed:
ConfigOptionenableDynamicOverridenow defaults tofalse— an option must explicitly opt in via.enableDynamicOverride(true)to be overridable at runtime.dynamicOverrideEnabledKeys()as the allow-list of overridable keys.enableDynamicOverrideon each option remains the single source of truth.SparkOperatorConfManager.refresh(Map)now filters the incoming ConfigMap data againstConfigOption.dynamicOverrideEnabledKeys(), applying only overridable keys and logging any dropped (unknown or non-overridable) keys at WARN.getValue(String)andgetAll()therefore no longer surface override values for keys that are never honored.SparkOperatorConfoptions now setenableDynamicOverrideexplicitly; this also fixes docs forLOG_CONFandPERIODIC_GC_INTERVAL_SECONDS, which are only read at startup and are non-overridable.The existing per-option check in
ConfigOption.resolveValue()is kept as defense-in-depth.Why are the changes needed?
Previously
refresh()stored the entire ConfigMap into the override map, andenableDynamicOverridedefaulted totrue. As a result, an override value could be stored (and surfaced viagetValue/getAll/theLOG_CONFdump) for keys that are not meant to be hot-reloaded, and any new option was overridable by default. Defaulting to opt-in and dropping non-allow-listed keys at refresh time makes the override surface explicit, safer, and self-documenting, and avoids silently ignored config updates.Does this PR introduce any user-facing change?
Yes. Dynamic config updates for keys that are not explicitly marked
enableDynamicOverride(true)are now ignored (and logged at WARN) instead of being stored. In particularspark.logConfandspark.kubernetes.operator.periodicGC.intervalSecondsare no longer dynamically overridable (both are only read at startup). The generated config documentation's "Allow Hot Reloading" column reflects each option's flag.How was this patch tested?
Updated
ConfigOptionTestandSparkOperatorConfManagerTest, and addedSparkOperatorConfManagerTest#testRefreshDropsUnknownAndNonOverridableKeyscovering the allow-list filtering. ExistingSparkOperatorConfigMapReconcilerTestcontinues to exercise the dynamic-config refresh path with an overridable key.Was this patch authored or co-authored using generative AI tooling?
Co-Authored-By: Claude Opus 4.8