[SPARK-55798][SQL] Propagate config data type into config builders #54579

Closed
dtenedor wants to merge 1 commit into apache:master from dtenedor:add-explicit-config-types

dtenedor commented Mar 2, 2026

What changes were proposed in this pull request?

Add a ConfigEntryType sealed-trait enum to ConfigEntry[T], threaded from ConfigBuilder through TypedConfigBuilder and all create* methods, so that config entries are tagged with their declared type at construction time without runtime type probing or exception handling.

Specifically:

  • New ConfigEntryType sealed trait (ConfigEntry.scala) with case objects BooleanEntry, IntEntry, LongEntry, DoubleEntry, StringEntry, EnumEntry, TimeEntry, BytesEntry, RegexEntry, and OtherEntry.
  • ConfigEntry[T] gains a required val configEntryType: ConfigEntryType constructor parameter, propagated through all five subclasses (ConfigEntryWithDefault, ConfigEntryWithDefaultFunction, ConfigEntryWithDefaultString, OptionalConfigEntry, FallbackConfigEntry).
  • TypedConfigBuilder[T] gains a required val configEntryType: ConfigEntryType constructor parameter, propagated through transform, toSequence, and all create* methods (createWithDefault, createWithDefaultFunction, createWithDefaultString, createOptional).
  • Every ConfigBuilder.*Conf factory method (intConf, longConf, doubleConf, booleanConf, stringConf, enumConf, timeConf, bytesConf, regexConf) passes the appropriate enum variant. fallbackConf inherits the variant from the fallback entry.
  • configEntryType is a required (non-default) constructor parameter on both ConfigEntry and TypedConfigBuilder, so the compiler forces every new construction site to specify the type explicitly, which prevents silent omission.

Using an enum instead of a single isBooleanEntry: Boolean flag makes the design extensible: callers can match on the specific config type (e.g. to optimize access paths differently for boolean vs. numeric entries) without adding new boolean fields for each type.
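To make the threading concrete, here is a minimal, self-contained sketch of the shape described above. It is a simplified model and not the actual Spark source: the enum is trimmed to four variants, ConfigEntry is collapsed into a single class, and the converter parameter and the wrapping object ConfigTypeSketch are assumptions made for illustration.

```scala
// Simplified, hypothetical model of the design in this PR; not the Spark source.
object ConfigTypeSketch {
  // Trimmed-down version of the enum (the PR lists ten variants).
  sealed trait ConfigEntryType
  case object BooleanEntry extends ConfigEntryType
  case object IntEntry extends ConfigEntryType
  case object StringEntry extends ConfigEntryType
  case object OtherEntry extends ConfigEntryType

  // The tag is a required constructor parameter with no default value,
  // so every construction site has to name a variant.
  final class ConfigEntry[T](
      val key: String,
      val default: T,
      val configEntryType: ConfigEntryType)

  final class TypedConfigBuilder[T](
      val key: String,
      val converter: String => T,
      val configEntryType: ConfigEntryType) {

    // transform builds a new builder and carries the tag along unchanged.
    def transform(f: T => T): TypedConfigBuilder[T] =
      new TypedConfigBuilder[T](key, s => f(converter(s)), configEntryType)

    // create* methods hand the tag to the entry they construct.
    def createWithDefault(default: T): ConfigEntry[T] =
      new ConfigEntry[T](key, default, configEntryType)
  }

  // Each *Conf factory picks the variant that matches its value type.
  final class ConfigBuilder(val key: String) {
    def booleanConf: TypedConfigBuilder[Boolean] =
      new TypedConfigBuilder[Boolean](key, s => s.toBoolean, BooleanEntry)
    def intConf: TypedConfigBuilder[Int] =
      new TypedConfigBuilder[Int](key, s => s.toInt, IntEntry)
    def stringConf: TypedConfigBuilder[String] =
      new TypedConfigBuilder[String](key, s => s, StringEntry)
  }
}
```

In this sketch, new ConfigTypeSketch.ConfigBuilder("spark.example.flag").booleanConf.createWithDefault(false) yields an entry tagged BooleanEntry (the key is a made-up example). Leaving the tag out of any constructor call is a compile error, which is the property the required parameter is meant to provide.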

Why are the changes needed?

Pattern matching on config values at runtime (e.g. case b: Boolean => ...) or using isInstanceOf[Boolean] type tests causes JVM class_check deoptimizations at megamorphic call sites. By tagging each config entry with its declared type at construction time, hot-path config access code can use a simple field check instead, avoiding these deoptimizations entirely.
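The two access styles being contrasted can be sketched as follows. This is an illustrative model only: Entry, isBooleanByProbe, and isBooleanByTag are hypothetical names, and the sketch shows the shape of the code, not a measurement of JIT behavior.

```scala
// Hypothetical contrast between runtime type probing and a tag check.
object AccessPathSketch {
  sealed trait ConfigEntryType
  case object BooleanEntry extends ConfigEntryType
  case object OtherEntry extends ConfigEntryType

  // Stand-in for a config entry that carries its declared type.
  final case class Entry(key: String, configEntryType: ConfigEntryType)

  // Runtime probing: inspects the class of the (boxed) value on every call.
  def isBooleanByProbe(value: Any): Boolean = value match {
    case _: Boolean => true
    case _ => false
  }

  // Tag check: reads a field that was set once, when the entry was built.
  def isBooleanByTag(entry: Entry): Boolean =
    entry.configEntryType == BooleanEntry
}
```

The tag check never looks at the config value itself, so it does not depend on which value classes happen to flow through a given call site.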

Does this PR introduce any user-facing change?

No. ConfigEntryType and configEntryType are private[spark]; no public API is affected.

How was this patch tested?

New unit test suite RecordConfigAccessSuite (core/src/test/scala/org/apache/spark/RecordConfigAccessSuite.scala) with 19 tests covering:

  • Correct configEntryType assignment for built-in entries of the boolean, int, long, double, string, bytes, and time types.
  • fallbackConf inheritance of configEntryType from the fallback entry.
  • Preservation of configEntryType through all create* variants (createWithDefault, createWithDefaultString, createWithDefaultFunction, createOptional).
  • Preservation through transform, checkValue, and toSequence.
  • One test per ConfigBuilder.*Conf method confirming the correct enum variant.
  • Negative test verifying non-boolean entries do not carry BooleanEntry.
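As a rough illustration of what the fallback-inheritance and negative checks assert, the sketch below models two of the five subclasses with plain classes. It is not the RecordConfigAccessSuite code; the spark.example.* keys and the wrapping object FallbackSketch are made up for the example.

```scala
// Hypothetical model of fallback inheritance; not the actual test suite.
object FallbackSketch {
  sealed trait ConfigEntryType
  case object BooleanEntry extends ConfigEntryType
  case object IntEntry extends ConfigEntryType

  sealed abstract class ConfigEntry[T](
      val key: String,
      val configEntryType: ConfigEntryType)

  final class ConfigEntryWithDefault[T](key: String, val default: T, tpe: ConfigEntryType)
    extends ConfigEntry[T](key, tpe)

  // A fallback entry does not name its own variant; it copies the tag
  // from the entry it falls back to.
  final class FallbackConfigEntry[T](key: String, val fallback: ConfigEntry[T])
    extends ConfigEntry[T](key, fallback.configEntryType)

  val parent = new ConfigEntryWithDefault[Boolean]("spark.example.parent", true, BooleanEntry)
  val child = new FallbackConfigEntry[Boolean]("spark.example.child", parent)
  val retries = new ConfigEntryWithDefault[Int]("spark.example.retries", 3, IntEntry)
}
```

Here child ends up tagged BooleanEntry because its fallback is, while retries keeps IntEntry, mirroring the inheritance test and the negative test in the list above.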

Run with:

build/sbt "core/testOnly org.apache.spark.RecordConfigAccessSuite"

Was this patch authored or co-authored using generative AI tooling?

Yes, claude-4.6-opus-high

dtenedor changed the title from "commit" to "[SPARK-55798][SQL] Propagate config data type into config builders" on Mar 2, 2026
dtenedor closed this on Mar 2, 2026