Skip to content

[SPARK-55728][SS] Introduce RocksDB conf for file checksum threadpool size#54529

Open
gnanda wants to merge 3 commits intoapache:masterfrom
gnanda:stack/SPARK-55728
Open

[SPARK-55728][SS] Introduce RocksDB conf for file checksum threadpool size#54529
gnanda wants to merge 3 commits intoapache:masterfrom
gnanda:stack/SPARK-55728

Conversation

@gnanda
Copy link

@gnanda gnanda commented Feb 27, 2026

What changes were proposed in this pull request?

This PR introduces a new configuration, spark.sql.streaming.stateStore.fileChecksumThreadPoolSize (internal), that controls the number of threads used by ChecksumCheckpointFileManager when concurrently writing/reading the main checkpoint file
and its .crc sidecar.

Key changes:

  • SQLConf / StateStoreConf / RocksDBConf: Added STATE_STORE_FILE_CHECKSUM_THREAD_POOL_SIZE config (default: 4). Wired it through StateStoreConf.fileChecksumThreadPoolSize and RocksDBConf.fileChecksumThreadPoolSize, replacing the previously
    hardcoded Some(4) in RocksDB.
  • ChecksumCheckpointFileManager: Relaxed the numThreads % 2 == 0 assertion to numThreads >= 0. When numThreads == 0, no thread pool is allocated; all file operations run inline on the calling thread via a caller-runs ExecutionContext.
    Internally, Future scheduling is unified through a scheduleOrRun helper that dispatches to the pool or runs synchronously. close() is a no-op when no pool was created.
  • RocksDB: Reads conf.fileChecksumThreadPoolSize at construction time and emits a warning if the value is below the recommended default of 4 (since 2 concurrent callers × 2 files = 4 threads).

Why are the changes needed?

The thread pool size was previously hardcoded to 4 inside RocksDB. There was no way to tune it for environments where the default is either wasteful (e.g. single-threaded test scenarios) or insufficient (e.g. high-concurrency deployments). In
particular, setting numThreads = 0 provides a clean sequential mode useful for unit tests and for deployments that do not need concurrent file I/O.

Does this PR introduce any user-facing change?

No. The new config is marked internal(). Behavior is unchanged at the default value of 4.

How was this patch tested?

  • ChecksumCheckpointFileManagerSuite: Two new tests — one verifying that numThreads = 0 constructs and closes without error (sequential mode), another verifying that numThreads = -1 throws the expected AssertionError.
  • Existing ChecksumCheckpointFileManager and RocksDB test suites continue to pass with the default config value.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 2.1.58

@gnanda gnanda force-pushed the stack/SPARK-55728 branch from 8a7db3b to 8167c56 Compare March 2, 2026 05:17
@gnanda gnanda requested a review from micheal-o March 2, 2026 17:54
@gnanda gnanda requested a review from ericm-db March 2, 2026 23:02
Comment on lines +136 to +137
* If file manager is shared by multiple threads, set it to
* number of concurrent callers * 2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go over in the doc config? Also how would a user figure out how many threads are going to be accessing it?

Comment on lines +304 to +305
threadPool.foreach { pool =>
pool.shutdown()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe silly but would it make sense to do this first foreach and then the subsequent blocking checkin? It's only during shutdown so not a big deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants