KAFKA-10025: guard RocksDBMetricsRecorder value provider reads against store close by Aangbaeck · Pull Request #22717 · apache/kafka

Aangbaeck · 2026-07-01T09:19:50Z

What

RocksDBMetricsRecorder reads native RocksDB value providers (RocksDB and Statistics)
in three places:

- record(): statistics.getAndResetTickerCount(...) / getHistogramData(...)
- the property gauges (gaugeToComputeSumOfProperties): db.getAggregatedLongProperty(...)
- the block-cache gauges (gaugeToComputeBlockCacheMetrics): db.getLongProperty(...)

These reads run with no mutual exclusion against removeValueProviders(...). RocksDBStore.close()
calls removeValueProviders(name) and then closes (frees) the native RocksDB and
Statistics. A metrics read that is in flight when a store is closed (for example during a
rebalance or task migration) can therefore dereference a native handle that close() is
concurrently freeing, which is a native use-after-free / SIGSEGV.

Two observed crash frames, same root cause:

- record() path → Statistics::getAndResetTickerCount: this is KAFKA-10025 (open since 2020).
- gauge path → rocksdb::DBImpl::GetAggregatedIntProperty: observed in production under a
  from-zero state rebuild, where warmup/probing rebalances close stores continuously while a
  metrics reporter / JMX scrape evaluates the (INFO-level) RocksDB property gauges.

Note that the gauge metrics are registered at RecordingLevel.INFO, so they are active and
scraped even when metrics.recording.level=INFO (the default); only the record() (statistics)
path is gated to DEBUG. The crash is therefore reachable at INFO.

Why the current code is unsafe

storeToValueProviders is a ConcurrentHashMap, which makes the map operations
thread-safe, but does not prevent a reader that has already obtained a
DbAndCacheAndStatistics from calling a native method on its db/statistics after (or
while) RocksDBStore.close() frees them. There is no happens-before between "recorder reads
the provider" and "store closes the provider".
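The interleaving described above can be replayed step by step on a single thread. This is only an illustrative sketch: FakeNativeHandle and the "segment-0" key are hypothetical stand-ins, not classes or names from Kafka Streams or rocksdbjni, and where real native code would crash the stand-in throws an exception instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StaleHandleDemo {

    /** Hypothetical stand-in for a native RocksDB/Statistics handle. */
    static final class FakeNativeHandle {
        private volatile boolean freed = false;

        long readProperty() {
            // Real native code would dereference freed memory here;
            // the stand-in throws so the problem is observable.
            if (freed) {
                throw new IllegalStateException("use after free");
            }
            return 42L;
        }

        void close() {
            freed = true;
        }
    }

    /** Replays the problematic interleaving in a fixed order. */
    static boolean readerSeesFreedHandle() {
        Map<String, FakeNativeHandle> storeToValueProviders = new ConcurrentHashMap<>();
        storeToValueProviders.put("segment-0", new FakeNativeHandle());

        // Step 1 (metrics thread): the reader obtains the provider from the map.
        FakeNativeHandle heldByReader = storeToValueProviders.get("segment-0");

        // Step 2 (closing thread): the entry is removed, then the handle is freed.
        FakeNativeHandle removed = storeToValueProviders.remove("segment-0");
        removed.close();

        // Step 3 (metrics thread): the map is consistent, but the reference is stale.
        try {
            heldByReader.readProperty();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("reader touched freed handle: " + readerSeesFreedHandle());
    }
}
```

Every individual map call in this sketch is thread-safe, yet step 3 still reaches the freed handle, because nothing orders the read against the close.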

Fix

Introduce a single lock (valueProvidersLock) in RocksDBMetricsRecorder and hold it
around every read of the value providers (record(), both gauge lambdas) and around the
map mutations (addValueProviders, removeValueProviders). Since RocksDBStore.close()
already calls removeValueProviders(...) before it frees the native db/statistics,
removeValueProviders(...) acquiring the lock guarantees:

- any in-flight read completes before the segment is removed and the native handles freed, and
- any read that starts after removal no longer sees the segment,

so no read can ever dereference a freed handle.
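The locking scheme can be sketched roughly as follows. This is not the patched RocksDBMetricsRecorder: FakeNativeHandle, sumOfProperty() and closeStore() are hypothetical names, and the use of a plain monitor via synchronized is an assumption, since the description does not say what kind of lock valueProvidersLock is.

```java
import java.util.HashMap;
import java.util.Map;

final class GuardedProvidersSketch {

    /** Hypothetical stand-in for a native handle such as RocksDB or Statistics. */
    static final class FakeNativeHandle {
        private volatile boolean freed = false;

        long readProperty() {
            if (freed) {
                throw new IllegalStateException("use after free");
            }
            return 7L;
        }

        void close() {
            freed = true;
        }
    }

    // Assumption: a plain monitor; the real lock type is not shown in the description.
    private final Object valueProvidersLock = new Object();
    private final Map<String, FakeNativeHandle> storeToValueProviders = new HashMap<>();

    void addValueProviders(String segment, FakeNativeHandle handle) {
        synchronized (valueProvidersLock) {
            storeToValueProviders.put(segment, handle);
        }
    }

    void removeValueProviders(String segment) {
        // Blocks until any in-flight read has left its synchronized block.
        synchronized (valueProvidersLock) {
            storeToValueProviders.remove(segment);
        }
    }

    /** Stands in for a gauge lambda: every handle is read while the lock is held. */
    long sumOfProperty() {
        synchronized (valueProvidersLock) {
            long sum = 0;
            for (FakeNativeHandle handle : storeToValueProviders.values()) {
                sum += handle.readProperty();
            }
            return sum;
        }
    }

    /** Mirrors the close order described above: remove from the recorder first, then free. */
    static void closeStore(GuardedProvidersSketch recorder, String segment, FakeNativeHandle handle) {
        recorder.removeValueProviders(segment);
        handle.close();
    }

    public static void main(String[] args) {
        GuardedProvidersSketch recorder = new GuardedProvidersSketch();
        FakeNativeHandle first = new FakeNativeHandle();
        FakeNativeHandle second = new FakeNativeHandle();
        recorder.addValueProviders("segment-0", first);
        recorder.addValueProviders("segment-1", second);
        System.out.println(recorder.sumOfProperty()); // prints 14

        closeStore(recorder, "segment-0", first);
        System.out.println(recorder.sumOfProperty()); // prints 7
    }
}
```

The guarantee depends on the caller freeing the handle only after removeValueProviders(...) has returned, which is the order the description attributes to RocksDBStore.close().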

No lock-ordering risk: RocksDBMetricsRecordingTrigger holds no lock while calling
record(), and the guarded reads never call back into RocksDBStore. RocksDBStore.close()
takes the store monitor and then this lock, and opening a store acquires them in the same
order, so the ordering is consistent and cannot form a cycle.

Testing

New test RocksDBMetricsRecorderGaugesTest#shouldNotRemoveValueProvidersWhileGaugeIsReadingThem
blocks a gauge evaluation inside getAggregatedLongProperty (while it holds the lock) and asserts
that removeValueProviders(...) cannot return until the read completes, i.e. that the
use-after-free window is closed. The test fails without the fix (AssertionError: …the use-after-free window is open) and passes with it.
Existing RocksDBMetricsRecorderTest / RocksDBMetricsRecorderGaugesTest, checkstyle and
spotbugs all pass.
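The technique behind such a test can be sketched with latches. This is not the test from the PR: the class name, the bare lock object and the 200 ms wait are illustrative assumptions, and the reader thread here simply parks inside the lock instead of calling into a mocked RocksDB.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class BlockedReadTestSketch {

    /** Returns true if "removal" only returns after the blocked "read" has finished. */
    static boolean removalWaitsForRead() {
        final Object valueProvidersLock = new Object();
        final CountDownLatch readStarted = new CountDownLatch(1);
        final CountDownLatch releaseRead = new CountDownLatch(1);
        final CountDownLatch removalReturned = new CountDownLatch(1);

        // Reader: enters the guarded section and stays there until released.
        Thread reader = new Thread(() -> {
            synchronized (valueProvidersLock) {
                readStarted.countDown();
                try {
                    releaseRead.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // Remover: must acquire the same lock before it can return.
        Thread remover = new Thread(() -> {
            synchronized (valueProvidersLock) {
                // the map removal would happen here
            }
            removalReturned.countDown();
        });

        try {
            reader.start();
            if (!readStarted.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("reader never entered the guarded section");
            }
            remover.start();

            // While the read is blocked, removal must not have returned.
            boolean returnedEarly = removalReturned.await(200, TimeUnit.MILLISECONDS);

            // Let the read finish; removal should now complete promptly.
            releaseRead.countDown();
            boolean returnedAfterRead = removalReturned.await(5, TimeUnit.SECONDS);

            reader.join();
            remover.join();
            return !returnedEarly && returnedAfterRead;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("removal waited for read: " + removalWaitsForRead());
    }
}
```

The short timed wait is a negative check, so it can only show that removal did not return within that window. Dropping the synchronized block from the remover makes the method return false, which matches the fail-without-the-fix behaviour described above.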

End-to-end confirmation (outside this PR, in a standalone Docker harness): a real Kafka
Streams app on the released kafka-streams 8.2.1-ce, running at metrics.recording.level=INFO
with a JMX-style metrics scrape (reading the INFO-level RocksDB property gauges) plus forced
rebalances, crashes with SIGSEGV in rocksdb::DBImpl::GetAggregatedIntProperty+0x83 on a scrape
thread after ~27M gauge reads. The same binary with only RocksDBMetricsRecorder replaced
by the patched class (classpath shadow, load-verified) survived 600s / ~336M gauge reads / 25
rebalances with zero crashes. The exact native crash was also reproduced in a pure rocksdbjni
harness by racing getAggregatedLongProperty against a concurrent DB close+reopen.

…t store close

RocksDBMetricsRecorder reads native RocksDB value providers (RocksDB and Statistics) in record() (getAndResetTickerCount / getHistogramData), in the property gauges (RocksDB.getAggregatedLongProperty) and in the block-cache gauges (RocksDB.getLongProperty). These reads had no mutual exclusion against removeValueProviders().

RocksDBStore.close() calls removeValueProviders() and then closes (frees) the native RocksDB and Statistics. Because the reads and the removal were not mutually exclusive, a metrics read that is in flight when a store is closed (e.g. during a rebalance / task migration) can dereference a native handle that close() is concurrently freeing, causing a native use-after-free / SIGSEGV. storeToValueProviders being a ConcurrentHashMap only makes the map operations safe; it does not prevent a reader that already holds a DbAndCacheAndStatistics from calling into a db/statistics that close() frees.

Two observed crash frames, same root cause:
- record() path -> Statistics::getAndResetTickerCount (this ticket)
- gauge path -> rocksdb::DBImpl::GetAggregatedIntProperty (property gauges, registered at RecordingLevel.INFO, so reachable via a metrics reporter/JMX scrape even at metrics.recording.level=INFO)

Fix: hold a single lock around every read of the value providers (record() and both gauge lambdas) and around the map mutations (addValueProviders / removeValueProviders). Since RocksDBStore.close() calls removeValueProviders() before freeing the native handles, acquiring the lock there waits for any in-flight read to finish and prevents any later read from seeing the segment, so no read can dereference a freed handle. The recording trigger holds no lock while calling record(), and the guarded reads never call back into RocksDBStore, so no lock-ordering cycle is introduced.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot added triage PRs from the community streams labels Jul 1, 2026

Aangbaeck wants to merge 1 commit into apache:trunk from Aangbaeck:KAFKA-10025-rocksdb-metrics-recorder-uaf
