[fix][meta] Fix race condition in ZooKeeper session reconnection leading to SESSIONEXPIRED Errors#25237
Conversation
| }; | ||
|
|
||
| private ZooKeeper getZkHandle() { | ||
| lock.readLock().lock(); |
There was a problem hiding this comment.
I don't see how this read lock is helping here. By the time we get the object and we release the lock it can be closed and reset. I would be the same as the zk.get()
There was a problem hiding this comment.
The primary fix that actually works is in modification 1, which sets currentStatus before calling sessionListener.accept. Adding a read-write lock merely reduces the probability of operations using the old ZooKeeper client. Once a new client needs to be created, incoming operations must wait until the new client creation is complete.
0d2ff53 to
a51ec81
Compare
|
Could we add a regression test that reproduces the original race condition? The fix depends on preserving a SessionLost event from the SessionReestablished callback path, but the existing metadata tests only cover the successful re-registration path after a reconnect. Without a test that fails addWatch during SessionReestablished and verifies the event is emitted again, this could regress easily in a future refactor. |
Motivation
When using PulsarZooKeeperClient with sessionWatcher enabled, a race condition during ZooKeeper session reconnection can cause operations to fail with SESSIONEXPIRED errors even after the session has been successfully re-established. This may also result in the loss of notifications for the SessionReestablished event.
In ZKSessionWatcher, currentStatus = SessionEvent.SessionLost. Now, we reconnect to zk,and trigger ZKSessionWatcher::checkState(SyncConnected).
After this, because the state has already been updated to SessionReestablished, the next time the session state is checked, sessionListener.accept(SessionEvent.SessionReestablished); will no longer be triggered.
Modifications
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-complete