fuse: fix multiple issues during conn abort #138
hbirth wants to merge 2 commits into DDNStorage:redfs-ubuntu-noble-6.8.0-58.60 from
We have to verify this with John Gu, since I cannot reproduce the state on my system.
b665e47 to 8fb4203
Just added a debug print for all requests that are in the system; it is printed after 10 seconds when we hang in fuse_wait_aborted().
2ebd093 to 00bac22
cf3e5eb to 4a7ae2f
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
4a7ae2f to 16c0fc2
	 */
	for (qid = 0; qid < ring->max_nr_queues; qid++) {
		queue = READ_ONCE(ring->queues[qid]);
		if (!queue)
I still don't see why two runs are required. With a single lock, say we have two competing tasks: task-1 tries to add the request, task-2 tries to abort the queue.
Scenario 1:
task-1 wins and is in the process of adding to the queue, so it takes the lock and task-2 (abort) is blocked. Once task-2 continues, it blocks the queue and flushes the requests. After that, no other task can add new requests, because `queue->stopped` is set.
Scenario 2:
task-2 (abort) wins, takes the lock, sets `queue->stopped` and flushes the queue. Once it releases the lock, task-1 wakes up, sees that the queue is stopped and fails the request with -ENOTCONN.
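To make the two scenarios concrete, here is a minimal user-space sketch of the single-lock argument. It is not the kernel code: `struct model_queue`, `model_queue_add()` and `model_queue_abort()` are hypothetical names, a pthread mutex stands in for the queue spinlock, and plain counters stand in for the request lists. The only point it illustrates is that when the `stopped` check and the enqueue share one critical section with the abort, either ordering leaves no request behind.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a ring queue; not the kernel struct. */
struct model_queue {
	pthread_mutex_t lock;
	bool stopped;    /* set once by the abort path, never cleared */
	int nr_pending;  /* requests currently queued */
	int nr_aborted;  /* requests ended by the abort path */
};

static void model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->stopped = false;
	q->nr_pending = 0;
	q->nr_aborted = 0;
}

/* Add path (task-1): check 'stopped' and enqueue under the same lock. */
static int model_queue_add(struct model_queue *q)
{
	int err = 0;

	pthread_mutex_lock(&q->lock);
	if (q->stopped)
		err = -ENOTCONN;   /* Scenario 2: abort already ran */
	else
		q->nr_pending++;   /* Scenario 1: abort will flush this later */
	pthread_mutex_unlock(&q->lock);

	return err;
}

/* Abort path (task-2): mark stopped and flush in one critical section. */
static void model_queue_abort(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->stopped = true;
	q->nr_aborted += q->nr_pending;
	q->nr_pending = 0;
	pthread_mutex_unlock(&q->lock);
}
```

In this model a request is either counted in `nr_aborted` (it was queued before the abort) or rejected with -ENOTCONN (it arrived after). A request can only be stranded if some path enqueues without taking the same lock or without checking `stopped`, which the sketch does not cover.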
Is it really that bad? This only gets called when the connection is destroyed.
	 * queued even when no ring entries are active.
	 */
	fuse_uring_flush_bg(fc);
	fuse_uring_stop_queues(ring);
The comment is not right: queue_refs == 0 means there is no ring_ent on any queue, which is an impossible condition. And with queue_refs == 0, ring->ready will not be true.
Why is that an impossible condition? I have seen it many times in debug prints.
At startup time: the ring gets created, the first queue gets created, then a ring entry gets created (which increases queue_refs), then ring->ready is set.
Teardown:
fuse_uring_teardown_entries()
  fuse_uring_stop_list_entries() decreases queue_refs
Btw, if there were a problem with FRRS_FUSE_REQ and FRRS_COMMIT, queue_refs would never go to 0, exactly here.
Anyway, fuse_uring_teardown_entries() is called by fuse_uring_stop_queues(), which is supposed to be called in fuse_uring_abort() after fuse_uring_abort_end_requests(). That means it runs after queue->stopped = true.
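The reference-count lifecycle described above can be sketched as a small user-space model. This is an illustration only: `struct model_ring` and its helpers are hypothetical names, the real counter is handled inside the kernel, and the sketch deliberately leaves `ready` untouched during teardown because the thread does not say where (or whether) it is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring state discussed in the thread. */
struct model_ring {
	int queue_refs;  /* one reference per registered ring entry */
	bool ready;      /* set only after the first entry exists */
};

static void model_ring_init(struct model_ring *r)
{
	r->queue_refs = 0;
	r->ready = false;
}

/* Startup: the entry takes its reference first, then the ring is ready. */
static void model_ring_add_entry(struct model_ring *r)
{
	r->queue_refs++;
	r->ready = true;
}

/*
 * Teardown: releasing an entry drops its reference.
 * Returns true when the last reference is gone.
 */
static bool model_ring_release_entry(struct model_ring *r)
{
	assert(r->queue_refs > 0);
	r->queue_refs--;
	return r->queue_refs == 0;
}
```

Under this ordering, `ready` can never be observed with `queue_refs == 0` before teardown starts. Once teardown has released the last entry, the model does reach `queue_refs == 0` with `ready` still set, so whether that state is visible in practice depends on details the thread leaves open.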
Yes ... that's why I had done those above (maybe in a bit of despair) ... we have queue_refs == 0, and queue ready, and still requests active that will never terminate.
BTW, all I want here is to call fuse_flush_bg() and fuse_uring_stop_queues() no matter what. What is wrong with that?
78f26c1 to 177aa69
Fix uninterruptible sleep (D state) hangs during FUSE filesystem teardown when using io_uring. The issue manifests as processes stuck waiting for requests that are never completed, particularly affecting force requests like FUSE_FLUSH or when requests are created after fuse_abort_conn() already finished.

Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
177aa69 to e3bbfe2
No description provided.