Open
Conversation
When block_submit_bio() fails, set BLOCK_BIT_ERROR so that waiters in wait_event(uptodate_or_error) will wake up rather than waiting indefinitely for a completion. Signed-off-by: Auke Kok <auke.kok@versity.com>
Replace unbounded wait_for_completion() with a 120 second timeout to prevent indefinite hangs during unmount if the server never responds to the farewell request. Signed-off-by: Auke Kok <auke.kok@versity.com>
Add unmounting checks to lock_wait_cond() and lock_key_range() so that lock waiters wake up and new lock requests fail with -ESHUTDOWN during unmount. Replace the unbounded wait_event() with a 60 second timeout to prevent indefinite hangs. Relax the WARN_ON_ONCE at lock_key_range entry to only warn when not unmounting, since late lock attempts during shutdown are expected. Signed-off-by: Auke Kok <auke.kok@versity.com>
The "server error emptying freed" error was causing a fence-and-reclaim test failure. In this case, the error was -ENOLINK, which we should ignore for messaging purposes. Signed-off-by: Chris Kirby <ckirby@versity.com>
Replace unbounded wait_for_completion() in scoutfs_net_sync_request() with a 60 second timeout loop that checks scoutfs_unmounting(). Cancel the queued request before returning -ESHUTDOWN so that sync_response cannot fire on freed stack memory after the caller returns. Signed-off-by: Auke Kok <auke.kok@versity.com>
During normal unmount, lock_invalidate_worker can hang in scoutfs_trans_sync(sb, 1) because the trans commit path may return network errors that cause an infinite retry loop. Skip full lock_invalidate() during shutdown and unmount, and extract lock_clear_coverage() to still clean up coverage items in those paths and in scoutfs_lock_destroy(). Without this, coverage items can remain attached to locks being freed. Signed-off-by: Auke Kok <auke.kok@versity.com>
retry_forever() only checked scoutfs_forcing_unmount(), so a normal unmount with a network error in the commit path would loop forever. Also check scoutfs_unmounting() so the write worker can exit cleanly. Signed-off-by: Auke Kok <auke.kok@versity.com>
Add a WARN_ON_ONCE check that the freed list ref blkno matches the block header blkno after dirtying alloc blocks. Also save and restore freed.first_nr on the error path, and initialize av_old/fr_old to 0 so the diagnostic message has valid values. Signed-off-by: Auke Kok <auke.kok@versity.com>
block_dirty_ref() skipped setting *ref_blkno when the block was already dirty, leaving the caller with a stale value. Set it to 0 on the already-dirty fast path so callers do not try to free a block that was not allocated. Signed-off-by: Auke Kok <auke.kok@versity.com>
Replace the unbounded wait_event() in block_read() with a 120 second timeout that issues a WARN if the bio completion never arrives. A lost completion would otherwise hang silently. Signed-off-by: Auke Kok <auke.kok@versity.com>
9f337b7 to
a46e701
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Various "trial-and-error" fixes accumulated by trying to diagnose and handle our unmount failures in CI.
Not for review yet.