CI run by aversecat · Pull Request #302 · versity/scoutfs

aversecat · 2026-04-15T21:42:38Z

Various "trial-and-error" fixes accumulated while trying to diagnose and handle our unmount failures in CI.

Not for review yet.

When block_submit_bio() fails, set BLOCK_BIT_ERROR so that waiters in wait_event(uptodate_or_error) will wake up rather than waiting indefinitely for a completion.

Signed-off-by: Auke Kok <auke.kok@versity.com>
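
A minimal userspace model of why the error bit matters, assuming a simplified block with two flag bits (the struct, helper names, and bit values below are hypothetical, not the scoutfs definitions). Waiters sleep until either bit is set; if submission fails, no completion will ever set UPTODATE, so the submit path has to set ERROR itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the block state bits; the kernel code uses
 * real bit operations on the block and calls wake_up() on waiters. */
enum {
	BLOCK_BIT_UPTODATE = 1u << 0,
	BLOCK_BIT_ERROR = 1u << 1,
};

struct block_model {
	unsigned int bits;
};

/* The condition waiters sleep on: done once the block is either
 * uptodate or known to have failed. */
static bool block_uptodate_or_error(const struct block_model *bp)
{
	return (bp->bits & (BLOCK_BIT_UPTODATE | BLOCK_BIT_ERROR)) != 0;
}

/* submit_ret stands in for block_submit_bio()'s return value.  If
 * submission fails, no completion will ever run, so the submitter
 * must set the error bit itself or waiters never wake. */
static int submit_read(struct block_model *bp, int submit_ret)
{
	if (submit_ret < 0)
		bp->bits |= BLOCK_BIT_ERROR;	/* then wake waiters */
	return submit_ret;
}

/* Completion path for a bio that was actually submitted. */
static void read_end_io(struct block_model *bp, int err)
{
	bp->bits |= err ? BLOCK_BIT_ERROR : BLOCK_BIT_UPTODATE;
}
```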

Replace unbounded wait_for_completion() with a 120 second timeout to prevent indefinite hangs during unmount if the server never responds to the farewell request.

Signed-off-by: Auke Kok <auke.kok@versity.com>
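
A userspace sketch of a bounded completion wait, using a pthread mutex and condition variable as a stand-in for the kernel's struct completion (the ucompletion_* names are hypothetical). In the kernel the corresponding call is wait_for_completion_timeout(), which returns 0 when the timeout expires; which error the farewell path then reports is not stated in the commit, so -ETIMEDOUT below is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for a kernel struct completion (hypothetical). */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void ucompletion_init(struct ucompletion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Called by whoever receives the farewell response. */
static void ucompletion_complete(struct ucompletion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Bounded wait: 0 if completed, -ETIMEDOUT if the deadline passed. */
static int ucompletion_wait_ms(struct ucompletion *c, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop over spurious wakeups; stop on timeout or any error. */
	while (!c->done && err == 0)
		err = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);

	return done ? 0 : -ETIMEDOUT;
}

/* The bound chosen in the commit for the farewell response. */
#define FAREWELL_TIMEOUT_MS (120L * 1000L)
```

Waiting with a short timeout on an uncompleted object returns -ETIMEDOUT after roughly that long; once ucompletion_complete() has run, the wait returns 0 immediately.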

Add unmounting checks to lock_wait_cond() and lock_key_range() so that lock waiters wake up and new lock requests fail with -ESHUTDOWN during unmount. Replace the unbounded wait_event() with a 60 second timeout to prevent indefinite hangs. Relax the WARN_ON_ONCE at lock_key_range() entry to only warn when not unmounting, since late lock attempts during shutdown are expected.

Signed-off-by: Auke Kok <auke.kok@versity.com>
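
A simplified model of the wake-up condition and the entry checks, assuming a single unmounting flag. All names here are hypothetical; the condition the real WARN_ON_ONCE tests is not described, so it is passed in as a placeholder boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified state; not the real scoutfs structures. */
struct mount_model {
	bool unmounting;
};

struct lock_model {
	bool granted;
};

/* Wait condition: wake when the lock is granted or when unmount
 * begins, so a waiter cannot sleep through shutdown. */
static bool lock_waiter_should_wake(const struct mount_model *m,
				    const struct lock_model *lk)
{
	return lk->granted || m->unmounting;
}

/* Entry check: new lock requests fail once unmount has started. */
static int lock_request_check(const struct mount_model *m)
{
	return m->unmounting ? -ESHUTDOWN : 0;
}

/* Relaxed entry warning: 'unexpected' is a placeholder for whatever
 * the real WARN_ON_ONCE tests; it stays quiet during unmount. */
static bool entry_should_warn(const struct mount_model *m, bool unexpected)
{
	return unexpected && !m->unmounting;
}
```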

The "server error emptying freed" error was causing a fence-and-reclaim test failure. In this case, the error was -ENOLINK, which we should ignore for messaging purposes.

Signed-off-by: Chris Kirby <ckirby@versity.com>
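
One way to express the filtering, as a hypothetical helper that decides whether the error gets logged. The function names are illustrative; only the message text and -ENOLINK come from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: is this error worth reporting?  -ENOLINK is
 * treated as an expected result of a forced shutdown. */
static bool should_report_emptying_error(int ret)
{
	return ret < 0 && ret != -ENOLINK;
}

static void log_emptying_result(int ret)
{
	if (should_report_emptying_error(ret))
		fprintf(stderr, "server error emptying freed: %d\n", ret);
}
```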

Replace unbounded wait_for_completion() in scoutfs_net_sync_request() with a 60 second timeout loop that checks scoutfs_unmounting(). Cancel the queued request before returning -ESHUTDOWN so that sync_response cannot fire on freed stack memory after the caller returns.

Signed-off-by: Auke Kok <auke.kok@versity.com>
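
A single-threaded model of the wait loop, assuming a simulated environment in place of the network and mount state (sim_env, wait_one_slice() and the other names are hypothetical). Each wait_one_slice() call stands for one 60 second timed wait. In the real code the cancel must also be serialized against the response handler; this model only shows the ordering: cancel first, then return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an on-stack sync request. */
struct sync_req {
	bool queued;	/* still visible to the response path */
	bool completed;
	int result;
};

/* Simulated environment replacing the network and mount state. */
struct sim_env {
	int respond_after;	/* slices until a response; -1 = never */
	bool unmounting;
	int slices;		/* slices waited so far */
};

/* Stand-in for one 60 second timed wait; true if completed. */
static bool wait_one_slice(struct sim_env *env, struct sync_req *req)
{
	env->slices++;
	if (env->respond_after >= 0 && env->slices >= env->respond_after) {
		req->queued = false;
		req->completed = true;
		req->result = 0;
	}
	return req->completed;
}

/* Dequeue so a late response cannot write to the caller's stack. */
static void cancel_request(struct sync_req *req)
{
	req->queued = false;
}

static int sync_request_wait(struct sync_req *req, struct sim_env *env)
{
	for (;;) {
		if (wait_one_slice(env, req))
			return req->result;
		if (env->unmounting) {
			cancel_request(req);	/* before returning */
			return -ESHUTDOWN;
		}
		/* timed out while still mounted: keep waiting */
	}
}
```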

During normal unmount, lock_invalidate_worker can hang in scoutfs_trans_sync(sb, 1) because the trans commit path may return network errors that cause an infinite retry loop. Skip full lock_invalidate() during shutdown and unmount, and extract lock_clear_coverage() to still clean up coverage items in those paths and in scoutfs_lock_destroy(). Without this, coverage items can remain attached to locks being freed.

Signed-off-by: Auke Kok <auke.kok@versity.com>
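
A model of the split, assuming a lock reduced to a coverage count and a flag recording whether the sync ran. The names are hypothetical; the real lock_invalidate() and lock_clear_coverage() do much more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model recording what the worker did. */
struct lock_model {
	int coverage_items;	/* cached items covered by this lock */
	bool synced;		/* did we run the transaction sync? */
};

/* Models the extracted helper: detach coverage, no I/O involved. */
static void clear_coverage(struct lock_model *lk)
{
	lk->coverage_items = 0;
}

/* Models full invalidation: write out dirty items first (where
 * scoutfs_trans_sync(sb, 1) can block), then clear coverage. */
static void full_invalidate(struct lock_model *lk)
{
	lk->synced = true;
	clear_coverage(lk);
}

/* Worker decision: during shutdown or unmount only clear coverage,
 * so no coverage stays attached to a lock that is about to be freed. */
static void invalidate_or_clear(struct lock_model *lk, bool shutting_down)
{
	if (shutting_down)
		clear_coverage(lk);	/* skip the sync that can hang */
	else
		full_invalidate(lk);
}
```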

retry_forever() only checked scoutfs_forcing_unmount(), so a normal unmount with a network error in the commit path would loop forever. Also check scoutfs_unmounting() so the write worker can exit cleanly.

Signed-off-by: Auke Kok <auke.kok@versity.com>
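
A sketch of the retry decision and a worker loop that uses it, assuming every commit attempt fails with a network error. The struct and function names are hypothetical, not the body of retry_forever(). With only forcing_unmount checked, the loop below would never exit on a normal unmount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mount state flags. */
struct mount_flags {
	bool forcing_unmount;
	bool unmounting;
};

/* Decision after a failed commit: retry only while fully mounted.
 * Before the fix, only forcing_unmount stopped the retries. */
static bool keep_retrying(const struct mount_flags *mf)
{
	return !mf->forcing_unmount && !mf->unmounting;
}

/* Write worker model: every commit fails; a normal unmount starts
 * on attempt number unmount_after.  Returns attempts made. */
static int write_worker(struct mount_flags *mf, int unmount_after)
{
	int attempts = 0;

	do {
		attempts++;	/* commit fails with a network error */
		if (attempts == unmount_after)
			mf->unmounting = true;
	} while (keep_retrying(mf));

	return attempts;
}
```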

Add a WARN_ON_ONCE check that the freed list ref blkno matches the block header blkno after dirtying alloc blocks. Also save and restore freed.first_nr on the error path, and initialize av_old/fr_old to 0 so the diagnostic message has valid values.

Signed-off-by: Auke Kok <auke.kok@versity.com>
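
A model of the update showing the zero-initialized saved value, the consistency check, and the restore on error. The struct and parameter names are hypothetical, error injection stands in for the real failure points, only fr_old is modeled, and the check prints every time rather than once like WARN_ON_ONCE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the freed list head. */
struct freed_list {
	uint64_t ref_blkno;	/* blkno the list ref points at */
	uint32_t first_nr;	/* entries in the first list block */
};

/*
 * Model of one update.  dirty_err simulates a failure while dirtying
 * the alloc blocks, later_err a failure after first_nr changed, and
 * hdr_blkno is the blkno read from the dirtied block's header.
 */
static int update_freed(struct freed_list *freed, uint64_t hdr_blkno,
			uint32_t new_nr, int dirty_err, int later_err)
{
	uint32_t fr_old = 0;	/* defined even if we fail before saving */
	int ret;

	ret = dirty_err;
	if (ret)
		goto out;

	/* consistency check (WARN_ON_ONCE in the real code) */
	if (freed->ref_blkno != hdr_blkno)
		fprintf(stderr, "freed ref blkno %llu != header blkno %llu\n",
			(unsigned long long)freed->ref_blkno,
			(unsigned long long)hdr_blkno);

	fr_old = freed->first_nr;
	freed->first_nr = new_nr;

	ret = later_err;
	if (ret)
		freed->first_nr = fr_old;	/* restore on the error path */
out:
	if (ret)
		fprintf(stderr, "freed update failed: ret %d fr_old %u\n",
			ret, (unsigned int)fr_old);
	return ret;
}
```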

block_dirty_ref() skipped setting *ref_blkno when the block was already dirty, leaving the caller with a stale value. Set it to 0 on the already-dirty fast path so callers do not try to free a block that was not allocated.

Signed-off-by: Auke Kok <auke.kok@versity.com>
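
A sketch of the output-parameter rule, assuming *ref_blkno reports a block newly allocated by the call that the caller may need to free on error. This reading of the parameter, the allocator counter, and all names below are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block reference. */
struct block_ref {
	uint64_t blkno;
	bool dirty;	/* already dirty in this transaction */
};

/* Every successful path writes *ref_blkno: the newly allocated
 * blkno, or 0 when nothing was allocated. */
static int block_dirty_ref_model(struct block_ref *ref, uint64_t *next_free,
				 uint64_t *ref_blkno)
{
	if (ref->dirty) {
		*ref_blkno = 0;	/* fast path: nothing allocated */
		return 0;
	}

	ref->blkno = (*next_free)++;	/* copy to a newly allocated block */
	ref->dirty = true;
	*ref_blkno = ref->blkno;
	return 0;
}
```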

Replace the unbounded wait_event() in block_read() with a 120 second timeout that issues a WARN if the bio completion never arrives. A lost completion would otherwise hang silently.

Signed-off-by: Auke Kok <auke.kok@versity.com>

aversecat added the WIP label Apr 15, 2026

aversecat and others added 10 commits April 16, 2026 13:03

Set BLOCK_BIT_ERROR on bio submit failure.

f36b040

Add client timeout to farewell completion wait.

40f2446

Suppress another forced shutdown error message

b832c73

Warn on block read bio completion timeout

a46e701

aversecat force-pushed the auke/make_ci_green_again branch from 9f337b7 to a46e701 on April 16, 2026 20:05

aversecat wants to merge 10 commits into main from auke/make_ci_green_again

aversecat commented Apr 15, 2026

2 participants