Skip to content

feat: add graceful loop device cleanup on Ctrl+C interrupt#583

Open
rpandya28 wants to merge 9 commits into
mainfrom
fix/graceful-loop-cleanup-on-interrupt
Open

feat: add graceful loop device cleanup on Ctrl+C interrupt#583
rpandya28 wants to merge 9 commits into
mainfrom
fix/graceful-loop-cleanup-on-interrupt

Conversation

@rpandya28
Copy link
Copy Markdown
Contributor

Merge Checklist

All boxes should be checked before merging the PR

  • The changes in the PR have been built and tested
  • Documentation has been updated to reflect the changes (or no doc update needed)
  • Ready to merge

Description

Any Newly Introduced Dependencies

How Has This Been Tested?

Copilot AI review requested due to automatic review settings May 27, 2026 05:23
@rpandya28 rpandya28 requested a review from a team as a code owner May 27, 2026 05:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make raw image builds more resilient by cleaning up loop devices when the process is interrupted (Ctrl+C / SIGTERM), and by making loop-device teardown more robust (swapoff, unmounts, and detach fallbacks).

Changes:

  • Add SIGINT/SIGTERM handling in RawMaker.BuildRawImage() to detach the loop device on interrupt.
  • Enhance LoopDev.LoopSetupDelete() to disable swap, unmount mounted partitions, and handle “already detached” cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
internal/image/rawmaker/rawmaker.go Adds signal handling + goroutine-triggered cleanup for loop devices during raw image builds.
internal/image/imagedisc/loopdev.go Expands loop device cleanup to include swapoff + unmounts, and adds detach fallbacks.

Comment thread internal/image/rawmaker/rawmaker.go Outdated
Comment on lines 142 to 146
defer func() {
signal.Stop(sigChan)
close(sigChan)
if loopDevPath != "" {
if detachErr := rawMaker.LoopDev.LoopSetupDelete(loopDevPath); detachErr != nil {
Comment thread internal/image/rawmaker/rawmaker.go
Comment thread internal/image/imagedisc/loopdev.go
Comment thread internal/image/imagedisc/loopdev.go
Comment thread internal/image/imagedisc/loopdev.go Outdated
Comment thread internal/image/imagedisc/loopdev.go Outdated
rpandya28 and others added 3 commits May 27, 2026 10:59
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Replace the pattern of closing sigChan (the os.Signal channel) to signal
goroutine completion with a dedicated done channel and a select statement.

Previously the defer called close(sigChan) which would unblock the
goroutine's channel receive — relying on the ok check to distinguish a
real signal from normal completion. This conflates two concerns onto one
channel and is confusing.

With the new approach:
- sigChan is only ever written by the OS signal runtime (never closed)
- done is a separate struct channel closed by defer on return
- select{} in the goroutine makes the two paths explicit:
    case sig := <-sigChan  → real interrupt, cleanup + os.Exit(1)
    case <-done            → normal/error return, defer handles cleanup
// Goroutine waits for either an OS signal or normal completion.
// Using select + a dedicated done channel avoids closing sigChan and
// makes the two paths explicit: real interrupt vs. function returned.
go func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goroutine calls os.Exit(1), the entire process terminates immediately. os.Exit does not run any deferred functions, not the defer in BuildRawImage at rawmaker.go:145, not defers in the provider, not defers in main(). So chroot mounts, temp directories, and any other resources managed higher in the stack are leaked.

The solution is to not call os.Exit from library code and instead move signal handling up to the cmd level where it can orchestrate full cleanup.

// Goroutine waits for either an OS signal or normal completion.
// Using select + a dedicated done channel avoids closing sigChan and
// makes the two paths explicit: real interrupt vs. function returned.
go func() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: imagine a Ctrl+C arrives while the build is finishing normally. The signal gets queued in sigChan. Then the defer runs, it calls signal.Stop and close(done). At this point the goroutine's select sees two ready cases: the queued signal in sigChan AND the closed done channel. Go picks one at random. If it picks the signal case, both the goroutine and the defer end up calling LoopSetupDelete on the same loop device at the same time. Wrap the cleanup in a sync.Once so only one of them actually runs it.

Comment thread internal/image/imagedisc/loopdev.go Outdated
switch v := data.(type) {
case map[string]interface{}:
// Check if this entry has a mountpoint
if mountPoint, ok := v["mountpoint"].(string); ok && mountPoint != "" && mountPoint != "null" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After json.Unmarshal, a JSON null becomes Go nil, not the string "null". The .(string) type assertion already filters out nil values, so this comparison mountPoint != "null" is dead code.

@arodage
Copy link
Copy Markdown
Contributor

arodage commented May 27, 2026

Need to add unit test for new functions. At minimum for findAndUnmount and findAndDisableSwap

rpandya28 and others added 2 commits June 2, 2026 09:58
- Remove os.Exit() from library code, replace with error propagation
- Add sync.Once to prevent race condition on cleanup
- Remove dead code (mountPoint != "null" check)
- Add shell escaping for mount points with strconv.Quote()
- Add 14 comprehensive unit tests for findAndUnmount and findAndDisableSwap
- Use context.WithCancel() instead of closing channels

Addresses review comments:
- Fixes issue where signal handler could bypass deferred cleanup
- Prevents simultaneous cleanup calls from goroutine and defer
- Ensures proper resource cleanup through defer chain on signal interrupt
- Validates all partition unmounting and swap disabling operations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants