feat: return non-zero exit code on activation errors #378

Open
jfroche wants to merge 1 commit into main from fix/final-exit-code

Member

@jfroche jfroche commented Feb 27, 2026

Use lazy_errors to accumulate errors during activation and return a non-zero exit code if any errors were encountered, while still applying as many changes as possible.
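The accumulate-then-fail pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: it does not use the lazy_errors API, and `ActivationErrors`, `record`, and `exit_status` are hypothetical names chosen for the example.

```rust
/// Hypothetical accumulator (not the lazy_errors API): it collects step
/// failures instead of returning at the first one.
#[derive(Debug, Default)]
struct ActivationErrors {
    failures: Vec<String>,
}

impl ActivationErrors {
    /// Records a failed step; the caller then carries on with the next step,
    /// so as many changes as possible are still applied.
    fn record(&mut self, step: &str, result: Result<(), String>) {
        if let Err(reason) = result {
            self.failures.push(format!("{step}: {reason}"));
        }
    }

    /// Returns 0 when every step succeeded and 1 when at least one failed.
    fn exit_status(&self) -> i32 {
        if self.failures.is_empty() {
            0
        } else {
            1
        }
    }
}
```

A caller would pass each step's result to `record` in turn and hand `exit_status()` to `std::process::exit` once all steps have run.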

We also disable the nix module by default to avoid issues with an existing Nix configuration file. It can be enabled again by setting nix.enable to true, in which case
the existing /etc/nix/nix.conf file is replaced with the one generated by the nix module.

@jfroche jfroche force-pushed the fix/final-exit-code branch 2 times, most recently from c17ab02 to ea4f13e Compare March 4, 2026 15:17
@picnoir picnoir self-assigned this Mar 5, 2026
@jfroche jfroche force-pushed the fix/final-exit-code branch from ea4f13e to d189d6f Compare March 6, 2026 13:29
@jfroche jfroche force-pushed the fix/final-exit-code branch 2 times, most recently from f94f70e to 5f099d0 Compare March 18, 2026 08:31
Contributor

@picnoir picnoir left a comment

Not a massive fan of the activation results split.

See my other suggestion above.

I started applying that suggestion to the whole control flow in 59e78f6.

What do you think about that?

Comment thread crates/system-manager-engine/src/activate.rs Outdated
@jfroche jfroche force-pushed the fix/final-exit-code branch from 5f099d0 to 6be53d9 Compare March 24, 2026 10:21
@jfroche jfroche force-pushed the fix/final-exit-code branch 2 times, most recently from 1595d9a to c97f004 Compare April 10, 2026 22:14
Contributor

@picnoir picnoir left a comment

Partial review of activate.rs and etc_files.rs. LGTM for these two files. I'll do a second pass to review the rest.

Overall, this area does not spark joy ><

That being said, the issue is not coming from this PR but from the unclear semantics we already have towards partial system activation.

A partial activation currently results in a broken system that will loudly fail on reboot. We probably can do better, at least for the etc files linking. But that requires a bit of design planning.

Another issue: we're overwriting the state with a partial state, losing track of which files we have backed up in the past and which files have been managed by system-manager at some point.

In any case, I don't think this PR is meant to fix all that. I guess it's a nice addition and overall makes sense.

log::error!("Error during activation: {e:?}");
}

// Only run daemon reload, userborn, tmpfiles, and services when etc files
Contributor

That being said, the machine is in a broken state and the services will likely break on reboot.

We should probably log a scary message here for now.

Long term, we probably want to make sure all the files can be copied before trying to write them to disk, to prevent this situation from happening.

Alternatively, we could back up the previous files during the activation and roll back everything if something goes wrong.
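To make the backup-and-rollback alternative concrete, here is a rough sketch. Everything in it is an assumption made for illustration: an in-memory `HashMap` stands in for the etc tree, `write` is a hypothetical fallible writer, and `apply_with_rollback` is not a function from system-manager.

```rust
use std::collections::HashMap;

/// Applies `changes` to `files`, an in-memory stand-in for the etc tree.
/// `write` is a hypothetical fallible writer. On the first failure, every
/// path touched so far is restored to its previous content.
fn apply_with_rollback(
    files: &mut HashMap<String, String>,
    changes: &[(&str, &str)],
    write: impl Fn(&str) -> Result<(), String>,
) -> Result<(), String> {
    // Previous content of each touched path; None means it did not exist.
    let mut backups: Vec<(String, Option<String>)> = Vec::new();

    for &(path, content) in changes {
        if let Err(reason) = write(path) {
            // Undo in reverse order so the oldest backup is restored last.
            for (touched, previous) in backups.into_iter().rev() {
                match previous {
                    Some(old) => {
                        files.insert(touched, old);
                    }
                    None => {
                        files.remove(&touched);
                    }
                }
            }
            return Err(format!("{path}: {reason}"));
        }
        let previous = files.insert(path.to_string(), content.to_string());
        backups.push((path.to_string(), previous));
    }
    Ok(())
}
```

A real implementation would also have to handle a failure during the rollback itself, which this sketch ignores.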

Comment thread crates/system-manager-engine/src/activate.rs
log::error!("Error during activation: {e:?}");
}

// Only register services when etc files were fully applied, preserving
Contributor

Same. This will break upon reboot. Not sure what the right semantics would be.

We probably need to create an issue and seriously think about the overall activation success/failure semantics and how to mitigate a failure.

@jfroche jfroche force-pushed the fix/final-exit-code branch from c97f004 to 2a812b2 Compare April 14, 2026 09:00
Use lazy_errors to accumulate errors during activation and return a
non-zero exit code if any errors were encountered, while still applying
as many changes as possible.
@jfroche jfroche force-pushed the fix/final-exit-code branch from 2a812b2 to 366f081 Compare April 14, 2026 09:03