fix: systemd ordering cycle in nvfd-fan-control.service + add retry strategy #12

Merged
ricky-chaoju merged 3 commits into Infinirc:main from delwiv:fix/fan-control-systemd-cycle
Mar 17, 2026
Conversation

@delwiv delwiv commented Mar 16, 2026

Contributor

Fix the ordering cycle error on nvfd-fan-control.service.

Here is the error:

systemd[1]: multi-user.target: Found ordering cycle: nvfd-fan-control.service/start after nvfd.service/start after multi-user.target/start - after nvfd-fan-control.service
systemd[1]: multi-user.target: Job nvfd-fan-control.service/start deleted to break ordering cycle starting with multi-user.target/start
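
To check whether a boot hit the same problem, the journal can be filtered for the "deleted to break ordering cycle" message quoted above. The following is a minimal sketch; the function name `units_dropped_by_cycle` is hypothetical and the pattern assumes the message wording shown in this PR.

```shell
#!/bin/sh
# Print each unit whose start job systemd deleted to break an ordering cycle.
# Reads journal text on stdin; the pattern follows the log line quoted in this PR.
units_dropped_by_cycle() {
    sed -n 's|.*Job \([^ ]*\)/start deleted to break ordering cycle.*|\1|p'
}

# Example using the line from the error above.
echo 'systemd[1]: multi-user.target: Job nvfd-fan-control.service/start deleted to break ordering cycle starting with multi-user.target/start' \
    | units_dropped_by_cycle
```

On a live system the same filter could be fed from the current boot's journal with `journalctl -b | units_dropped_by_cycle`.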

Add a systemd retry strategy within a 400s window; the script now waits up to 30s for the nvfd service to start.

Here is an example of the fan control service running with no nvfd service:

mars 16 13:03:20 systemd[1]: Started NVIDIA Fan Control - Temperature-aware per-GPU mode switching.
mars 16 13:03:20 nvfd-fan-control.sh[8162]: [INFO] Waiting for nvfd service to be ready...
mars 16 13:03:49 systemd[1]: nvfd-fan-control.service: Main process exited, code=exited, status=1/FAILURE
mars 16 13:03:49 systemd[1]: nvfd-fan-control.service: Failed with result 'exit-code'.
mars 16 13:05:19 systemd[1]: nvfd-fan-control.service: Scheduled restart job, restart counter is at 1.
mars 16 13:05:19 systemd[1]: Started NVIDIA Fan Control - Temperature-aware per-GPU mode switching.
mars 16 13:05:19 nvfd-fan-control.sh[9168]: [INFO] Waiting for nvfd service to be ready...
mars 16 13:05:49 systemd[1]: nvfd-fan-control.service: Main process exited, code=exited, status=1/FAILURE
mars 16 13:05:49 systemd[1]: nvfd-fan-control.service: Failed with result 'exit-code'.
mars 16 13:07:19 systemd[1]: nvfd-fan-control.service: Scheduled restart job, restart counter is at 2.
mars 16 13:07:19 systemd[1]: Started NVIDIA Fan Control - Temperature-aware per-GPU mode switching.
mars 16 13:07:19 nvfd-fan-control.sh[9854]: [INFO] Waiting for nvfd service to be ready...
mars 16 13:07:48 systemd[1]: nvfd-fan-control.service: Main process exited, code=exited, status=1/FAILURE
mars 16 13:07:48 systemd[1]: nvfd-fan-control.service: Failed with result 'exit-code'.
mars 16 13:09:18 systemd[1]: nvfd-fan-control.service: Scheduled restart job, restart counter is at 3.
mars 16 13:09:18 systemd[1]: nvfd-fan-control.service: Start request repeated too quickly.
mars 16 13:09:18 systemd[1]: nvfd-fan-control.service: Failed with result 'exit-code'.
mars 16 13:09:18 systemd[1]: Failed to start NVIDIA Fan Control - Temperature-aware per-GPU mode switching.


Remove After= and Wants= directives to avoid ordering cycles caused by
nvfd.service's After=multi-user.target.

Place StartLimitBurst/StartLimitIntervalSec in the [Unit] section to
rate limit restarts: at most 3 start attempts per 400 seconds, with
90s between attempts.
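
As a rough illustration of where these directives sit, the sketch below prints a unit file containing only the settings named in this PR. The function name `print_unit_sketch` and the `ExecStart` path are assumptions made for the example; the actual unit in the repository may contain more.

```shell
#!/bin/sh
# Print a sketch of the unit layout described in this PR.
# Only the directives named in the PR are real; ExecStart is a placeholder.
print_unit_sketch() {
    cat <<'EOF'
[Unit]
Description=NVIDIA Fan Control - Temperature-aware per-GPU mode switching
# No After=nvfd.service or Wants=nvfd.service: the script polls for nvfd instead.
StartLimitIntervalSec=400
StartLimitBurst=3

[Service]
# Hypothetical install path; the real ExecStart is not shown on this page.
ExecStart=/usr/local/bin/nvfd-fan-control.sh
Restart=on-failure
RestartSec=90
EOF
}

print_unit_sketch
```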

Changes:
- Removed After=nvfd.service, Wants=nvfd.service (causes cycles)
- Added StartLimitBurst=3, StartLimitIntervalSec=400 in [Unit] section
- RestartSec=90 (to fit within rate limit window)
- Script waits up to 30s for nvfd.service with 1s polling

This ensures:
- No systemd dependency cycles on start or stop
- Graceful waiting for nvfd without aggressive restart storms
- Rate-limited retries if nvfd fails to start
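
The interplay of the 30s wait, RestartSec=90 and the 400s window can be checked with a little arithmetic. The sketch below is a simplified model of start rate limiting (an assumption for illustration, not systemd's actual code): a start is allowed if it opens a new window or if fewer than `burst` starts have happened in the current one. The function name `simulate_starts` is hypothetical.

```shell
#!/bin/sh
# simulate_starts RUN_SECS RESTART_SECS BURST INTERVAL [MAX_ATTEMPTS]
# Simplified model: each attempt runs RUN_SECS, fails, and is retried after
# RESTART_SECS. At most BURST starts are allowed per INTERVAL-second window.
simulate_starts() {
    run_secs=$1 restart_secs=$2 burst=$3 interval=$4
    max_attempts=${5:-10}
    t=0 window_start=0 count=0 attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        if [ "$attempt" -eq 1 ] || [ $((t - window_start)) -gt "$interval" ]; then
            # First start, or the previous window has expired: open a new one.
            window_start=$t
            count=1
        elif [ "$count" -lt "$burst" ]; then
            count=$((count + 1))
        else
            echo "${t}s refused"
            return 0
        fi
        echo "${t}s started"
        t=$((t + run_secs + restart_secs))
    done
    echo "never refused in ${max_attempts} attempts"
}

# Values from this PR: 30s wait, RestartSec=90, StartLimitBurst=3, 400s window.
simulate_starts 30 90 3 400
```

With these values the model prints starts at 0s, 120s and 240s and a refusal at 360s, which lines up with the roughly two-minute spacing in the journal excerpt above. With a longer delay, for example `simulate_starts 30 150 3 400`, the fourth attempt would fall outside the window and the model never refuses a start, which is presumably why RestartSec was chosen to fit inside the window.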
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the stability and reliability of the NVIDIA fan control service by addressing a critical systemd dependency cycle. It reconfigures the service's startup behavior to prevent deadlocks and introduces a more resilient retry mechanism. Furthermore, the fan control script now proactively waits for its essential nvfd.service dependency to be fully operational, ensuring proper initialization and reducing startup failures.

Highlights

  • Systemd Ordering Cycle Fix: Resolved a systemd ordering cycle error involving nvfd-fan-control.service and multi-user.target by removing explicit After and Wants dependencies on nvfd.service from the unit file.
  • Enhanced Service Resilience: Implemented a robust retry strategy for nvfd-fan-control.service within systemd, allowing up to 3 restarts within a 400-second interval on failure, with a longer delay between retries.
  • Dependency Readiness Check: Added a pre-execution check in the nvfd-fan-control.sh script that waits up to 30 seconds for the nvfd.service to become active, ensuring its dependency is ready before proceeding.

Changelog
  • utils/nvfd-fan-control.service
    • Removed After=nvfd.service and Wants=nvfd.service directives to break the systemd ordering cycle.
    • Added StartLimitIntervalSec=400 and StartLimitBurst=3 to define a retry strategy for service failures.
    • Changed Restart policy from always to on-failure.
    • Increased RestartSec from 5 seconds to 90 seconds to allow more time between retries.
  • utils/nvfd-fan-control.sh
    • Introduced a loop that waits for up to 30 seconds for the nvfd.service to become active.
    • Added an echo statement for informational logging about waiting for the nvfd service.
    • Modified the script to exit with status 1 if nvfd.service is not active after the 30-second wait.
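
The wait-then-fail behaviour listed for the script can be sketched as a small polling helper. This is an illustration under assumptions, not the code merged in this PR: the helper name `wait_for` and the error message text are hypothetical, while `systemctl is-active --quiet` is the standard way to test whether a unit is active.

```shell
#!/bin/sh
# wait_for TIMEOUT_SECS POLL_SECS COMMAND [ARGS...]
# Run COMMAND every POLL_SECS seconds until it succeeds or TIMEOUT_SECS pass.
# Returns 0 on success and 1 on timeout. POLL_SECS must be at least 1.
wait_for() {
    timeout=$1 poll=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$poll"
        elapsed=$((elapsed + poll))
    done
    return 1
}
```

A script could then use it along these lines, exiting with status 1 so that `Restart=on-failure` schedules a retry:

```shell
echo "[INFO] Waiting for nvfd service to be ready..."
if ! wait_for 30 1 systemctl is-active --quiet nvfd.service; then
    echo "[ERROR] nvfd.service not active after 30s" >&2
    exit 1
fi
```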
Activity
  • No human activity has been recorded on this pull request yet.
@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

The changes effectively address the systemd ordering cycle by removing direct service dependencies and implementing a robust retry mechanism. The nvfd-fan-control.sh script now explicitly waits for the nvfd.service to be ready, preventing issues if nvfd starts slowly. The systemd unit file is updated to use on-failure restarts with a longer delay and rate limiting, which improves the service's resilience.

Comment thread utils/nvfd-fan-control.sh Outdated
delwiv and others added 2 commits March 16, 2026 13:38
Add error log if `nvfd.service` is not ready after 30s

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ricky-chaoju

Contributor

LGTM! Thanks for the fix.

@ricky-chaoju ricky-chaoju merged commit 2295c99 into Infinirc:main Mar 17, 2026
2 checks passed
2 participants