Skip to content
This repository was archived by the owner on Jan 23, 2026. It is now read-only.

handle signals properly#589

Merged
mangelajo merged 2 commits intojumpstarter-dev:mainfrom
michalskrivanek:signals
Sep 2, 2025
Merged

handle signals properly#589
mangelajo merged 2 commits intojumpstarter-dev:mainfrom
michalskrivanek:signals

Conversation

@michalskrivanek
Copy link
Copy Markdown
Contributor

@michalskrivanek michalskrivanek commented Aug 13, 2025

Properly handle termination signals to exit cleanly. Signals are forwarded to child, and its exit code is propagated to parent. SIGHUP restarts the child while keeping the parent.

Summary by CodeRabbit

  • New Features

    • Supervised, restartable exporter process with signal-forwarding and graceful shutdown.
    • Explicit exporter lifecycle with cancellation-shielded teardown.
    • Exporter stop request API supporting wait-for-lease-exit.
    • Asynchronous channel creation for improved startup sequencing.
  • Changes

    • Configuration list includes an apiVersion field (defaults to jumpstarter.dev/v1alpha1).
  • Bug Fixes

    • Improved error logging during device shutdown with driver context and guaranteed handler cleanup.

@netlify
Copy link
Copy Markdown

netlify Bot commented Aug 13, 2025

Deploy Preview for jumpstarter-docs ready!

Name Link
🔨 Latest commit f4754db
🔍 Latest deploy log https://app.netlify.com/projects/jumpstarter-docs/deploys/68aef31b40515c000846ebc3
😎 Deploy Preview https://deploy-preview-589--jumpstarter-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 13, 2025

Walkthrough

Implements a fork-based supervisor in the CLI to run the exporter in a forked child with signal forwarding and restart semantics; adds an async exporter lifecycle via create_exporter with cancellation‑shielded teardown; makes channel_factory awaitable, adds Exporter.stop(), and hardens session close logging and handler cleanup.

Changes

Cohort / File(s) Summary
CLI supervisor refactor
packages/jumpstarter-cli/jumpstarter_cli/run.py
Replace single-process run with parent/child fork model. Parent forwards signals and restarts child; child runs exporter inside anyio task groups and signal receivers. Adds structured exit codes, logging, and restart semantics.
Exporter lifecycle + API changes
packages/jumpstarter/jumpstarter/exporter/exporter.py
Change channel_factory to return Awaitable[grpc.aio.Channel]; add internal lifecycle flags, _tg taskgroup tracking, and stop(self, wait_for_lease_exit=False); perform unregister on exit with timeout and shielded cleanup; adjust serve/start/stop sequencing.
Config-managed exporter lifecycle
packages/jumpstarter/jumpstarter/config/exporter.py
Add async def create_exporter(self) asynccontextmanager to build/enter/yield exporter and shield teardown with CancelScope; serve() delegates to the new context manager. Add api_version field and model_config to ExporterConfigListV1Alpha1.
Session cleanup hardening
packages/jumpstarter/jumpstarter/exporter/session.py
Wrap root_device.close() in try/except/finally, log driver-specific errors (exc_info=True) and always remove the logging handler in finally to guarantee cleanup.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as Parent (run.py)
  participant Child as Forked Child
  participant Exporter as Exporter.serve()

  User->>CLI: run(config)
  CLI->>Child: fork()
  rect rgb(235,245,255)
    note right of Child: Child process
    Child->>Exporter: anyio TG + signal receiver → run exporter.serve()
    alt SIGINT/SIGTERM/SIGHUP/SIGQUIT
      Child->>Exporter: cancel TG / call stop()
      Exporter-->>Child: graceful exit (0 or restart)
    else normal completion or error
      Child-->>CLI: exit code (non-zero => propagate)
    end
  end
  rect rgb(245,235,245)
    note right of CLI: Parent supervision loop
    CLI-->>Child: forward signals
    CLI->>CLI: waitpid()
    alt child exit == 0 (restart)
      CLI->>Child: fork again
    else non-zero / fatal signal
      CLI-->>User: propagate exit status
    end
  end
Loading
sequenceDiagram
  autonumber
  participant Cfg as ExporterConfigV1Alpha1
  participant Exp as Exporter
  participant Ctrl as ControllerService (gRPC)

  Cfg->>Cfg: async with create_exporter()
  Cfg->>Exp: instantiate (await channel_factory)
  Cfg->>Exp: await __aenter__() then await serve()
  alt lease lost / stop requested / cancel
    Exp->>Exp: stop() cancels TG or marks stop_requested
  end
  Cfg->>Ctrl: Unregister(reason="Exporter shutdown") with timeout (shielded)
  Cfg->>Exp: await __aexit__()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • mangelajo

Poem

I hop, I fork, I guard the run,
Signals I pass, until the job's done.
I shield the goodbye, unregister with care,
Logs kept tidy — tidy as my hare.
Restart or rest, this bunny’s aware. 🥕🐇

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

1-1: Remove unused asyncio import

The asyncio import is only used in the child signal handler, which should be replaced with anyio's signal handling as suggested above.

Once the signal handling is updated to use anyio, remove the unused import:

-import asyncio
 import os
 import signal
 import sys
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e77e93 and 87dd447.

📒 Files selected for processing (1)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#521
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:1-2
Timestamp: 2025-06-12T17:18:48.519Z
Learning: Exporters in this project are intended to run only on Unix systems, so Unix-specific APIs like `os.fork()` are acceptable and Windows compatibility is not required.
🧬 Code Graph Analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
  • handle_exceptions (32-46)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
  • status (115-134)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.11)

Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py Outdated
Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py Outdated
Comment on lines 51 to 105
def _serve_with_exc_handling(exporter):
"""Fork-based approach with improved signal handling."""
pid = None

def parent_signal_handler(signum, _):
"""Handle signals: forward to child, restart child on SIGHUP, exit on others."""
if pid and pid > 0:
try:
os.kill(pid, signum)
if signum != signal.SIGHUP:
_, status = os.waitpid(pid, 0)
sys.exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1)
except ProcessLookupError:
pass
if signum != signal.SIGHUP:
sys.exit(0)

# Set up signal handlers
for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
signal.signal(sig, parent_signal_handler)

while True:
result = Value("i", 0)
pid = os.fork()

if pid > 0:
os.waitpid(pid, 0)
if result.value != 0:
return result.value
# Parent process
if (exit_code := _handle_parent(pid)) is not None:
return exit_code
else:
try:
anyio.run(exporter.serve)
except* Exception as excgroup:
for exc in leaf_exceptions(excgroup):
click.echo(
f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
err=True,
)
result.value = 1
return
_handle_child(exporter)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider handling SIGCHLD to avoid zombie processes

If the child process exits unexpectedly (e.g., crashes), it will become a zombie until the parent's next signal or explicit wait. Consider adding a SIGCHLD handler to reap children promptly.

Add SIGCHLD handling to promptly reap child processes:

 def _serve_with_exc_handling(exporter):
     """Fork-based approach with improved signal handling."""
     pid = None
+    child_exited = False
+
+    def sigchld_handler(signum, _):
+        """Handle SIGCHLD to reap zombie processes."""
+        nonlocal child_exited
+        try:
+            os.waitpid(-1, os.WNOHANG)
+            child_exited = True
+        except ChildProcessError:
+            pass

     def parent_signal_handler(signum, _):
         """Handle signals: forward to child, restart child on SIGHUP, exit on others."""
         if pid and pid > 0:
             try:
                 os.kill(pid, signum)
             except ProcessLookupError:
                 pass
         if signum != signal.SIGHUP:
             pass

     # Set up signal handlers
     for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
         signal.signal(sig, parent_signal_handler)
+    signal.signal(signal.SIGCHLD, sigchld_handler)

     while True:
+        child_exited = False
         pid = os.fork()

         if pid > 0:
             # Parent process
-            if (exit_code := _handle_parent(pid)) is not None:
-                return exit_code
+            while not child_exited:
+                signal.pause()  # Wait for signals
+            _, status = os.waitpid(pid, os.WNOHANG)
+            if os.WIFEXITED(status):
+                if (exit_code := os.WEXITSTATUS(status)) != 0:
+                    return exit_code
+            elif os.WIFSIGNALED(status):
+                sig = os.WTERMSIG(status)
+                if sig != signal.SIGHUP:
+                    return 128 + sig
+            # Continue loop for SIGHUP or normal exit
         else:
             _handle_child(exporter)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _serve_with_exc_handling(exporter):
"""Fork-based approach with improved signal handling."""
pid = None
def parent_signal_handler(signum, _):
"""Handle signals: forward to child, restart child on SIGHUP, exit on others."""
if pid and pid > 0:
try:
os.kill(pid, signum)
if signum != signal.SIGHUP:
_, status = os.waitpid(pid, 0)
sys.exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1)
except ProcessLookupError:
pass
if signum != signal.SIGHUP:
sys.exit(0)
# Set up signal handlers
for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
signal.signal(sig, parent_signal_handler)
while True:
result = Value("i", 0)
pid = os.fork()
if pid > 0:
os.waitpid(pid, 0)
if result.value != 0:
return result.value
# Parent process
if (exit_code := _handle_parent(pid)) is not None:
return exit_code
else:
try:
anyio.run(exporter.serve)
except* Exception as excgroup:
for exc in leaf_exceptions(excgroup):
click.echo(
f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
err=True,
)
result.value = 1
return
_handle_child(exporter)
def _serve_with_exc_handling(exporter):
"""Fork-based approach with improved signal handling."""
pid = None
child_exited = False
def sigchld_handler(signum, _):
"""Handle SIGCHLD to reap zombie processes."""
nonlocal child_exited
try:
os.waitpid(-1, os.WNOHANG)
child_exited = True
except ChildProcessError:
pass
def parent_signal_handler(signum, _):
"""Handle signals: forward to child, restart child on SIGHUP, exit on others."""
if pid and pid > 0:
try:
os.kill(pid, signum)
if signum != signal.SIGHUP:
_, status = os.waitpid(pid, 0)
sys.exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1)
except ProcessLookupError:
pass
if signum != signal.SIGHUP:
sys.exit(0)
# Set up signal handlers
for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
signal.signal(sig, parent_signal_handler)
signal.signal(signal.SIGCHLD, sigchld_handler)
while True:
child_exited = False
pid = os.fork()
if pid > 0:
# Parent process
while not child_exited:
signal.pause() # Wait for signals
_, status = os.waitpid(pid, os.WNOHANG)
if os.WIFEXITED(status):
if (exit_code := os.WEXITSTATUS(status)) != 0:
return exit_code
elif os.WIFSIGNALED(status):
sig = os.WTERMSIG(status)
if sig != signal.SIGHUP:
return 128 + sig
# Continue loop for SIGHUP or normal exit
else:
_handle_child(exporter)
🤖 Prompt for AI Agents
In packages/jumpstarter-cli/jumpstarter_cli/run.py around lines 51 to 81, the
parent lacks a SIGCHLD handler so exited child processes can become zombies; add
a SIGCHLD handler that non-blockingly reaps children (loop calling
os.waitpid(-1, os.WNOHANG) until it returns (0,0) or raises ChildProcessError),
swallow ProcessLookupError, and install that handler alongside the existing
signal handlers (ensure it returns quickly and does not block the main loop or
conflict with parent_signal_handler).

Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py Outdated
Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py Outdated
@michalskrivanek
Copy link
Copy Markdown
Contributor Author

there are several problems with current signal handling. it's not apparent when running interactively and just Ctrl+C it, but when running as a service or from within a container it doesn't do cleanup and it's generally not handled correctly.

for forked process the usual way is to send SIGTERM to the process group, both parent and child get the signal and terminate immediately (well, depending on timing parent may attempt to rerun child).
It's even worse when sending signals just to parent - then child becomes orphaned and never terminates

in order to gracefully shut down the parent needs to handle wait on child exit (and has to handle its signals) and propagate them to child in case only the parent gets them. The child needs to react to signals as well, cancelling the async tasks that in turn triggers exporter's aexit unregistering from controller first. And with that the child can also communicate back its exit code cleanly, without a need for shared "result" var.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)

68-77: Ensure child never continues executing parent code; reset handlers in child

Two points:

  • Add a defensive exit after _handle_child(exporter) so the child cannot “fall through” in pathological cases.
  • The child inherits the parent’s Python signal handlers; reset them to defaults to avoid interference with AnyIO’s signal receiver in the child.

Apply this diff:

-        else:
-            _handle_child(exporter)
+        else:
+            # Child process: reset parent's handlers to defaults
+            for _sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
+                try:
+                    signal.signal(_sig, signal.SIG_DFL)
+                except Exception:
+                    pass
+            _handle_child(exporter)
+            os._exit(1)  # Should be unreachable; safeguard to prevent fallthrough

40-49: Handle EINTR in waitpid and interpret SIGHUP-terminated child as restart

Signals delivered to the parent may interrupt waitpid() with EINTR, causing an exception and aborting the loop. Also, if the child is terminated by SIGHUP before it can handle it, treat that as a restart instead of exiting.

Apply this diff:

-def _handle_parent(pid):
+def _handle_parent(pid):
     """Handle parent process waiting for child."""
-    _, status = os.waitpid(pid, 0)
+    # waitpid() can be interrupted by signals; retry on EINTR
+    while True:
+        try:
+            _, status = os.waitpid(pid, 0)
+            break
+        except InterruptedError:
+            continue
     if os.WIFEXITED(status):
         exit_code = os.WEXITSTATUS(status)
         if exit_code != 0:
             return exit_code
         # Child exited normally (code 0) - continue loop for SIGHUP restart
-    else:
-        return 128 + (os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0)
+    elif os.WIFSIGNALED(status):
+        sig = os.WTERMSIG(status)
+        # If the child died directly from SIGHUP (missed in-child handling), restart
+        if sig == signal.SIGHUP:
+            return None
+        return 128 + sig
+    else:
+        # Stopped/continued are unexpected here with blocking wait
+        return 1
🧹 Nitpick comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)

56-66: Option: forward to the child’s process group to catch grandchildren

If the exporter spawns subprocesses, signaling only the child PID may leave grandchildren running. Consider making the child a new session leader and forwarding signals to its process group.

Apply this diff to forward to the process group with a safe fallback:

 def parent_signal_handler(signum, _):
     """Forward signals to child process."""
     if pid and pid > 0:
         try:
-            os.kill(pid, signum)
+            # Prefer signaling the child's process group, fallback to PID
+            try:
+                os.killpg(pid, signum)
+            except Exception:
+                os.kill(pid, signum)
         except ProcessLookupError:
             pass
@@
-    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
+    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
         signal.signal(sig, parent_signal_handler)

And, in the child branch (only once, right after fork), make the child a session leader so it owns its process group:

-        else:
-            # Child process: reset parent's handlers to defaults
+        else:
+            # Child process: reset parent's handlers to defaults and become a new session leader
             for _sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
                 try:
                     signal.signal(_sig, signal.SIG_DFL)
                 except Exception:
                     pass
+            try:
+                os.setsid()
+            except Exception:
+                pass
             _handle_child(exporter)
             os._exit(1)  # Should be unreachable; safeguard to prevent fallthrough

Note: This is POSIX-specific; the fallback to os.kill(pid, signum) keeps behavior correct on non-POSIX platforms.


12-37: Child-side signal handling and graceful shutdown: overall solid

  • Uses AnyIO’s open_signal_receiver with a proper synchronous context manager — correct.
  • Cancels the task group on first signal; maps exit codes appropriately, including special-casing SIGHUP to 0 for restarts.

Minor nit: consider logging which signal was received to aid diagnostics.

Apply this minimal addition:

         except get_cancelled_exc_class():
-            if received_signal == signal.SIGHUP:
+            if received_signal == signal.SIGHUP:
+                click.echo("SIGHUP received, restarting child", err=True)
                 return 0
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87dd447 and de08520.

📒 Files selected for processing (1)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
  • handle_exceptions (32-46)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
🔇 Additional comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

17-21: No change required — repository requires Python >= 3.11, so await anext(signals) is supported

pyproject metadata and CI show the project targets Python 3.11+ (pyproject requires-python = ">=3.11" in multiple packages and .github/workflows runs 3.11–3.13), while anext() was introduced in Python 3.10. The portability concern in the original comment is therefore not applicable; switching to an async for loop is optional (readability), not required.

Files to note:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py — current use: received_signal = await anext(signals)
  • packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py — also uses open_signal_receiver

Optional readability diff (not required):

-        async def signal_handler(scope):
-            nonlocal received_signal
-            with open_signal_receiver(signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals:
-                received_signal = await anext(signals)
-                scope.cancel()
+        async def signal_handler(scope):
+            nonlocal received_signal
+            with open_signal_receiver(signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals:
+                async for signum in signals:
+                    received_signal = signum
+                    scope.cancel()
+                    break

Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py
@michalskrivanek michalskrivanek force-pushed the signals branch 2 times, most recently from bae1231 to 2755970 Compare August 14, 2025 11:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)

12-41: LGTM! Clean signal handling implementation using anyio.

The child process signal handling using anyio's open_signal_receiver is well-implemented and follows the correct pattern (synchronous context manager with async iteration). The signal forwarding to cancel the task group and proper exit code handling align with Unix conventions.


15-15: Consider using None to distinguish "no signal" from signal 0.

Using 0 as the default for received_signal could be ambiguous since it's also a valid signal number (though rarely used). When cancellation occurs without a signal, the current code returns 128 + 0 = 128, which might be misleading.

Consider using None to represent "no signal received":

-        received_signal = 0
+        received_signal = None

         async def signal_handler(cancel_scope):
             nonlocal received_signal
             with open_signal_receiver(signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT) as signals:
                 received_signal = await anext(signals)
                 # Cancel the entire task group including exporter's tasks
                 cancel_scope.cancel()

         try:
             async with create_task_group() as tg:
                 tg.start_soon(signal_handler, tg.cancel_scope)
                 # Pass our task group to exporter.serve() so it adds its tasks directly to our group
                 await exporter.serve(task_group=tg)
             # If signal was received but task group completed normally, use signal-based exit code
-            return 128 + received_signal if received_signal else 128
+            return 128 + received_signal if received_signal is not None else 0
         except get_cancelled_exc_class():
-            return 0 if received_signal == signal.SIGHUP else 128 + received_signal  # restart child on SIGHUP
+            if received_signal == signal.SIGHUP:
+                return 0  # restart child on SIGHUP
+            elif received_signal is not None:
+                return 128 + received_signal
+            else:
+                return 0  # graceful shutdown without signal

Also applies to: 30-32


79-80: Add defensive exit after _handle_child.

While _handle_child calls sys.exit(), adding a defensive exit ensures the child never continues if something goes wrong.

         else:
             _handle_child(exporter)
+            os._exit(1)  # Defensive: should never reach here

57-67: Consider handling SIGCHLD for prompt zombie reaping.

If the child exits unexpectedly (e.g., crashes) between signals, it becomes a zombie until the parent receives another signal or explicitly waits. While the current design eventually reaps the child, prompt reaping would be cleaner.

This is a minor edge case that doesn't affect correctness but could improve resource management in long-running scenarios. Consider adding SIGCHLD handling if you observe zombie processes during testing.

🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

119-164: Consider documenting the task group ownership contract.

The change from internal task group management to external task group orchestration is well-implemented. However, the contract between the exporter and the caller regarding task group lifecycle and cancellation responsibilities should be documented.

Consider adding a docstring to clarify the contract:

-    async def serve(self, task_group):  # noqa: C901
+    async def serve(self, task_group):  # noqa: C901
+        """
+        Serve the exporter using the provided task group.
+        
+        Args:
+            task_group: An external anyio TaskGroup to which exporter tasks will be added.
+                       The caller is responsible for the task group's lifecycle and cancellation.
+        
+        Note:
+            When lease changes occur, this method may break out of its loop. The caller
+            should handle appropriate cleanup and restart logic.
+        """
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

43-67: Signal handler setup timing could miss early signals.

The signal handlers are set up after the fork (line 54-55), creating a small window where signals sent immediately after fork but before handler installation might be lost. While rare, this could affect automated testing or rapid signal scenarios.

Consider setting up handlers before fork and resetting them in the child:

 def _handle_parent(pid):
     """Handle parent process waiting for child and signal forwarding."""
+    # ... handler definition remains the same ...
+    _, status = os.waitpid(pid, 0)
+    # ... rest remains the same ...

 def _serve_with_exc_handling(exporter):
     """Fork-based approach with improved signal handling."""
+    def parent_signal_handler(signum, _):
+        """Placeholder handler before fork."""
+        pass
+    
+    # Set up handlers before fork to avoid race
+    original_handlers = {}
+    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
+        original_handlers[sig] = signal.signal(sig, parent_signal_handler)
+    
     while True:
         pid = os.fork()

         if pid > 0:
             # Parent process
+            # Update handler to forward to actual child PID
+            def forward_to_child(signum, _):
+                try:
+                    os.kill(pid, signum)
+                except ProcessLookupError:
+                    pass
+            
+            for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
+                signal.signal(sig, forward_to_child)
+            
             if (exit_code := _handle_parent(pid)) is not None:
+                # Restore original handlers before exit
+                for sig, handler in original_handlers.items():
+                    signal.signal(sig, handler)
                 return exit_code
         else:
+            # Child: restore default handlers
+            for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
+                signal.signal(sig, signal.SIG_DFL)
             _handle_child(exporter)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bae1231 and 2755970.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (4)
packages/jumpstarter/jumpstarter/exporter/session.py (1)
packages/jumpstarter-driver-composite/jumpstarter_driver_composite/driver.py (1)
  • report (37-38)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • channel_factory (165-173)
  • serve (159-189)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • serve (119-163)
  • Exporter (24-163)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
  • handle_exceptions (32-46)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • serve (119-163)
  • status (126-145)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • serve (159-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: e2e
  • GitHub Check: Pages changed - jumpstarter-docs
🔇 Additional comments (3)
packages/jumpstarter/jumpstarter/exporter/session.py (1)

46-56: LGTM! Robust error handling during cleanup.

The enhanced error handling in __exit__ ensures that driver close failures are properly logged with descriptive information, and the logging handler is always removed via the finally block. This prevents resource leaks and improves debugging.

packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

32-52: LGTM! Comprehensive cleanup with timeout protection.

The enhanced __aexit__ implementation with a 10-second timeout, nested error handling, and non-masking behavior ensures robust cleanup even when the controller is unresponsive. The descriptive unregister reason improves debugging.

packages/jumpstarter/jumpstarter/config/exporter.py (1)

159-190: LGTM! Well-structured lifecycle management with shielded cleanup.

The explicit lifecycle management with __aenter__ and __aexit__, combined with the shielded cleanup using CancelScope(shield=True), ensures proper resource cleanup even during abrupt cancellation. The try/finally pattern guarantees cleanup execution.

Comment thread packages/jumpstarter/jumpstarter/exporter/exporter.py Outdated
@michalskrivanek michalskrivanek force-pushed the signals branch 2 times, most recently from ace8994 to b31e578 Compare August 15, 2025 16:38
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

90-93: Propagate exit code to the shell (Click ignores return values)

Returning an int from a Click command doesn’t set process exit status. Use sys.exit(...) to ensure the CLI exits with the intended code.

 def run(config):
     """Run an exporter locally."""
-    return _serve_with_exc_handling(config)
+    sys.exit(_serve_with_exc_handling(config))
♻️ Duplicate comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

59-72: Treat SIGHUP-terminated child (WIFSIGNALED) as restart to avoid race at early startup

If the child receives SIGHUP before installing its handler, it may terminate due to the default action, and the parent currently exits instead of restarting. Handle WIFSIGNALED(status) == SIGHUP as a restart as well.

-    else:
-        # Child killed by unhandled signal - terminate
-        child_exit_code = 128 + (os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0)
-        click.echo(f"Child killed by unhandled signal: {child_exit_code}", err=True)
-        return child_exit_code
+    else:
+        # Child terminated by a signal
+        if os.WIFSIGNALED(status):
+            sig = os.WTERMSIG(status)
+            if sig == signal.SIGHUP:
+                # Treat SIGHUP as a restart trigger even if the child didn't trap it
+                return None
+            child_exit_code = 128 + sig
+            click.echo(f"Child killed by unhandled signal: {child_exit_code}", err=True)
+            return child_exit_code
+        # Unexpected termination state
+        click.echo("Child terminated unexpectedly", err=True)
+        return 1
🧹 Nitpick comments (4)
packages/jumpstarter/jumpstarter/exporter/session.py (1)

47-55: Harden error logging in exit to avoid cascading failures

If root_device.report() raises inside the except block, the original close() error would be masked. Wrap the report retrieval in its own try/except and fall back to the class name.

         try:
             self.root_device.close()
         except Exception as e:
-            # Get driver name from report for more descriptive logging
-            report = self.root_device.report()
-            driver_name = report.labels.get('jumpstarter.dev/name', self.root_device.__class__.__name__)
-            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
+            # Best-effort: fetch a descriptive driver name for logging
+            try:
+                report = self.root_device.report()
+                driver_name = report.labels.get("jumpstarter.dev/name", self.root_device.__class__.__name__)
+            except Exception:
+                driver_name = self.root_device.__class__.__name__
+            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
         finally:
             logging.getLogger().removeHandler(self._logging_handler)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

159-189: Call aexit only if aenter completed successfully

If await exporter.__aenter__() raises, calling __aexit__ may perform invalid cleanup. Track an entered flag to ensure correctness. The shielding looks good.

-        exporter = None
+        exporter = None
+        entered = False
         try:
             exporter = Exporter(
                 channel_factory=channel_factory,
                 device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
                 tls=self.tls,
                 grpc_options=self.grpcOptions,
             )
-            await exporter.__aenter__()
+            await exporter.__aenter__()
+            entered = True
             await exporter.serve(task_group)
         finally:
             # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
+            if exporter and entered:
                 with CancelScope(shield=True):
                     await exporter.__aexit__(None, None, None)

Would you like me to add defensive logging around __aexit__ as well to avoid masking an original failure during cleanup?

packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

159-176: Clarify log message to match the new fork-based lifecycle

Given the process will exit and the supervisor restarts it, the current message “killing existing connections” is misleading. Suggest updating the log to reflect the restart flow.

-                logger.info("Lease status changed, killing existing connections")
+                logger.info("Lease status changed; exiting to allow supervisor to restart")
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

82-84: Optional: place child in its own process group and signal the group

If the child ever spawns descendants, sending signals to the process group ensures all are terminated/restarted consistently.

-        else:
-            _handle_child(exporter)
+        else:
+            # Optional: make the child its own process group leader so we can signal the whole group
+            try:
+                os.setpgid(0, 0)
+            except Exception:
+                pass
+            _handle_child(exporter)

And in parent_signal_handler, use os.killpg(pid, signum) instead of os.kill(pid, signum) if you adopt process groups.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ace8994 and b31e578.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#521
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:1-2
Timestamp: 2025-06-12T17:18:48.519Z
Learning: Exporters in this project are intended to run only on Unix systems, so Unix-specific APIs like `os.fork()` are acceptable and Windows compatibility is not required.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter/jumpstarter/exporter/exporter.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (4)
packages/jumpstarter/jumpstarter/exporter/session.py (1)
packages/jumpstarter-driver-composite/jumpstarter_driver_composite/driver.py (1)
  • report (37-38)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • channel_factory (165-173)
  • serve (159-189)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • serve (120-175)
  • Exporter (24-175)
packages/jumpstarter-cli/jumpstarter_cli/run.py (3)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
  • handle_exceptions (32-46)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
🔇 Additional comments (4)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)

7-7: Import update looks correct

Using connect_unix, create_memory_object_stream, and sleep from anyio aligns with the new external TaskGroup orchestration.


33-52: Robust unregistration on shutdown — good improvement

The 10s timeout and guarded exception logging in __aexit__ make cleanup safer without masking the original cause. LGTM.


120-131: Docstring clearly states the external TaskGroup contract

This clarifies ownership and is consistent with the fork-based supervisor. Good call.

packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

20-31: Child: correct AnyIO signal handling and graceful cancellation

Using open_signal_receiver with a synchronous context manager and canceling the task group on first signal is correct. Good.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/jumpstarter/jumpstarter/config/exporter.py (3)

175-189: Optional: use AsyncExitStack for idiomatic async context handling with shielded cleanup

AsyncExitStack lets you control exactly when aexit runs (inside a shield) without manually invoking dunder methods.

Alternative refactor:

-        exporter = None
-        try:
-            exporter = Exporter(
-                channel_factory=channel_factory,
-                device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
-                tls=self.tls,
-                grpc_options=self.grpcOptions,
-            )
-            await exporter.__aenter__()
-            await exporter.serve(task_group)
-        finally:
-            # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
-                with CancelScope(shield=True):
-                    await exporter.__aexit__(None, None, None)
+        from contextlib import AsyncExitStack
+        stack = AsyncExitStack()
+        try:
+            exporter = Exporter(
+                channel_factory=channel_factory,
+                device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
+                tls=self.tls,
+                grpc_options=self.grpcOptions,
+            )
+            await stack.enter_async_context(exporter)
+            await exporter.serve(task_group)
+        finally:
+            # Shield all cleanup operations from abrupt cancellation for clean shutdown
+            with CancelScope(shield=True):
+                await stack.aclose()

159-163: Type-hint task_group for clarity

Annotating task_group improves readability and IDE assistance. AnyIO exposes the protocol in anyio.abc.TaskGroup.

Apply this diff:

-    async def serve(self, task_group):
+    async def serve(self, task_group: "TaskGroup"):
         # dynamic import to avoid circular imports
-        from anyio import CancelScope
+        from anyio import CancelScope
+        from anyio.abc import TaskGroup

166-168: Improve diagnostics when endpoint/token are missing

Include alias/path to quickly identify which exporter config failed.

-            if self.endpoint is None or self.token is None:
-                raise ConfigurationError("endpoint or token not set in exporter config")
+            if self.endpoint is None or self.token is None:
+                raise ConfigurationError(
+                    f"endpoint or token not set in exporter config {self.alias} (path={self.path})"
+                )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b31e578 and 115afe4.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/jumpstarter/jumpstarter/exporter/session.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
  • packages/jumpstarter/jumpstarter/exporter/exporter.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#521
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:1-2
Timestamp: 2025-06-12T17:18:48.519Z
Learning: Exporters in this project are intended to run only on Unix systems, so Unix-specific APIs like `os.fork()` are acceptable and Windows compatibility is not required.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: e2e
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/config/exporter.py (2)

159-190: Good: cancellation-shielded cleanup and external TaskGroup integration

Passing an external task_group and shielding exporter cleanup from cancellation matches the PR’s signal-handling goals and the fork-based supervision model. This should help ensure child-side cleanup runs reliably before exit and restart.


159-190: Exporter.serve signature and call sites are aligned — no change required

Verified that Exporter defines async def serve(self, task_group) and existing call sites pass task_group (positional or named):

  • packages/jumpstarter/jumpstarter/exporter/exporter.py — async def serve(self, task_group) (≈line 120)
  • packages/jumpstarter/jumpstarter/config/exporter.py — await exporter.serve(task_group) (≈line 184)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py — await exporter.serve(task_group=tg) (line 35)

No action required.

Comment thread packages/jumpstarter/jumpstarter/config/exporter.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

175-189: Call aexit only if aenter completed to avoid unsafe cleanup

If Exporter.__aenter__ raises, invoking __aexit__ is not guaranteed to be safe and can mask the original error. Track whether the context was successfully entered before calling __aexit__.

Apply this diff:

-        exporter = None
+        exporter = None
+        entered = False
         try:
             exporter = Exporter(
                 channel_factory=channel_factory,
                 device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
                 tls=self.tls,
                 grpc_options=self.grpcOptions,
             )
             await exporter.__aenter__()
+            entered = True
             await exporter.serve(task_group)
         finally:
             # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
+            if exporter and entered:
                 with CancelScope(shield=True):
                     await exporter.__aexit__(None, None, None)
🧹 Nitpick comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

51-79: Consider handling SIGCHLD to promptly reap zombie processes

If the child process exits unexpectedly (e.g., crashes) without the parent receiving a signal, the child becomes a zombie until the parent receives its next signal. While the current implementation works correctly for signal-driven termination, adding a SIGCHLD handler would improve robustness by promptly reaping children.

Add SIGCHLD handling to immediately detect child termination:

 def _handle_parent(pid):
     """Handle parent process waiting for child and signal forwarding."""
+    child_exited = False
+
+    def sigchld_handler(signum, _):
+        """Handle SIGCHLD to detect child exit."""
+        nonlocal child_exited
+        child_exited = True
+
     def parent_signal_handler(signum, _):
         logger.info("PARENT: Received %d (%s), forwarding to child PID %d", signum, signal.Signals(signum).name, pid)
         if pid and pid > 0:
             try:
                 os.kill(pid, signum)
             except ProcessLookupError:
                 pass

     # Set up signal handlers after fork
     for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGQUIT):
         signal.signal(sig, parent_signal_handler)
+    signal.signal(signal.SIGCHLD, sigchld_handler)

+    # Use signal.pause() to wait for signals instead of blocking in waitpid
+    while not child_exited:
+        signal.pause()
+
     _, status = os.waitpid(pid, 0)

Alternatively, the current implementation is sufficient if prompt zombie reaping isn't critical for your use case.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 115afe4 and 7aaacc0.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/jumpstarter/jumpstarter/exporter/session.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
  • packages/jumpstarter/jumpstarter/exporter/exporter.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (1)
  • handle_exceptions (32-46)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • serve (159-189)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • serve (120-175)
  • status (138-157)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • channel_factory (165-173)
  • serve (159-189)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)

33-53: Cleanup logic looks good with proper timeout and error handling

The new cleanup implementation is robust:

  • 10-second timeout ensures cleanup doesn't hang indefinitely
  • Inner exception handling prevents unregistration errors from propagating
  • Outer exception handling ensures cleanup issues don't mask original errors
  • Clear logging for both success and failure cases

This aligns well with the fork-based architecture where child termination triggers proper cleanup.


120-131: serve() signature change properly delegates lifecycle management

The new signature accepting an external task_group correctly shifts responsibility for task lifecycle management to the caller. The docstring clearly documents this contract including:

  • External responsibility for the task group lifecycle
  • Expected behavior on lease changes
  • Caller's responsibility for cleanup/restart logic

This design works well with the fork-based supervisor in the CLI that handles restart on exit.


159-172: External task group integration maintains proper separation of concerns

The migration from internal task group to external task group is well executed:

  • Status task is correctly added to the external group via start_soon
  • Lease change breaks out of the loop instead of cancelling internally
  • The comment clearly indicates the caller handles cancellation

Given the learnings about the fork-based architecture where the child process is restarted on lease changes, this approach eliminates any concerns about orphaned tasks.

packages/jumpstarter-cli/jumpstarter_cli/run.py (3)

15-49: Child process signal handling implementation is well designed

The child process handler correctly:

  • Uses anyio's signal handling with proper synchronous context manager (with) and async iteration
  • Tracks received signals and prevents duplicate processing
  • Logs signals with descriptive names for debugging
  • Returns the signal number for parent to interpret
  • Properly exits the child process with sys.exit(anyio.run(...))

The use of 0 to represent "no signal received" is correct as noted in the learnings.


51-79: Parent process correctly implements signal forwarding and child restart logic

The parent handler properly:

  • Forwards signals to the child with error handling for ProcessLookupError
  • Interprets exit code 0 or SIGHUP as restart triggers
  • Returns appropriate Unix exit codes (128 + signal number)
  • Handles both normal exits and signal terminations

The implementation correctly addresses the PR objectives by ensuring the parent waits for the child and propagates exit codes while treating SIGHUP specially for restarts.


81-91: Fork-based supervisor with restart loop is well implemented

The supervisor correctly:

  • Implements a restart loop that continues when _handle_parent returns None
  • Exits with the child's exit code when non-None is returned
  • Includes defensive sys.exit(1) after _handle_child to ensure the child never continues

This addresses the core PR objectives of proper signal handling and process lifecycle management.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

32-51: Bounded unregister and non-masking cleanup are on point

  • move_on_after(10) to bound the unregister is a solid defensive choice.
  • Logging and not re-raising from __aexit__ avoids masking original errors during teardown.

Minor optional refinements:

  • Use logger.exception(...) for automatic stack traces.
  • Close the channel after unregister if channel_factory() creates a new channel per call.

Apply this small refinement for the logging and optional channel close:

-        try:
-            if self.registered:
-                logger.info("Unregistering exporter with controller")
-                try:
-                    with anyio.move_on_after(10):  # 10 second timeout
-                        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
-                        await controller.Unregister(
+        try:
+            if self.registered:
+                logger.info("Unregistering exporter with controller")
+                try:
+                    with anyio.move_on_after(10):  # 10 second timeout
+                        channel = await self.channel_factory()
+                        controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
+                        await controller.Unregister(
                             jumpstarter_pb2.UnregisterRequest(
                                 reason="Exporter shutdown",
                             )
                         )
                         logger.info("Controller unregistration completed successfully")
+                finally:
+                    try:
+                        # Close channel if independently created
+                        await channel.close()  # no-op if channel is reused or already closed
+                    except Exception:
+                        logger.debug("Controller channel close failed or unnecessary", exc_info=True)
                 except Exception as e:
-                    logger.error("Error during controller unregistration: %s", e, exc_info=True)
+                    logger.exception("Error during controller unregistration: %s", e)
 
         except Exception as e:
-            logger.error("Error during exporter cleanup: %s", e, exc_info=True)
+            logger.exception("Error during exporter cleanup: %s", e)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)

76-93: Parent forwarding and exit-code mapping look correct

  • Signal forwarding is simple and non-blocking.
  • waitpid() handling and mapping: 0/SIGHUP → restart, signals → 128+N; unexpected terminations are handled.

Optional:

  • Prefer logger.error over click.echo for consistency.

Apply this tiny logging consistency tweak:

-        click.echo(f"Child killed by unhandled signal: {child_exit_signal}", err=True)
+        logger.error("Child killed by unhandled signal: %s", child_exit_signal)

100-103: Optional: consider process-group delivery for broader shutdowns

If the child ever spawns subprocesses, forwarding to the child’s process group (after setsid() in the child) ensures all descendants receive the signal. Not necessary now, but worth noting.

Possible approach (conceptual, not a direct diff here):

  • In the child (immediately after forking), call os.setsid() to create a new session/process group.
  • In the parent, forward using os.killpg(child_pgid, signum).

This is only needed if the child will spawn subprocesses that must also be signaled.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7aaacc0 and e6b0b78.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/jumpstarter/jumpstarter/exporter/session.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter/jumpstarter/exporter/exporter.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (2)
  • handle_exceptions (32-46)
  • leaf_exceptions (78-102)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • serve (119-164)
  • status (129-148)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • serve (159-188)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • Exporter (24-164)
  • serve (119-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: e2e
🔇 Additional comments (7)
packages/jumpstarter/jumpstarter/config/exporter.py (3)

161-161: Good call: import CancelScope locally to avoid circular imports

Importing CancelScope within the method keeps dependencies localized and prevents circular-import issues.


175-188: Shielded cleanup on shutdown looks correct and robust

The try/finally with CancelScope(shield=True) ensures __aexit__ runs even under cancellation, aligning with the PR’s graceful shutdown goals.


175-188: Exporter’s __aenter__ Is the Default No-Op—Safe to Call __aexit__ Directly

I’ve confirmed that Exporter does not define its own __aenter__ (it inherits the default from contextlib.AbstractAsyncContextManager, which simply returns self with no side effects). Its custom __aexit__ is already guarded by the registered flag, so invoking await exporter.__aexit__(…) without a prior __aenter__ is safe here. No changes needed.

packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

120-123: Docstring addition is fine

The short docstring for serve() is harmless and keeps intent clear.

packages/jumpstarter-cli/jumpstarter_cli/run.py (3)

24-31: Correct AnyIO signal receiver usage

Using with open_signal_receiver(...) plus async for matches AnyIO’s intended pattern and avoids asyncio handler conflicts.


60-64: Return-code semantics are consistent with parent handler

Returning the signal number (or 0) is correctly interpreted by the parent: 0 and SIGHUP → restart; other signals → 128+N.


96-106: Defensive exit in child path is good

The extra sys.exit(1) after _handle_child(exporter) guards against unexpected fall-through.

Comment on lines +38 to +58
async def run_exporter():
nonlocal exporter_tg
try:
async with create_task_group() as tg:
exporter_tg = tg
await exporter.serve()
except* Exception as excgroup:
from jumpstarter_cli_common.exceptions import leaf_exceptions
for exc in leaf_exceptions(excgroup):
click.echo(
f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
err=True,
)

async def signal_handler_wrapper():
await signal_handler(lambda: exporter_tg.cancel_scope.cancel() if exporter_tg else None)

signal_tg.start_soon(signal_handler_wrapper)
await run_exporter()
# Cancel the signal handler after exporter completes
signal_tg.cancel_scope.cancel()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Child cancellation path likely raises and skips exit-code mapping

On signal, cancel_func() cancels the exporter’s task group while await exporter.serve() is awaited. Without catching AnyIO’s cancellation exception, the CancelledError (BaseException) propagates out of serve_with_graceful_shutdown, causing anyio.run(...) to raise and bypass your return logic (exit code mapping).

Catch get_cancelled_exc_class() so the child exits cleanly with the intended status.

Apply this diff:

-from anyio import create_task_group, open_signal_receiver
+from anyio import create_task_group, open_signal_receiver, get_cancelled_exc_class
@@
         async def run_exporter():
             nonlocal exporter_tg
             try:
                 async with create_task_group() as tg:
                     exporter_tg = tg
                     await exporter.serve()
-            except* Exception as excgroup:
+            except get_cancelled_exc_class():
+                # Graceful cancellation via signal; let the caller decide exit code
+                return
+            except* Exception as excgroup:
                 from jumpstarter_cli_common.exceptions import leaf_exceptions
                 for exc in leaf_exceptions(excgroup):
                     click.echo(
                         f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
                         err=True,
                     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def run_exporter():
nonlocal exporter_tg
try:
async with create_task_group() as tg:
exporter_tg = tg
await exporter.serve()
except* Exception as excgroup:
from jumpstarter_cli_common.exceptions import leaf_exceptions
for exc in leaf_exceptions(excgroup):
click.echo(
f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
err=True,
)
async def signal_handler_wrapper():
await signal_handler(lambda: exporter_tg.cancel_scope.cancel() if exporter_tg else None)
signal_tg.start_soon(signal_handler_wrapper)
await run_exporter()
# Cancel the signal handler after exporter completes
signal_tg.cancel_scope.cancel()
++ b/packages/jumpstarter-cli/jumpstarter_cli/run.py
@@
from anyio import create_task_group, open_signal_receiver, get_cancelled_exc_class
@@
async def run_exporter():
nonlocal exporter_tg
try:
async with create_task_group() as tg:
exporter_tg = tg
await exporter.serve()
except get_cancelled_exc_class():
# Graceful cancellation via signal; let the caller decide exit code
return
except* Exception as excgroup:
from jumpstarter_cli_common.exceptions import leaf_exceptions
for exc in leaf_exceptions(excgroup):
click.echo(
f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
err=True,
)
async def signal_handler_wrapper():
await signal_handler(lambda: exporter_tg.cancel_scope.cancel() if exporter_tg else None)
signal_tg.start_soon(signal_handler_wrapper)
await run_exporter()
# Cancel the signal handler after exporter completes
signal_tg.cancel_scope.cancel()

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

175-183: Verify lifecycle symmetry: is aexit safe without a preceding aenter?

The code no longer calls __aenter__(), yet still calls __aexit__(). If __aexit__ assumes state set up by __aenter__, this can raise and either fail cleanup or mask real errors. If serve() is now the sole lifecycle and __aexit__ is explicitly safe standalone, that's fine—please confirm.

As a safe pattern, consider re-introducing an explicit enter flag around __aenter__/__aexit__ symmetry:

Apply this diff (if Exporter.__aenter__ is still required/meaningful):

-        exporter = None
-        try:
-            exporter = Exporter(
+        exporter = None
+        entered = False
+        try:
+            exporter = Exporter(
                 channel_factory=channel_factory,
                 device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
                 tls=self.tls,
                 grpc_options=self.grpcOptions,
             )
-            await exporter.serve()
+            # Ensure enter/exit symmetry if __aenter__ establishes required state
+            if hasattr(exporter, "__aenter__"):
+                await exporter.__aenter__()
+                entered = True
+            await exporter.serve()
         finally:
-            # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
+            # Shield all cleanup operations from abrupt cancellation for clean shutdown
+            if exporter and (not hasattr(exporter, "__aenter__") or entered):
                 with CancelScope(shield=True):
-                    await exporter.__aexit__(None, None, None)
+                    try:
+                        await exporter.__aexit__(None, None, None)
+                    except Exception:
+                        pass

Run this script to check whether Exporter defines __aenter__ and where it’s used:

#!/bin/bash
# Inspect Exporter lifecycle methods and usages
rg -nP -C3 '^\s*class\s+Exporter\b'
rg -nP -C2 'def\s+__aenter__\s*\('
rg -nP -C2 'await\s+exporter\.__aenter__\s*\('
rg -nP -C2 'async\s+with\s+Exporter\('
🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/exporter/session.py (1)

50-55: Avoid secondary exceptions while preparing error context; derive driver name before close

Computing report = self.root_device.report() inside the except block can raise again and mask the original close error. Derive a descriptive name before attempting close(), and coerce labels to a plain dict to avoid protobuf MapContainer quirks.

Apply this diff within this method to compute the name up-front and make label access robust:

 def __exit__(self, exc_type, exc_value, traceback):
-        try:
-            self.root_device.close()
-        except Exception as e:
-            # Get driver name from report for more descriptive logging
-            report = self.root_device.report()
-            driver_name = report.labels.get('jumpstarter.dev/name', self.root_device.__class__.__name__)
-            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
-        finally:
-            logging.getLogger().removeHandler(self._logging_handler)
+        # Best-effort: get a descriptive driver name before closing; never let this raise
+        driver_name = self.root_device.__class__.__name__
+        try:
+            report = self.root_device.report()
+            driver_name = dict(getattr(report, "labels", {})).get("jumpstarter.dev/name", driver_name)
+        except Exception:
+            pass
+
+        try:
+            self.root_device.close()
+        except Exception as e:
+            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
+        finally:
+            logging.getLogger().removeHandler(self._logging_handler)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

185-188: Don’t let cleanup mask the primary failure; swallow/log aexit errors

If exporter.__aexit__() throws, it will override any exception from exporter.serve(). Wrap it in try/except to preserve the original failure signal.

Apply this diff:

-            if exporter:
-                with CancelScope(shield=True):
-                    await exporter.__aexit__(None, None, None)
+            if exporter:
+                with CancelScope(shield=True):
+                    try:
+                        await exporter.__aexit__(None, None, None)
+                    except Exception:
+                        # Don't mask the original exception from serve()
+                        pass
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e6b0b78 and b502a28.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
🧬 Code Graph Analysis (2)
packages/jumpstarter/jumpstarter/exporter/session.py (3)
packages/jumpstarter/jumpstarter/driver/base.py (2)
  • close (85-87)
  • report (192-209)
packages/jumpstarter-driver-composite/jumpstarter_driver_composite/client.py (1)
  • close (13-16)
packages/jumpstarter-driver-composite/jumpstarter_driver_composite/driver.py (1)
  • report (37-38)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • Exporter (24-164)
  • serve (119-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (2)
packages/jumpstarter/jumpstarter/exporter/session.py (1)

47-49: Good: make device close best-effort without breaking teardown

Wrapping root_device.close() in a try/except prevents teardown from cascading failures. The finally ensures the logging handler is always removed.

packages/jumpstarter/jumpstarter/config/exporter.py (1)

161-162: Local CancelScope import is appropriate

Importing CancelScope within serve() avoids unnecessary module-level deps and potential circulars. Looks good.

if self.lease_name != "" and self.lease_name != status.lease_name:
self.lease_name = status.lease_name
logger.info("Lease status changed, killing existing connections")
tg.cancel_scope.cancel()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily wrong, but asking, why do we stop canceling this scope here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably cancelled when you exit the with...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not to trigger "abrupt" cancellation, letting it exit the await (https://github.com/jumpstarter-dev/jumpstarter/pull/589/files#diff-a0c03b6fd1ccd1545c4f26d1e232edb53e3b775a456a96fd78151689fcc8dc5fR43, then in turn line 57) cleanly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bah...but then the other tasks exporter's tg (status, handle) doesn't really finish, it does indeed need to be cancelled.
the voluntary/cancelled exit is then handled in run.py line 44-50 - either logging an error or be silent on get_cancelled_exc_class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've put it back

finally:
# Shield all cleanup operations from abrupt cancellation for clean shutdown
if exporter:
with CancelScope(shield=True):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mangelajo
Copy link
Copy Markdown
Member

ok, it looks good but it’s failing consistently the E2E tests for some reason: https://github.com/jumpstarter-dev/jumpstarter/actions?query=branch%3Asignals+workflow%3A%22%22Run+E2E+Tests%22%22++

https://github.com/jumpstarter-dev/jumpstarter/actions/runs/17001120921/job/48202970902#step:2:903

it fails there: https://github.com/jumpstarter-dev/jumpstarter-e2e/blob/main/tests.bats#L125

Something happens with the exporter being taken for a lease.

I'll try to reproduce locally.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
packages/jumpstarter/jumpstarter/exporter/exporter.py (4)

32-51: Good, guarded cleanup; consider closing the channel and logging on timeout

The unregister path is solid and shielded by a 10s timeout. Two small robustness tweaks:

  • Close the channel you create to avoid leaking connections.
  • Detect/log when the move_on_after timeout actually fired (so ops can tell unregistration timed out).

Apply:

-                    with anyio.move_on_after(10):  # 10 second timeout
-                        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
-                        await controller.Unregister(
+                    with anyio.move_on_after(10) as scope:  # 10 second timeout
+                        channel = await self.channel_factory()
+                        try:
+                            controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
+                            await controller.Unregister(
                             jumpstarter_pb2.UnregisterRequest(
                                 reason="Exporter shutdown",
                             )
-                        )
+                            )
+                        finally:
+                            try:
+                                channel.close()
+                            except Exception:
+                                logger.debug("Ignoring error closing channel during cleanup", exc_info=True)
+                    if scope.cancelled_caught:
+                        logger.warning("Controller unregistration timed out after 10s")

61-83: Register channel lifetime is unmanaged; close it to avoid leaks

session() creates a new gRPC channel for controller that is never closed. Over many (re)registrations this can leak connections.

Wrap the channel and close it:

-        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+        channel = await self.channel_factory()
+        controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
         with Session(
             uuid=self.uuid,
             labels=self.labels,
             root_device=self.device_factory(),
         ) as session:
             async with session.serve_unix_async() as path:
                 async with grpc.aio.secure_channel(
                     f"unix://{path}", grpc.local_channel_credentials(grpc.LocalConnectionType.UDS)
                 ) as channel:
                     response = await jumpstarter_pb2_grpc.ExporterServiceStub(channel).GetReport(empty_pb2.Empty())
                     logger.info("Registering exporter with controller")
                     await controller.Register(
                         jumpstarter_pb2.RegisterRequest(
                             labels=self.labels,
                             reports=response.reports,
                         )
                     )
                     self.registered = True
-                yield path
+                try:
+                    yield path
+                finally:
+                    try:
+                        channel.close()
+                    except Exception:
+                        logger.debug("Ignoring error closing controller channel", exc_info=True)

120-122: Docstring: clarify when serve() returns

Helpful to note the exit conditions to aid operators/debugging.

-        """
-        Serve the exporter.
-        """
+        """
+        Serve the exporter until cancelled or the lease name changes.
+        Exits when:
+        - Status stream indicates the lease name changed (triggers controlled shutdown).
+        - The task group is cancelled (e.g., SIGTERM/SIGINT propagated by the supervisor).
+        """

162-165: Reduce log verbosity for steady-state status

Emitting “Currently leased/not leased” on every status can spam logs in busy systems.

Consider downgrading to debug or rate-limiting (e.g., only log on state transitions).

packages/jumpstarter/jumpstarter/config/exporter.py (2)

169-174: Guard against None grpcOptions

YAML configs can set grpcOptions: null. Passing None into the channel helper can backfire. Defaulting to {} is safer.

-            return aio_secure_channel(self.endpoint, credentials, self.grpcOptions)
+            return aio_secure_channel(self.endpoint, credentials, self.grpcOptions or {})

175-188: Shielded cleanup is good; consider symmetric aenter/aexit and None-safe options

Two tweaks:

  • Call __aenter__() so future changes to Exporter’s enter logic don’t get bypassed.
  • Pass {} for grpc_options to keep None out of the exporter.
-        exporter = None
+        exporter = None
+        entered = False
         try:
             exporter = Exporter(
                 channel_factory=channel_factory,
                 device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
                 tls=self.tls,
-                grpc_options=self.grpcOptions,
+                grpc_options=self.grpcOptions or {},
             )
-            await exporter.serve()
+            await exporter.__aenter__()
+            entered = True
+            await exporter.serve()
         finally:
             # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
+            if exporter and entered:
                 with CancelScope(shield=True):
                     await exporter.__aexit__(None, None, None)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b502a28 and c667913.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (2 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/jumpstarter/jumpstarter/exporter/session.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.602Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter/jumpstarter/exporter/exporter.py
🧬 Code Graph Analysis (2)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • Exporter (24-165)
  • serve (119-165)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • channel_factory (165-173)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: e2e
🔇 Additional comments (3)
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

150-157: Correct cancellation on lease change

Cancelling the task group and breaking ensures all spawned tasks (status/listen/handle and session) wind down; with the fork-based supervisor this cleanly returns control to the parent for a restart. Looks good.

packages/jumpstarter/jumpstarter/config/exporter.py (2)

161-162: Nice: local CancelScope import avoids cycles

Local import keeps the module graph clean and avoids circular import headaches during CLI startup/shutdown.


175-188: No configs set grpcOptions: null
I searched all YAML and JSON configuration files for grpcOptions: null or grpcOptions: ~ and found no occurrences. It’s safe to pass self.grpcOptions directly without additional null checks.

@michalskrivanek
Copy link
Copy Markdown
Contributor Author

it's a great test. i focused to much on signals testing i missed the most obvious obvious one when no signals are involved, just release the lease

@mangelajo
Copy link
Copy Markdown
Member

mangelajo commented Aug 18, 2025

Nice, so that cancel_scope was important, In theory ChatGPT assured that the with would cancel it, but it doesn't?...

The Unit tests failed, but I think it was a transient qemu issue, re-running them

@mangelajo
Copy link
Copy Markdown
Member

Github is not re-running all the unit tests, but it needed a rebase anyway.

@mangelajo mangelajo enabled auto-merge August 18, 2025 14:16
auto-merge was automatically disabled August 20, 2025 14:01

Head branch was pushed to by a user without write access

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

176-191: Guard aexit behind a successful aenter to avoid unsafe cleanup

If await exporter.__aenter__() raises, calling __aexit__ is unsafe and can mask the original error. Track whether the context was successfully entered.

Apply this diff:

         exporter = None
+        entered = False
         try:
             exporter = Exporter(
                 channel_factory=channel_factory,
                 device_factory=ExporterConfigV1Alpha1DriverInstance(children=self.export).instantiate,
                 tls=self.tls,
                 grpc_options=self.grpcOptions,
             )
             # Initialize the exporter (registration, etc.)
-            await exporter.__aenter__()
+            await exporter.__aenter__()
+            entered = True
             yield exporter
         finally:
             # Shield all cleanup operations from abrupt cancellation for clean shutdown
-            if exporter:
+            if exporter and entered:
                 with CancelScope(shield=True):
                     await exporter.__aexit__(None, None, None)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

43-57: Catch AnyIO cancellation to ensure clean exit code mapping

When exporter.stop() cancels its task group, await exporter.serve() will raise AnyIO’s cancellation exception. Without catching it, anyio.run() raises and bypasses your exit-code mapping. Catch the cancellation exception and continue to the return mapping.

Apply this diff:

-from anyio import create_task_group, open_signal_receiver
+from anyio import create_task_group, open_signal_receiver, get_cancelled_exc_class
@@
             async with config.create_exporter() as exporter:
                 try:
                     await exporter.serve()
+                except get_cancelled_exc_class():
+                    # Graceful cancellation (e.g., due to signals) – proceed to map exit/restart
+                    pass
                 except* Exception as excgroup:
                     from jumpstarter_cli_common.exceptions import leaf_exceptions
                     for exc in leaf_exceptions(excgroup):
                         if not isinstance(exc, anyio.get_cancelled_exc_class()):
                             click.echo(
                                 f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
                                 err=True,
                             )
🧹 Nitpick comments (2)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

167-175: Channel factory typing vs. usage mismatch

channel_factory() is awaited by callers (e.g., await self.channel_factory() in exporter code), so its type is effectively Callable[[], Awaitable[grpc.aio.Channel]]. Consider aligning the annotation in the Exporter class to avoid confusion in type checkers and IDEs.

If agreeable, I can provide the corresponding change in Exporter to:

  • import Awaitable
  • annotate channel_factory: Callable[[], Awaitable[grpc.aio.Channel]]
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

32-49: Graceful stop API looks good; minor type annotation improvement

The stop semantics with immediate vs. wait-for-lease-exit are sensible and match the forked-supervisor model. One nit: channel_factory is awaited elsewhere, so its annotation should be Callable[[], Awaitable[grpc.aio.Channel]] to match usage.

I can send a small patch to import Awaitable and adjust the annotation if you confirm.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c667913 and b923cbe.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (1 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/jumpstarter/jumpstarter/exporter/session.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.602Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter/jumpstarter/exporter/exporter.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code Graph Analysis (3)
packages/jumpstarter/jumpstarter/config/exporter.py (3)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • Exporter (25-189)
  • serve (139-189)
packages/jumpstarter/jumpstarter/common/grpc.py (2)
  • ssl_channel_credentials (16-46)
  • aio_secure_channel (49-54)
packages/jumpstarter/jumpstarter/config/grpc.py (1)
  • call_credentials (6-20)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • channel_factory (167-174)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (2)
  • handle_exceptions (32-46)
  • leaf_exceptions (78-102)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
  • stop (36-49)
  • serve (139-189)
  • status (148-167)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • create_exporter (160-191)
  • serve (193-195)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
🔇 Additional comments (3)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

193-196: LGTM: serve() now centralizes lifecycle via create_exporter

Using async with self.create_exporter() keeps lifecycle concerns localized and ensures teardown shielding is applied uniformly.

packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

169-189: Serve loop and shutdown sequencing are consistent with the forked model

Tracking self._tg, using stop() to cancel the task group, and breaking out on lease changes or stop requests ensures the child process exits cleanly so the parent can restart it. This aligns with the PR’s supervision goals.

packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

65-78: LGTM: Signal forwarding is minimal and avoids double-wait races

Forwarding SIGINT/SIGTERM/SIGHUP/SIGQUIT to the child without waitpid() inside the handler prevents reaping the child twice and avoids handler-side blocking.

Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py
Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py
Comment thread packages/jumpstarter/jumpstarter/exporter/exporter.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)

86-107: Controller channel leak in session(): close the channel after Register

session() opens a controller channel but never closes it. Over time (restarts, errors) this can leak FDs/memory. Close it in a finally.

Apply:

-        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+        channel = await self.channel_factory()
+        controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
         with Session(
             uuid=self.uuid,
             labels=self.labels,
             root_device=self.device_factory(),
         ) as session:
             async with session.serve_unix_async() as path:
                 async with grpc.aio.secure_channel(
                     f"unix://{path}", grpc.local_channel_credentials(grpc.LocalConnectionType.UDS)
                 ) as channel:
                     response = await jumpstarter_pb2_grpc.ExporterServiceStub(channel).GetReport(empty_pb2.Empty())
                     logger.info("Registering exporter with controller")
                     await controller.Register(
                         jumpstarter_pb2.RegisterRequest(
                             labels=self.labels,
                             reports=response.reports,
                         )
                     )
                     self.registered = True
                 yield path
+        await channel.close()

If you prefer to be extra safe in the presence of exceptions:

-        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+        channel = await self.channel_factory()
+        try:
+            controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
             ...
-                yield path
+                yield path
+        finally:
+            await channel.close()

111-135: Listen stream channel leak; wrap each attempt with a channel lifecycle

listen() establishes a new channel per attempt but never closes it on error/EOF. Close the channel in a finally to avoid leaks on retries.

-        async def listen(retries=5, backoff=3):
+        async def listen(retries=5, backoff=3):
             retries_left = retries
             while True:
-                try:
-                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+                channel = await self.channel_factory()
+                try:
+                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
                     async for request in controller.Listen(jumpstarter_pb2.ListenRequest(lease_name=lease_name)):
                         await listen_tx.send(request)
                 except Exception as e:
                     if retries_left > 0:
                         retries_left -= 1
                         logger.info(
                             "Listen stream interrupted, restarting in {}s, {} retries left: {}".format(
                                 backoff, retries_left, e
                             )
                         )
                         await sleep(backoff)
                     else:
                         raise
                 else:
                     retries_left = retries
+                finally:
+                    await channel.close()

152-172: Status stream channel leak; same fix as listen()

The status() loop also creates channels per attempt without closing them. Add a finally to close the channel before retrying.

-        async def status(retries=5, backoff=3):
+        async def status(retries=5, backoff=3):
             retries_left = retries
             while True:
-                try:
-                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+                channel = await self.channel_factory()
+                try:
+                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
                     async for status in controller.Status(jumpstarter_pb2.StatusRequest()):
                         await status_tx.send(status)
                 except Exception as e:
                     if retries_left > 0:
                         retries_left -= 1
                         logger.info(
                             "Status stream interrupted, restarting in {}s, {} retries left: {}".format(
                                 backoff, retries_left, e
                             )
                         )
                         await sleep(backoff)
                     else:
                         raise
                 else:
                     retries_left = retries
+                finally:
+                    await channel.close()
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

107-113: CLI does not propagate exit code to the OS

click commands ignore return values by default; returning an int won’t set the process’ exit status. The PR goal is to propagate the child’s exit code back to the parent/CLI. Wrap the result with sys.exit(...) (or raise click.Exit(code)).

Apply this diff:

 def run(config):
     """Run an exporter locally."""
-    return _serve_with_exc_handling(config)
+    sys.exit(_serve_with_exc_handling(config))
♻️ Duplicate comments (1)
packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

79-92: SIGHUP should trigger restart; parent exit-code mapping currently terminates instead

Per the PR objective, SIGHUP must restart the child while keeping the parent running. Right now, the child returns 1 on SIGHUP and the parent maps any non‑zero exit code to 128 + code, resulting in termination with 129 instead of a restart.

Proposed fix: treat exit code 0 and SIGHUP as “restart,” and also handle the WIFSIGNALED case for completeness.

Is the intended behavior to always restart on SIGHUP regardless of whether the child exits normally or is killed by the signal? If yes, apply:

@@
-    if os.WIFEXITED(status):
-        # Interpret child exit code
-        child_exit_code = os.WEXITSTATUS(status)
-        if child_exit_code == 0:
-            return None  # restart child (unexpected exit/exception)
-        else:
-            # Child indicates termination (signal number)
-            return 128 + child_exit_code  # Return standard Unix exit code
-    else:
-        # Child killed by unhandled signal - terminate
-        child_exit_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
-        click.echo(f"Child killed by unhandled signal: {child_exit_signal}", err=True)
-        return 128 + child_exit_signal
+    if os.WIFEXITED(status):
+        # Child encodes: 0 -> restart, signal number -> termination (unless SIGHUP)
+        child_exit_code = os.WEXITSTATUS(status)
+        if child_exit_code in (0, signal.SIGHUP):
+            return None  # restart child
+        return 128 + child_exit_code  # standard Unix mapping for other signals
+    elif os.WIFSIGNALED(status):
+        sig = os.WTERMSIG(status)
+        if sig == signal.SIGHUP:
+            return None  # restart child on SIGHUP
+        click.echo(f"Child killed by unhandled signal: {sig} ({signal.Signals(sig).name})", err=True)
+        return 128 + sig
+    else:
+        click.echo("Child terminated unexpectedly.", err=True)
+        return 1
🧹 Nitpick comments (35)
packages/jumpstarter/jumpstarter/common/ipaddr.py (1)

29-47: Add return type, handle missing binary, and improve error messages

The async implementation looks good. Two improvements will make it more robust:

  • Return type annotation clarifies the API.
  • Catch FileNotFoundError to surface a friendlier error when minikube isn’t installed or not in PATH.
  • Include exit code and trimmed stderr in the RuntimeError for easier troubleshooting.

Apply this diff:

-async def get_minikube_ip(profile: str = None, minikube: str = "minikube"):
+async def get_minikube_ip(profile: str | None = None, minikube: str = "minikube") -> str:
-    # Create the subprocess with optional profile
-    cmd = [minikube, "ip"]
-    if profile:
-        cmd.extend(["-p", profile])
-
-    process = await asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
+    # Create the subprocess with optional profile
+    cmd = [minikube, "ip"]
+    if profile:
+        cmd.extend(["-p", profile])
+
+    try:
+        process = await asyncio.create_subprocess_exec(
+            *cmd,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        )
+    except FileNotFoundError as e:
+        raise RuntimeError(f"{minikube!r} not found in PATH") from e
 
     # Wait for it to complete and get the output
-    stdout, stderr = await process.communicate()
+    stdout, stderr = await process.communicate()
 
     # Decode and strip whitespace
-    result = stdout.decode().strip()
+    result = stdout.decode().strip()
 
     # Optional: check if command was successful
-    if process.returncode != 0:
-        raise RuntimeError(stderr.decode())
+    if process.returncode != 0:
+        err = stderr.decode().strip() or "unknown error"
+        raise RuntimeError(f"{minikube} ip failed (exit {process.returncode}): {err}")
 
     return result
packages/jumpstarter/jumpstarter/common/ipaddr_test.py (2)

22-27: Prefer asyncio.subprocess.PIPE over magic -1 values in assertions

Using constants instead of raw -1 improves clarity and resilience across Python versions.

Apply this diff:

@@
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, patch
@@
-import pytest
+import pytest
+import asyncio
@@
-        mock_subprocess.assert_called_once_with(
-            "minikube",
-            "ip",
-            stdout=-1,
-            stderr=-1,  # asyncio.subprocess.PIPE constants
-        )
+        mock_subprocess.assert_called_once_with(
+            "minikube",
+            "ip",
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.PIPE,
+        )
@@
-        mock_subprocess.assert_called_once_with("minikube", "ip", "-p", "test-profile", stdout=-1, stderr=-1)
+        mock_subprocess.assert_called_once_with(
+            "minikube", "ip", "-p", "test-profile",
+            stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
+        )
@@
-        mock_subprocess.assert_called_once_with("custom-minikube", "ip", stdout=-1, stderr=-1)
+        mock_subprocess.assert_called_once_with(
+            "custom-minikube", "ip",
+            stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
+        )
@@
-        mock_subprocess.assert_called_once_with("my-minikube", "ip", "-p", "my-profile", stdout=-1, stderr=-1)
+        mock_subprocess.assert_called_once_with(
+            "my-minikube", "ip", "-p", "my-profile",
+            stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
+        )

Also applies to: 40-41, 53-54, 77-77


55-65: Add test for missing minikube binary (FileNotFoundError path)

If the binary is missing, the function should raise a clear RuntimeError. This test covers that branch and complements the improved error handling.

Apply this diff:

@@
     async def test_get_minikube_ip_failure(self, mock_subprocess):
@@
         with pytest.raises(RuntimeError, match="error: cluster not found"):
             await get_minikube_ip()
+
+    @pytest.mark.asyncio
+    @patch("asyncio.create_subprocess_exec")
+    async def test_get_minikube_ip_not_installed(self, mock_subprocess):
+        mock_subprocess.side_effect = FileNotFoundError()
+        with pytest.raises(RuntimeError, match="not found in PATH"):
+            await get_minikube_ip()
docs/source/getting-started/installation/service/service-local.md (3)

20-22: Fix typo and clarify wording in IP detection warning

Small grammar fix and clearer guidance.

Apply this diff:

-```{warning}
-Sometimes the automatic IP address detection for will not work correctly, to check if Jumpstarter can determine your IP address, run `jmp admin ip`. If the IP address cannot be determined, use the `--ip` argument to manually set your IP address.
-```
+```{warning}
+Sometimes automatic IP address detection does not work. To check if Jumpstarter can determine your IP address, run `jmp admin ip`. If it cannot, use the `--ip` argument to set your IP address explicitly.
+```

28-34: Avoid emphasis-as-heading (MD036) and tighten the minikube tip

Converting emphasis to headings resolves the linter warning and improves structure. The tip can be one concise sentence.

Apply this diff:

-[**kind**](https://kind.sigs.k8s.io/docs/user/quick-start/) (Kubernetes in Docker) is a tool for running local Kubernetes clusters using Docker or Podman containerized "nodes". It's lightweight and fast to start, making it excellent for CI/CD pipelines and quick local testing.
+#### kind
+[kind](https://kind.sigs.k8s.io/docs/user/quick-start/) runs local Kubernetes clusters using Docker or Podman containerized "nodes". It's lightweight and fast to start, making it excellent for CI/CD pipelines and quick local testing.
 
-[**minikube**](https://minikube.sigs.k8s.io/docs/start/) runs local Kubernetes clusters using VMs or container "nodes". It works across several platforms and supports different hypervisors, making it ideal for local development and testing. It's particularly useful in environments requiring untrusted certificates.
+#### minikube
+[minikube](https://minikube.sigs.k8s.io/docs/start/) runs local Kubernetes clusters using VMs or container "nodes". It works across platforms and supports different hypervisors.
 
 ```{tip}
-Consider minikube for environments requiring [untrusted certificates](https://minikube.sigs.k8s.io/docs/handbook/untrusted_certs/).
+In environments with untrusted certificates, minikube may be simpler to configure. See the [minikube guide](https://minikube.sigs.k8s.io/docs/handbook/untrusted_certs/).

---

`176-217`: **Double-check NodePort ranges vs. chart ports**

The minikube example sets the API server NodePort range to 8000-9000 and configures the chart NodePorts to 8082/8083, which is consistent. The kind example uses a wider NodePort range (3000-32767) and maps host ports 8082/8083 to NodePorts 30010/30011 in the kind config. Consider adding a short note to avoid confusion when switching between kind and minikube.




Would you like me to add a short note clarifying:
- kind uses NodePorts 30010/30011 (mapped to host 8082/8083),
- minikube uses NodePorts 8082/8083 directly, after expanding the range to 8000-9000?

</blockquote></details>
<details>
<summary>docs/source/getting-started/installation/service/service-production.md (3)</summary><blockquote>

`36-45`: **Minor grammar and clarity improvements in TLS options**

Tighten wording and remove sentence fragments.



Apply this diff:

```diff
-**Option 1: TLS Termination at Ingress/Route (Recommended)**
-- Terminate TLS at the ingress controller or OpenShift route
-- Simpler certificate management
-- Better performance with fewer encryption hops
+**Option 1: TLS termination at Ingress/Route (recommended)**
+- Terminate TLS at the ingress controller or OpenShift Route.
+- Simpler certificate management.
+- Fewer encryption hops.
 
-**Option 2: End-to-End TLS**
-- TLS from client to Jumpstarter service
-- Higher security but more complex certificate management
-- Required for strict compliance environments
+**Option 2: End-to-end TLS**
+- TLS from client to the Jumpstarter service.
+- Higher security but more complex certificate management.
+- Required for strict compliance environments.

25-31: Add ingress annotations example for gRPC on common controllers

Production issues often trace back to missing gRPC annotations. Consider adding a quick example for NGINX Ingress to minimize support load.

Proposed addition (right after the “gRPC Requirements” bullets):

+For NGINX Ingress, add:
+
+```yaml
+metadata:
+  annotations:
+    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
+```
+
+For other ingress controllers, consult their gRPC guides for the equivalent settings.

77-105: Recommend using Secrets instead of inlining certificates in Route

Inlined certs/keys in manifests can leak via Git history. Prefer referencing a Secret.

Proposed alternative snippet:

apiVersion: v1
kind: Secret
metadata:
  name: jumpstarter-grpc-tls
type: kubernetes.io/tls
data:
  tls.crt: <base64 PEM>
  tls.key: <base64 KEY>
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: jumpstarter-grpc
spec:
  host: grpc.jumpstarter.example.com
  tls:
    termination: edge
    certificate: null
    key: null
  to:
    kind: Service
    name: jumpstarter-controller-grpc

If you want, I can add a short section explaining how to create the Secret from PEM files.

packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py (2)

28-35: Consider streaming stderr on failures to surface errors

run_command_with_output drops stderr/stdout in the exception path, making triage harder. Optional: add a variant that captures stderr on non-zero exit and includes it in the raised message.

If you want, I can add run_command_with_diagnostics(cmd: list[str]) -> None that:

  • streams output live, and
  • on non-zero exit, also returns captured tail of stderr (e.g., last 200 lines) to include in the error.

93-128: Minikube create path is reasonable; consider a timeout as a future improvement

The flow (exists -> optionally delete -> start with explicit NodePort range) is sound. If long-running CI tasks are a concern, a future enhancement could allow passing a timeout to abort hung starts.

If you see occasional hangs in CI, I can wire a timeout: Optional[float] parameter and propagate it to run_command_with_output.

packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster_test.py (4)

178-202: Assert Kind config is piped via stdin to prevent silent regressions

The Kind creation flow depends on passing a config via --config=/dev/stdin. Strengthen the test to assert this invariant so a future change (e.g., dropping stdin piping) doesn’t pass unnoticed.

Apply this diff inside the current test to assert the flag:

         args, kwargs = mock_subprocess.call_args
         assert args[0] == "kind"
         assert args[1] == "create"
         assert args[2] == "cluster"
         assert "--name" in args
         assert "test-cluster" in args
+        assert "--config=/dev/stdin" in args
         assert kwargs["stdin"] == asyncio.subprocess.PIPE

247-261: Cover Kind creation with non-zero return code (failure path)

We exercise success, not-installed, already-exists, and force-recreate paths, but not the failure path (non-zero return code). Add a test to ensure a non-zero return code from the Kind process raises the expected RuntimeError.

Example test to append:

+    @pytest.mark.asyncio
+    @patch("jumpstarter_kubernetes.cluster.kind_installed")
+    @patch("jumpstarter_kubernetes.cluster.kind_cluster_exists")
+    @patch("asyncio.create_subprocess_exec")
+    async def test_create_kind_cluster_failure(self, mock_subprocess, mock_cluster_exists, mock_kind_installed):
+        mock_kind_installed.return_value = True
+        mock_cluster_exists.return_value = False
+        mock_process = AsyncMock()
+        mock_process.returncode = 1
+        mock_process.communicate.return_value = (b"", b"boom")
+        mock_subprocess.return_value = mock_process
+
+        with pytest.raises(RuntimeError, match="Failed to create Kind cluster 'test-cluster'"):
+            await create_kind_cluster("kind", "test-cluster")

262-275: Exercise Kind deletion failure path

We test success, not installed, and already deleted for Kind. Add a failure test mirroring the Minikube failure test to pin the error contract.

Example test to append:

+    @pytest.mark.asyncio
+    @patch("jumpstarter_kubernetes.cluster.kind_installed")
+    @patch("jumpstarter_kubernetes.cluster.kind_cluster_exists")
+    @patch("jumpstarter_kubernetes.cluster.run_command_with_output")
+    async def test_delete_kind_cluster_failure(self, mock_run_command, mock_cluster_exists, mock_kind_installed):
+        mock_kind_installed.return_value = True
+        mock_cluster_exists.return_value = True
+        mock_run_command.return_value = 1
+
+        with pytest.raises(RuntimeError, match="Failed to delete Kind cluster 'test-cluster'"):
+            await delete_kind_cluster("kind", "test-cluster")

110-172: Add a Minikube exists “false” branch test for parity with Kind

We have the true-path and “not installed” tests for Minikube, but not the case where the command returns non-zero (cluster absent). Add a simple “false” test to keep parity with the Kind suite.

Example test to append:

+    @pytest.mark.asyncio
+    @patch("jumpstarter_kubernetes.cluster.minikube_installed")
+    @patch("jumpstarter_kubernetes.cluster.run_command")
+    async def test_minikube_cluster_exists_false(self, mock_run_command, mock_minikube_installed):
+        mock_minikube_installed.return_value = True
+        mock_run_command.return_value = (1, "", "not running")
+        result = await minikube_cluster_exists("minikube", "test-cluster")
+        assert result is False
packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install_test.py (3)

287-333: Consider adding a focused test for the ip CLI command

We test get_ip_generic directly, but not the ip CLI wrapper. A small test would validate option wiring and echo behavior.

Example to append:

+    def test_ip_command_help(self):
+        runner = CliRunner()
+        result = runner.invoke(ip, ["--help"])
+        assert result.exit_code == 0
+        assert "--cluster-name" in result.output
+
+    @patch("jumpstarter_cli_admin.install._validate_cluster_type")
+    @patch("jumpstarter_cli_admin.install.get_ip_generic")
+    def test_ip_command_kind(self, mock_get_ip_generic, mock_validate_cluster_type):
+        mock_validate_cluster_type.return_value = "kind"
+        mock_get_ip_generic.return_value = "192.168.1.10"
+        runner = CliRunner()
+        result = runner.invoke(ip, ["--kind", "kind"])
+        assert result.exit_code == 0
+        assert "192.168.1.10" in result.output

252-265: Add negative test for Minikube IP when binary is missing

There’s no test asserting we raise ClickException when Minikube is requested but not installed. This guards the user experience.

Example to append:

+    @pytest.mark.asyncio
+    @patch("jumpstarter_cli_admin.install.minikube_installed")
+    async def test_get_ip_generic_minikube_not_installed(self, mock_minikube_installed):
+        mock_minikube_installed.return_value = False
+        with pytest.raises(click.ClickException, match="minikube is not installed"):
+            await get_ip_generic("minikube", "minikube", "test-cluster")

607-617: Uninstall happy-path is covered; consider asserting user output for UX stability

Optionally assert on key echo lines (e.g., “Uninstalling Jumpstarter service…”, “Uninstalled Helm release…”) to lock-in UX strings.

packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install.py (8)

102-110: Respect quoted/escaped extra args using shlex.split

Plain .split() breaks when users pass quoted values (e.g., --arg="foo bar"). shlex.split handles shell-like quoting and is the standard for CLI arg parsing.

Apply this diff and import shlex:

@@
-    extra_args_list = kind_extra_args.split() if kind_extra_args.strip() else []
+    import shlex
+    extra_args_list = shlex.split(kind_extra_args) if kind_extra_args.strip() else []

Do the same in _create_minikube_cluster below.


127-135: Mirror shlex.split for Minikube extra args

Keep parsing behavior consistent across both cluster flows.

-    extra_args_list = minikube_extra_args.split() if minikube_extra_args.strip() else []
+    import shlex
+    extra_args_list = shlex.split(minikube_extra_args) if minikube_extra_args.strip() else []

202-211: Fix user-facing typo: “gPRC Mode” -> “gRPC Mode”

Minor UX polish in the install output.

-    click.echo(f"gPRC Mode: {mode}\n")
+    click.echo(f"gRPC Mode: {mode}\n")

255-273: Clarify --kind/--minikube options: they take an optional binary path, not flags

The current declaration uses is_flag=False with flag_value which is slightly confusing and redundant. Declaring them as optional string options improves clarity and help text. Behavior remains the same and aligns with how you compute kind or "kind".

-@click.option(
-    "--kind", is_flag=False, flag_value="kind", default=None, help="Use default settings for a local Kind cluster"
-)
-@click.option(
-    "--minikube",
-    is_flag=False,
-    flag_value="minikube",
-    default=None,
-    help="Use default settings for a local Minikube cluster",
-)
+@click.option(
+    "--kind",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Enable Kind workflow; optionally set the Kind binary path (default: "kind")',
+)
+@click.option(
+    "--minikube",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Enable Minikube workflow; optionally set the Minikube binary path (default: "minikube")',
+)

Note: This is aligned with the preference for straightforward, inline code (no extra helpers). It doesn’t change downstream logic in _validate_cluster_type or in how you pass kind/minikube to helpers.


328-336: Apply the same option clarification to the ip command

Keep CLI consistency across commands.

-@click.option(
-    "--kind", is_flag=False, flag_value="kind", default=None, help="Use default settings for a local Kind cluster"
-)
-@click.option(
-    "--minikube",
-    is_flag=False,
-    flag_value="minikube",
-    default=None,
-    help="Use default settings for a local Minikube cluster",
-)
+@click.option(
+    "--kind",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Use Kind workflow; optionally set the Kind binary path (default: "kind")',
+)
+@click.option(
+    "--minikube",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Use Minikube workflow; optionally set the Minikube binary path (default: "minikube")',
+)

359-367: Apply the same option clarification to the uninstall command

For consistency and clearer UX.

-@click.option(
-    "--kind", is_flag=False, flag_value="kind", default=None, help="Delete the local Kind cluster after uninstalling"
-)
-@click.option(
-    "--minikube",
-    is_flag=False,
-    flag_value="minikube",
-    default=None,
-    help="Delete the local Minikube cluster after uninstalling",
-)
+@click.option(
+    "--kind",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Delete the local Kind cluster after uninstalling; optionally set the Kind binary path',
+)
+@click.option(
+    "--minikube",
+    type=str,
+    default=None,
+    metavar="PATH",
+    help='Delete the local Minikube cluster after uninstalling; optionally set the Minikube binary path',
+)

188-216: Optional: wrap Helm install failures into ClickException for cleaner CLI UX

Today, any exception bubbles up. Converting to ClickException preserves traceback suppression and gives a friendly error. Tests that patch install_helm_chart to raise RuntimeError will still pass if you re-raise as ClickException.

Example:

-    await install_helm_chart(
-        chart, name, namespace, basedomain, grpc_endpoint, router_endpoint, mode, version, kubeconfig, context, helm
-    )
+    try:
+        await install_helm_chart(
+            chart,
+            name,
+            namespace,
+            basedomain,
+            grpc_endpoint,
+            router_endpoint,
+            mode,
+            version,
+            kubeconfig,
+            context,
+            helm,
+        )
+    except Exception as e:
+        raise click.ClickException(f"Helm installation failed: {e}") from e

218-231: get_ip_generic logic is clear; minor naming nit

The parameter minikube represents the binary path. Consider renaming to minikube_binary for readability (no functional change).

packages/jumpstarter-cli/jumpstarter_cli/config.py (1)

9-11: Docstring addition aligns CLI help; minor style nit

Good to see the group help surfaced via the docstring. Consider a concise one-line summary ending with a period to match PEP 257 and Click’s short-help extraction.

Apply this minimal tweak:

-    """
-    Manage local configurations
-    """
+    """Manage local configurations."""
packages/jumpstarter/jumpstarter/client/grpc.py (3)

83-90: Exporter model aligned with proto; consider mapping to a plain dict

from_protobuf passes data.labels (a map field) directly. Pydantic handles mappings fine, but if you foresee serialization quirks, consider dict(data.labels) for a plain dict cast. Not required.

-        return cls(namespace=namespace, name=name, labels=data.labels, online=data.online)
+        return cls(namespace=namespace, name=name, labels=dict(data.labels), online=data.online)

95-107: Lease display extraction is correct; handle tz-aware timestamps defensively

Formatting effective_begin_time via strftime is fine; if there’s any chance of tz-naive datetimes slipping in, guard with .astimezone() before formatting. Optional.

-            if self.lease.effective_begin_time:
-                start_time = self.lease.effective_begin_time.strftime("%Y-%m-%d %H:%M:%S")
+            if self.lease.effective_begin_time:
+                dt = self.lease.effective_begin_time
+                if dt.tzinfo is None:
+                    dt = dt.astimezone()
+                start_time = dt.strftime("%Y-%m-%d %H:%M:%S")

221-249: De-duplicate exclude-fields logic between model_dump_json and model_dump

Both methods build the same exclude set. Factor it into a small helper to keep behavior in sync and reduce drift.

Apply this diff within the changed segments, and add the helper below (outside the selected range):

-        # Determine which fields to exclude
-        exclude_fields = set()
-        if not self.include_leases:
-            exclude_fields.add("lease")
-        if not self.include_online:
-            exclude_fields.add("online")
+        exclude_fields = self._exclude_fields()

         data = {
             "exporters": [
-                exporter.model_dump(mode="json", exclude=exclude_fields) for exporter in self.exporters
+                exporter.model_dump(mode="json", exclude=exclude_fields) for exporter in self.exporters
             ]
         }
         return json.dumps(data, **json_kwargs)
-        exclude_fields = set()
-        if not self.include_leases:
-            exclude_fields.add("lease")
-        if not self.include_online:
-            exclude_fields.add("online")
+        exclude_fields = self._exclude_fields()

         return {
             "exporters": [
                 exporter.model_dump(mode="json", exclude=exclude_fields) for exporter in self.exporters
             ]
         }

And add this helper in the class:

def _exclude_fields(self) -> set[str]:
    exclude_fields: set[str] = set()
    if not self.include_leases:
        exclude_fields.add("lease")
    if not self.include_online:
        exclude_fields.add("online")
    return exclude_fields
packages/jumpstarter/jumpstarter/client/grpc_test.py (2)

172-179: Positive assertion on column count; consider explicit header assertions

You already assert counts; optionally assert concrete headers to catch regressions in column order.

-        assert len(table.columns) == 5  # NAME, LABELS, LEASED BY, LEASE STATUS, START TIME
+        headers = [c.header for c in table.columns]
+        assert headers == ["NAME", "LABELS", "LEASED BY", "LEASE STATUS", "START TIME"]

221-258: Online indicator assertions are good; add JSON gating tests for include_ flags*

To fully exercise ExporterList serialization toggles, add tests that verify online/lease are omitted unless opted in.

Proposed additions:

def test_exporter_list_model_dump_json_gating():
    exporters = [
        Exporter(namespace="ns", name="e1", labels={}, online=True),
    ]
    el = SimpleNamespace()  # stand-in for ExporterList-like behavior

Or, if you prefer a concrete test within this suite:

from jumpstarter.client.grpc import ExporterList

def test_exporter_list_model_dump_json_gating():
    el = ExporterList(exporters=[Exporter(namespace="ns", name="e1", labels={"k":"v"}, online=True)])
    # Defaults: both fields excluded
    data = json.loads(el.model_dump_json())
    assert "online" not in data["exporters"][0]
    assert "lease" not in data["exporters"][0]

    # Enable online only
    el.include_online = True
    el.include_leases = False
    data = json.loads(el.model_dump_json())
    assert data["exporters"][0]["online"] is True
    assert "lease" not in data["exporters"][0]

    # Enable both
    el.exporters[0].lease = Mock(spec=Lease)
    el.include_leases = True
    data = json.loads(el.model_dump_json())
    assert "lease" in data["exporters"][0]
packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

77-84: Log level on connection failures

__handle logs connection failures at info level. If this is routine noise during shutdown or lease changes, fine. Otherwise, consider warning or include exc_info to aid debugging.

-            logger.info("failed to handle connection: {}".format(e))
+            logger.warning("failed to handle connection: %s", e, exc_info=True)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)

22-36: Signal-loop control: break after terminal signals to avoid unnecessary wakeups

After a non‑SIGHUP (terminal) signal, you mark signal_handled = True and continue looping, which keeps the receiver alive to ignore future signals. Breaking out simplifies control flow and avoids extra wakeups while shutdown is ongoing. SIGHUP should remain non-terminal to allow later SIGTERM to override.

Apply this small refinement:

                 async for sig in signals:
                     if signal_handled:
-                        continue  # Ignore duplicate signals
+                        break  # Stop listening once a terminating signal was processed
@@
                     if exporter:
                         # Terminate exporter. SIGHUP waits until current lease is let go. Later SIGTERM still overrides
                         if received_signal != signal.SIGHUP:
                             signal_handled = True
+                            break
                         exporter.stop(wait_for_lease_exit=received_signal == signal.SIGHUP)

47-55: Redundant cancellation check inside ExceptionGroup handler

except* Exception cannot contain anyio’s cancellation exception (it’s a BaseException subclass). The isinstance(exc, anyio.get_cancelled_exc_class()) check is therefore always False here. Consider simplifying to log all leaf exceptions.

Apply:

-                    for exc in leaf_exceptions(excgroup):
-                        if not isinstance(exc, anyio.get_cancelled_exc_class()):
-                            click.echo(
-                                f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
-                                err=True,
-                            )
+                    for exc in leaf_exceptions(excgroup):
+                        click.echo(
+                            f"Exception while serving on the exporter: {type(exc).__name__}: {exc}",
+                            err=True,
+                        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b923cbe and f144c17.

📒 Files selected for processing (27)
  • docs/source/getting-started/guides/setup-distributed-mode.md (1 hunks)
  • docs/source/getting-started/installation/index.md (1 hunks)
  • docs/source/getting-started/installation/service.md (0 hunks)
  • docs/source/getting-started/installation/service/index.md (1 hunks)
  • docs/source/getting-started/installation/service/service-local.md (1 hunks)
  • docs/source/getting-started/installation/service/service-production.md (1 hunks)
  • packages/jumpstarter-cli-admin/jumpstarter_cli_admin/delete.py (1 hunks)
  • packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install.py (4 hunks)
  • packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install_test.py (1 hunks)
  • packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py (1 hunks)
  • packages/jumpstarter-cli/jumpstarter_cli/config.py (1 hunks)
  • packages/jumpstarter-cli/jumpstarter_cli/get.py (2 hunks)
  • packages/jumpstarter-cli/jumpstarter_cli/get_test.py (1 hunks)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter-kubernetes/jumpstarter_kubernetes/__init__.py (2 hunks)
  • packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py (1 hunks)
  • packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster_test.py (1 hunks)
  • packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py (1 hunks)
  • packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/client/v1/client_pb2.py (3 hunks)
  • packages/jumpstarter/jumpstarter/client/grpc.py (4 hunks)
  • packages/jumpstarter/jumpstarter/client/grpc_test.py (1 hunks)
  • packages/jumpstarter/jumpstarter/common/ipaddr.py (1 hunks)
  • packages/jumpstarter/jumpstarter/common/ipaddr_test.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/client.py (5 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (1 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/source/getting-started/installation/service.md
✅ Files skipped from review due to trivial changes (5)
  • docs/source/getting-started/guides/setup-distributed-mode.md
  • docs/source/getting-started/installation/service/index.md
  • packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py
  • docs/source/getting-started/installation/index.md
  • packages/jumpstarter-cli-admin/jumpstarter_cli_admin/delete.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/jumpstarter/jumpstarter/exporter/session.py
  • packages/jumpstarter/jumpstarter/config/exporter.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-05-28T15:09:35.768Z
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#463
File: packages/jumpstarter-cli-admin/jumpstarter_cli_admin/get_test.py:270-270
Timestamp: 2025-05-28T15:09:35.768Z
Learning: The jumpstarter CLI is designed to match kubectl's behavior, including returning exit code 0 (success) when no resources are found, rather than exit code 1 (failure).

Applied to files:

  • docs/source/getting-started/installation/service/service-local.md
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
  • packages/jumpstarter/jumpstarter/exporter/exporter.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (13)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster_test.py (1)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py (9)
  • create_kind_cluster (130-192)
  • create_minikube_cluster (93-127)
  • delete_kind_cluster (77-90)
  • delete_minikube_cluster (61-74)
  • kind_cluster_exists (49-58)
  • kind_installed (11-13)
  • minikube_cluster_exists (37-46)
  • minikube_installed (6-8)
  • run_command_with_output (28-34)
packages/jumpstarter-cli/jumpstarter_cli/get_test.py (3)
packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py (1)
  • parse_comma_separated (156-196)
packages/jumpstarter/jumpstarter/client/grpc.py (1)
  • ExporterList (195-249)
packages/jumpstarter/jumpstarter/config/client.py (2)
  • ClientConfigV1Alpha1 (89-370)
  • lease (123-131)
packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install_test.py (1)
packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install.py (13)
  • install (277-323)
  • _configure_endpoints (42-60)
  • _create_kind_cluster (96-118)
  • _create_minikube_cluster (121-143)
  • _delete_kind_cluster (146-155)
  • _delete_minikube_cluster (158-167)
  • _handle_cluster_creation (63-93)
  • _handle_cluster_deletion (170-185)
  • _validate_cluster_type (29-39)
  • _validate_prerequisites (22-26)
  • get_ip_generic (218-231)
  • uninstall (372-393)
  • ip (339-348)
packages/jumpstarter/jumpstarter/common/ipaddr_test.py (1)
packages/jumpstarter/jumpstarter/common/ipaddr.py (1)
  • get_minikube_ip (29-47)
packages/jumpstarter-cli/jumpstarter_cli/get.py (3)
packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py (1)
  • opt_comma_separated (199-234)
packages/jumpstarter-cli/jumpstarter_cli/login.py (1)
  • relogin_client (141-155)
packages/jumpstarter/jumpstarter/config/client.py (1)
  • list_exporters (144-183)
packages/jumpstarter/jumpstarter/client/grpc_test.py (1)
packages/jumpstarter/jumpstarter/client/grpc.py (4)
  • WithOptions (20-22)
  • add_display_columns (25-35)
  • add_exporter_row (38-53)
  • get_status (178-192)
packages/jumpstarter-cli/jumpstarter_cli/run.py (5)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (2)
  • handle_exceptions (32-46)
  • leaf_exceptions (78-102)
packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py (1)
  • signal_handler (9-22)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
  • stop (36-49)
  • serve (143-193)
  • status (152-171)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • create_exporter (160-193)
  • serve (195-197)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/__init__.py (3)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py (6)
  • create_kind_cluster (130-192)
  • create_minikube_cluster (93-127)
  • delete_kind_cluster (77-90)
  • delete_minikube_cluster (61-74)
  • kind_installed (11-13)
  • minikube_installed (6-8)
packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install.py (1)
  • install (277-323)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py (3)
  • helm_installed (6-8)
  • install_helm_chart (11-66)
  • uninstall_helm_chart (69-91)
packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py (1)
packages/jumpstarter-cli/jumpstarter_cli/get_test.py (1)
  • allowed_values (15-17)
packages/jumpstarter/jumpstarter/config/client.py (1)
packages/jumpstarter/jumpstarter/client/grpc.py (3)
  • ClientService (276-389)
  • Exporter (79-109)
  • ListExporters (293-309)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • channel_factory (167-174)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
packages/jumpstarter-cli-admin/jumpstarter_cli_admin/install.py (3)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py (6)
  • create_kind_cluster (130-192)
  • create_minikube_cluster (93-127)
  • delete_kind_cluster (77-90)
  • delete_minikube_cluster (61-74)
  • kind_installed (11-13)
  • minikube_installed (6-8)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/install.py (3)
  • helm_installed (6-8)
  • install_helm_chart (11-66)
  • uninstall_helm_chart (69-91)
packages/jumpstarter/jumpstarter/common/ipaddr.py (2)
  • get_ip_address (7-26)
  • get_minikube_ip (29-47)
packages/jumpstarter/jumpstarter/client/grpc.py (2)
packages/jumpstarter/jumpstarter/client/lease.py (1)
  • Lease (31-211)
packages/jumpstarter-kubernetes/jumpstarter_kubernetes/exporters.py (4)
  • rich_add_columns (62-73)
  • rich_add_columns (111-112)
  • rich_add_rows (75-97)
  • rich_add_rows (114-116)
🪛 LanguageTool
docs/source/getting-started/installation/service/service-local.md

[grammar] ~10-~10: There might be a mistake here.
Context: ...ed and configured to access your cluster - [Helm](https://helm.sh/docs/intro/install...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...onfigured to access your cluster - Helm (version 3.x or newer) - Administrator ...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...s/intro/install/) (version 3.x or newer) - Administrator access to your cluster (re...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...luster name (default: jumpstarter-lab) - --force-recreate-cluster: Force recreate the cluster if it alrea...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ...if it already exists (destroys all data) - --kind-extra-args: Pass additional arguments to kind clus...

(QB_NEW_EN)


[grammar] ~54-~54: There might be a mistake here.
Context: ...ional arguments to kind cluster creation - --minikube-extra-args: Pass additional arguments to minikube ...

(QB_NEW_EN)


[grammar] ~113-~113: There might be a mistake here.
Context: ...admin install` command and all available options, see the [MAN pages](../../../re...

(QB_NEW_EN)

docs/source/getting-started/installation/service/service-production.md

[grammar] ~11-~11: There might be a mistake here.
Context: ...s/intro/install/) (version 3.x or newer) - Administrator access to your cluster (re...

(QB_NEW_EN)


[grammar] ~12-~12: There might be a mistake here.
Context: ... cluster (required for CRD installation) - Domain name for service endpoints - Ingr...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...ion) - Domain name for service endpoints - Ingress controller installed (for Kubern...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ...rmination at Ingress/Route (Recommended)** - Terminate TLS at the ingress controller ...

(QB_NEW_EN)


[grammar] ~41-~41: There might be a mistake here.
Context: ...ryption hops Option 2: End-to-End TLS - TLS from client to Jumpstarter service -...

(QB_NEW_EN)


[grammar] ~42-~42: There might be a mistake here.
Context: ...- TLS from client to Jumpstarter service - Higher security but more complex certifi...

(QB_NEW_EN)


[grammar] ~43-~43: There might be a mistake here.
Context: ... but more complex certificate management - Required for strict compliance environme...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/getting-started/installation/service/service-local.md

36-36: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


41-41: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: e2e

Comment on lines +180 to +192
command = [kind, "create", "cluster", "--name", cluster_name, "--config=/dev/stdin"]
command.extend(extra_args)

kind_process = await asyncio.create_subprocess_exec(
*command, stdin=asyncio.subprocess.PIPE
)

await kind_process.communicate(input=cluster_config.encode())

if kind_process.returncode == 0:
return True
else:
raise RuntimeError(f"Failed to create Kind cluster '{cluster_name}'")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid /dev/stdin for cross-platform compatibility; use a temporary file

--config=/dev/stdin will fail on Windows. Using a temp file preserves streaming output and works everywhere.

Apply this diff:

-    command = [kind, "create", "cluster", "--name", cluster_name, "--config=/dev/stdin"]
-    command.extend(extra_args)
-
-    kind_process = await asyncio.create_subprocess_exec(
-        *command, stdin=asyncio.subprocess.PIPE
-    )
-
-    await kind_process.communicate(input=cluster_config.encode())
-
-    if kind_process.returncode == 0:
-        return True
-    else:
-        raise RuntimeError(f"Failed to create Kind cluster '{cluster_name}'")
+    import os
+    import tempfile
+
+    with tempfile.NamedTemporaryFile("w", delete=False) as f:
+        f.write(cluster_config)
+        cfg_path = f.name
+    try:
+        command = [kind, "create", "cluster", "--name", cluster_name, f"--config={cfg_path}"]
+        command.extend(extra_args)
+        returncode = await run_command_with_output(command)
+        if returncode == 0:
+            return True
+        else:
+            raise RuntimeError(f"Failed to create Kind cluster '{cluster_name}'")
+    finally:
+        try:
+            os.unlink(cfg_path)
+        except OSError:
+            pass

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/jumpstarter-kubernetes/jumpstarter_kubernetes/cluster.py around
lines 180-192, replace the use of --config=/dev/stdin (which breaks on Windows)
by writing cluster_config to a temporary file and passing its path to --config:
create a NamedTemporaryFile (or tempfile.mkstemp) in a try/finally, write
cluster_config.encode(), flush and close the file, build the command with
"--config", temp_path (no /dev/stdin), invoke asyncio.create_subprocess_exec
without piping stdin, await communicate(), then remove the temp file in the
finally block to ensure cleanup even on errors; ensure proper encoding when
writing and that return code handling remains the same.

Comment on lines 118 to +143
_globals['_EXPORTER']._serialized_start=338
_globals['_EXPORTER']._serialized_end=598
_globals['_EXPORTER_LABELSENTRY']._serialized_start=444
_globals['_EXPORTER_LABELSENTRY']._serialized_end=501
_globals['_LEASE']._serialized_start=601
_globals['_LEASE']._serialized_end=1478
_globals['_GETEXPORTERREQUEST']._serialized_start=1480
_globals['_GETEXPORTERREQUEST']._serialized_end=1554
_globals['_LISTEXPORTERSREQUEST']._serialized_start=1557
_globals['_LISTEXPORTERSREQUEST']._serialized_end=1736
_globals['_LISTEXPORTERSRESPONSE']._serialized_start=1738
_globals['_LISTEXPORTERSRESPONSE']._serialized_end=1864
_globals['_GETLEASEREQUEST']._serialized_start=1866
_globals['_GETLEASEREQUEST']._serialized_end=1934
_globals['_LISTLEASESREQUEST']._serialized_start=1937
_globals['_LISTLEASESREQUEST']._serialized_end=2110
_globals['_LISTLEASESRESPONSE']._serialized_start=2112
_globals['_LISTLEASESRESPONSE']._serialized_end=2226
_globals['_CREATELEASEREQUEST']._serialized_start=2229
_globals['_CREATELEASEREQUEST']._serialized_end=2393
_globals['_UPDATELEASEREQUEST']._serialized_start=2396
_globals['_UPDATELEASEREQUEST']._serialized_end=2539
_globals['_DELETELEASEREQUEST']._serialized_start=2541
_globals['_DELETELEASEREQUEST']._serialized_end=2612
_globals['_CLIENTSERVICE']._serialized_start=2615
_globals['_CLIENTSERVICE']._serialized_end=3678
_globals['_EXPORTER']._serialized_end=627
_globals['_EXPORTER_LABELSENTRY']._serialized_start=473
_globals['_EXPORTER_LABELSENTRY']._serialized_end=530
_globals['_LEASE']._serialized_start=630
_globals['_LEASE']._serialized_end=1507
_globals['_GETEXPORTERREQUEST']._serialized_start=1509
_globals['_GETEXPORTERREQUEST']._serialized_end=1583
_globals['_LISTEXPORTERSREQUEST']._serialized_start=1586
_globals['_LISTEXPORTERSREQUEST']._serialized_end=1765
_globals['_LISTEXPORTERSRESPONSE']._serialized_start=1767
_globals['_LISTEXPORTERSRESPONSE']._serialized_end=1893
_globals['_GETLEASEREQUEST']._serialized_start=1895
_globals['_GETLEASEREQUEST']._serialized_end=1963
_globals['_LISTLEASESREQUEST']._serialized_start=1966
_globals['_LISTLEASESREQUEST']._serialized_end=2139
_globals['_LISTLEASESRESPONSE']._serialized_start=2141
_globals['_LISTLEASESRESPONSE']._serialized_end=2255
_globals['_CREATELEASEREQUEST']._serialized_start=2258
_globals['_CREATELEASEREQUEST']._serialized_end=2422
_globals['_UPDATELEASEREQUEST']._serialized_start=2425
_globals['_UPDATELEASEREQUEST']._serialized_end=2568
_globals['_DELETELEASEREQUEST']._serialized_start=2570
_globals['_DELETELEASEREQUEST']._serialized_end=2641
_globals['_CLIENTSERVICE']._serialized_start=2644
_globals['_CLIENTSERVICE']._serialized_end=3707
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Runtime version pin note

The generated code enforces Protobuf runtime 6.30.1. Ensure CI and packaging install a compatible google-protobuf runtime to prevent runtime version errors at import time.


🏁 Script executed:

#!/bin/bash
python - <<'PY'
import google.protobuf
print("protobuf version:", google.protobuf.__version__)
PY

Length of output: 35


Ensure Protobuf runtime ≥ 6.30.1 in CI and packaging

Our environment currently has google-protobuf 4.25.8 installed, but the generated code requires Protobuf runtime 6.30.1 or newer. Without updating, imports will fail at runtime.

• Update your CI configuration and any packaging scripts to install or pin google-protobuf>=6.30.1.
• Verify that all downstream projects consuming this package also have compatible Protobuf runtimes.
• Consider adding a runtime version check in your test suite to catch mismatches early (e.g., in setup.py or a CI pre-step).

🤖 Prompt for AI Agents
In
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/client/v1/client_pb2.py
around lines 118-143, the generated Protobuf code requires
google-protobuf>=6.30.1 but CI and packaging currently install 4.25.8; update CI
configs and any packaging/requirements files to pin or install
google-protobuf>=6.30.1 (e.g., pyproject/requirements/CI install step), ensure
downstream consumers are informed/updated to use a compatible Protobuf runtime,
and add a lightweight runtime version check in the test/CI pre-step or packaging
metadata (or setup.py) to fail early if the installed google-protobuf is older
than 6.30.1.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*_pb2.py are auto generated files from grpc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mangelajo You're absolutely right - *_pb2.py files are auto-generated from the protobuf compiler. Thanks for the clarification!

The protobuf runtime version mismatch issue still exists, but the solution should focus on the generation tooling and runtime environment alignment rather than manual file changes. The discrepancy between the required runtime version (6.30.1) and what's currently installed (4.25.8) would need to be addressed in:

  • The protobuf compiler/generation environment
  • CI/packaging dependencies
  • Runtime environment setup

Since these files regenerate automatically, any manual edits would be overwritten anyway.


✏️ Learnings added
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/client/v1/client_pb2.py:0-0
Timestamp: 2025-09-02T07:32:04.510Z
Learning: *_pb2.py files in the jumpstarter codebase are auto-generated from gRPC/protobuf definitions and should not be manually modified. Issues with these files should be addressed at the protobuf generation tooling level or runtime environment setup.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/jumpstarter-cli/jumpstarter_cli/run.py (2)

79-87: Parent misinterprets child’s SIGHUP exit; treat it as restart

When the child returns 1 for SIGHUP, the parent maps it to 129. Adjust the mapping to restart on either 0 (graceful) or SIGHUP.

     if os.WIFEXITED(status):
         # Interpret child exit code
         child_exit_code = os.WEXITSTATUS(status)
-        if child_exit_code == 0:
+        if child_exit_code in (0, signal.SIGHUP):
             return None  # restart child (unexpected exit/exception)
         else:
             # Child indicates termination (signal number)
             return 128 + child_exit_code  # Return standard Unix exit code

Alternatively (or additionally), keep the current parent logic and adopt the child-side change suggested above to return 0 on SIGHUP.


59-61: Map SIGHUP to restart instead of error exit

Per PR goals, SIGHUP should restart the child. Returning the raw signal number causes the parent to treat it as an error (128 + 1). Return 0 for SIGHUP so the parent restarts.

-        # Return signal number if received, otherwise 0 for immediate restart
-        return received_signal if received_signal else 0
+        # Restart on SIGHUP or no-signal cancellations; otherwise propagate signal number
+        return 0 if received_signal in (0, signal.SIGHUP) else received_signal
🧹 Nitpick comments (3)
packages/jumpstarter/jumpstarter/exporter/session.py (1)

46-55: Make teardown logging resilient if report() also fails

If self.root_device.report() raises while handling a close() error, the exception escapes the except block and can mask the original teardown failure. Fall back gracefully if report() fails so we always log and proceed to handler removal.

 def __exit__(self, exc_type, exc_value, traceback):
-        try:
-            self.root_device.close()
-        except Exception as e:
-            # Get driver name from report for more descriptive logging
-            report = self.root_device.report()
-            driver_name = report.labels.get('jumpstarter.dev/name', self.root_device.__class__.__name__)
-            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
+        try:
+            self.root_device.close()
+        except Exception as e:
+            # Try to get a descriptive name, but don't let that fail the teardown path
+            try:
+                report = self.root_device.report()
+                driver_name = report.labels.get('jumpstarter.dev/name', self.root_device.__class__.__name__)
+            except Exception:
+                driver_name = self.root_device.__class__.__name__
+            logger.error("Error closing driver %s: %s", driver_name, e, exc_info=True)
         finally:
             logging.getLogger().removeHandler(self._logging_handler)
packages/jumpstarter/jumpstarter/config/exporter.py (1)

166-175: Nit: annotate the contextmanager yield type for IDEs and readers

Returning the concrete type helps usage sites and avoids guesswork.

-    async def create_exporter(self):
+    async def create_exporter(self) -> "asynccontextmanager[Exporter]":

Note: if forward references are undesirable, you can import Exporter under TYPE_CHECKING to avoid runtime cycles.

packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

51-76: Close channel after Unregister — good; also guard close against timeout cancellation

If move_on_after(10) cancels the scope, the await channel.close() in the inner finally can itself be cancelled. Shielding the close ensures resources are released even on timeout.

 with anyio.move_on_after(10):  # 10 second timeout
-    channel = await self.channel_factory()
-    try:
+    channel = await self.channel_factory()
+    try:
         controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
         await controller.Unregister(
             jumpstarter_pb2.UnregisterRequest(
                 reason="Exporter shutdown",
             )
         )
         logger.info("Controller unregistration completed successfully")
-    finally:
-        await channel.close()
+    finally:
+        # Ensure the channel closes even if the scope was cancelled by the timeout
+        with anyio.CancelScope(shield=True):
+            await channel.close()

Additionally, consider mirroring this “explicit channel + close” pattern in session(), handle.listen() and serve.status() to avoid accumulating open channels under long-running failures.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f144c17 and ffa54bc.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (1 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: NickCao
PR: jumpstarter-dev/jumpstarter#521
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:1-2
Timestamp: 2025-06-12T17:18:48.519Z
Learning: Exporters in this project are intended to run only on Unix systems, so Unix-specific APIs like `os.fork()` are acceptable and Windows compatibility is not required.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
  • packages/jumpstarter/jumpstarter/exporter/exporter.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO 4.6.2+, `open_signal_receiver()` is a regular synchronous context manager that should be used with a plain `with` statement, not `async with`. However, it returns an asynchronous iterator that must be consumed with `async for`. The correct usage pattern is: `with open_signal_receiver(...) as signals: async for signum in signals:`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-05-07T08:29:38.418Z
Learnt from: mangelajo
PR: jumpstarter-dev/jumpstarter#448
File: packages/jumpstarter-cli-common/jumpstarter_cli_common/signal.py:10-11
Timestamp: 2025-05-07T08:29:38.418Z
Learning: In AnyIO, `open_signal_receiver()` returns a synchronous context manager (used with regular `with`), not an asynchronous one. While the context manager is synchronous, it yields an asynchronous iterator that should be consumed with `async for`. Reference: https://anyio.readthedocs.io/en/stable/signals.html

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
📚 Learning: 2025-08-14T13:20:51.953Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter-cli/jumpstarter_cli/run.py:15-15
Timestamp: 2025-08-14T13:20:51.953Z
Learning: Signal 0 is not a valid receivable signal in Unix systems. It can only be used as an argument to kill(pid, 0) to test if a process exists, but doesn't actually send a signal. Receivable signals are positive integers (e.g., SIGINT=2, SIGTERM=15, SIGHUP=1, SIGQUIT=3). Using 0 to represent "no signal received" in signal handling code is correct.

Applied to files:

  • packages/jumpstarter-cli/jumpstarter_cli/run.py
🧬 Code graph analysis (4)
packages/jumpstarter-cli/jumpstarter_cli/run.py (4)
packages/jumpstarter-cli-common/jumpstarter_cli_common/config.py (1)
  • opt_config (99-100)
packages/jumpstarter-cli-common/jumpstarter_cli_common/exceptions.py (2)
  • handle_exceptions (32-46)
  • leaf_exceptions (78-102)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
  • stop (36-49)
  • serve (143-193)
  • status (152-171)
packages/jumpstarter/jumpstarter/config/exporter.py (2)
  • create_exporter (160-193)
  • serve (195-197)
packages/jumpstarter/jumpstarter/exporter/session.py (1)
packages/jumpstarter/jumpstarter/driver/base.py (2)
  • close (85-87)
  • report (192-209)
packages/jumpstarter/jumpstarter/config/exporter.py (4)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
  • Exporter (25-193)
  • serve (143-193)
packages/jumpstarter/jumpstarter/common/grpc.py (2)
  • ssl_channel_credentials (16-46)
  • aio_secure_channel (49-54)
packages/jumpstarter/jumpstarter/config/grpc.py (1)
  • call_credentials (6-20)
packages/jumpstarter/jumpstarter/common/utils.py (1)
  • serve (31-39)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • channel_factory (167-174)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: e2e
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/config/exporter.py (2)

159-194: Lifecycle wrapper looks solid; shielded teardown and aenter/aexit gating LGTM

The explicit lifecycle via create_exporter() with entered gating and a shielded __aexit__ matches the PR goals and avoids abrupt cancellation during cleanup.


195-197: Delegation to the lifecycle is correct

serve() delegating to async with self.create_exporter() keeps responsibilities clean and centralizes teardown. Good change.

packages/jumpstarter/jumpstarter/exporter/exporter.py (3)

26-35: Type changes align with awaited channel factory and task-group tracking

  • channel_factory: Callable[[], Awaitable[grpc.aio.Channel]] correctly reflects the async channel creation.
  • Internal _tg, _started, _stop_requested state fits the new stop semantics.

36-50: Stop semantics are sensible; immediate vs. lease-exit paths

The split between immediate cancellation and “stop on lease exit” is clear and matches how the child should react to SIGHUP vs. TERM/INT. No issues spotted.


173-194: Serve loop + internal task group assignment LGTM

Storing self._tg, invoking self.stop() on lease changes, and clearing _tg on exit integrates well with the child’s signal-driven stop model.

packages/jumpstarter-cli/jumpstarter_cli/run.py (1)

41-42: LGTM: start signal receiver before exporter creation

Starting the signal handler immediately reduces window for missed signals and aligns with AnyIO’s guidance on signal handling.

Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py
Comment thread packages/jumpstarter-cli/jumpstarter_cli/run.py
Properly handle termination signals to exit cleanly.
Signals are forwarded to child, and its exit code is propagated to parent.
SIGHUP restarts the child while keeping the parent.
sending SIGHUP to child process results in delayed termination waiting on lease to be free
(either the lease ends or there is another error resulting in child restart)

Intended for graceful termination of exporter after the current lease holder is done
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/jumpstarter/jumpstarter/exporter/exporter.py (2)

114-135: Leak: Listen stream creates channels but never closes them (retries can exhaust FDs)

Each retry re-creates a channel. Ensure it’s closed after the stream ends (successfully, error, or cancellation).

         async def listen(retries=5, backoff=3):
             retries_left = retries
             while True:
                 try:
-                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
-                    async for request in controller.Listen(jumpstarter_pb2.ListenRequest(lease_name=lease_name)):
-                        await listen_tx.send(request)
+                    ch = await self.channel_factory()
+                    try:
+                        controller = jumpstarter_pb2_grpc.ControllerServiceStub(ch)
+                        async for request in controller.Listen(
+                            jumpstarter_pb2.ListenRequest(lease_name=lease_name)
+                        ):
+                            await listen_tx.send(request)
+                    finally:
+                        # Shielding optional; add if you expect cancellation here.
+                        await ch.close()
                 except Exception as e:
                     if retries_left > 0:
                         retries_left -= 1
                         logger.info(
                             "Listen stream interrupted, restarting in {}s, {} retries left: {}".format(
                                 backoff, retries_left, e
                             )
                         )
                         await sleep(backoff)
                     else:
                         raise
                 else:
                     retries_left = retries

153-173: Leak: Status stream also doesn’t close channels on restart

Same problem as Listen; close the channel per attempt.

         async def status(retries=5, backoff=3):
             retries_left = retries
             while True:
                 try:
-                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
-                    async for status in controller.Status(jumpstarter_pb2.StatusRequest()):
-                        await status_tx.send(status)
+                    ch = await self.channel_factory()
+                    try:
+                        controller = jumpstarter_pb2_grpc.ControllerServiceStub(ch)
+                        async for status in controller.Status(jumpstarter_pb2.StatusRequest()):
+                            await status_tx.send(status)
+                    finally:
+                        await ch.close()
                 except Exception as e:
                     if retries_left > 0:
                         retries_left -= 1
                         logger.info(
                             "Status stream interrupted, restarting in {}s, {} retries left: {}".format(
                                 backoff, retries_left, e
                             )
                         )
                         await sleep(backoff)
                     else:
                         raise
                 else:
                     retries_left = retries
🧹 Nitpick comments (6)
packages/jumpstarter/jumpstarter/exporter/exporter.py (6)

36-50: stop(): annotate return type; otherwise looks sound

Minor polish: add explicit return type.

-    def stop(self, wait_for_lease_exit=False):
+    def stop(self, wait_for_lease_exit: bool = False) -> None:

52-76: Don’t log expected cancellations/timeouts as errors during cleanup

Unregistration may be cancelled/timed-out; log at info for CancelledError to keep logs clean. You already shield channel.close(), which is great.

         try:
             if self.registered:
                 logger.info("Unregistering exporter with controller")
                 try:
                     with anyio.move_on_after(10):  # 10 second timeout
                         channel = await self.channel_factory()
                         try:
                             controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
                             await controller.Unregister(
                                 jumpstarter_pb2.UnregisterRequest(
                                     reason="Exporter shutdown",
                                 )
                             )
                             logger.info("Controller unregistration completed successfully")
                         finally:
                             with anyio.CancelScope(shield=True):
                                 await channel.close()
-                except Exception as e:
-                    logger.error("Error during controller unregistration: %s", e, exc_info=True)
+                except anyio.get_cancelled_exc_class():
+                    logger.info("Controller unregistration cancelled or timed out")
+                except Exception as e:
+                    logger.error("Error during controller unregistration: %s", e, exc_info=True)
 
-        except Exception as e:
-            logger.error("Error during exporter cleanup: %s", e, exc_info=True)
+        except anyio.get_cancelled_exc_class():
+            logger.info("Exporter cleanup cancelled")
+        except Exception as e:
+            logger.error("Error during exporter cleanup: %s", e, exc_info=True)

86-106: Close gRPC channel used for initial Register to avoid a leak

This channel is not reused; close it after Register.

-        controller = jumpstarter_pb2_grpc.ControllerServiceStub(await self.channel_factory())
+        ch = await self.channel_factory()
+        try:
+            controller = jumpstarter_pb2_grpc.ControllerServiceStub(ch)
         with Session(
             uuid=self.uuid,
             labels=self.labels,
             root_device=self.device_factory(),
         ) as session:
             async with session.serve_unix_async() as path:
                 async with grpc.aio.secure_channel(
                     f"unix://{path}", grpc.local_channel_credentials(grpc.LocalConnectionType.UDS)
                 ) as channel:
                     response = await jumpstarter_pb2_grpc.ExporterServiceStub(channel).GetReport(empty_pb2.Empty())
                     logger.info("Registering exporter with controller")
-                    await controller.Register(
+                    await controller.Register(
                         jumpstarter_pb2.RegisterRequest(
                             labels=self.labels,
                             reports=response.reports,
                         )
                     )
                     self.registered = True
                 yield path
+        finally:
+            await ch.close()

145-147: Serve docstring: clarify exit conditions and parent/child contract

Document that serve() returns on lease change or stop request; in the fork-based model the parent restarts the child.

-        """
-        Serve the exporter.
-        """
+        """
+        Serve the exporter.
+
+        Returns when:
+        - The lease name changes (teardown and let the supervisor restart us), or
+        - stop(wait_for_lease_exit=False) cancels the task group, or
+        - stop(wait_for_lease_exit=True) is requested and the exporter is not leased.
+
+        In the fork-based architecture, the parent supervises and restarts the child on exit.
+        """

178-182: Improve lease-change logging; set both old/new for traceability

Optional: log transition and then stop. Functional behavior unchanged.

-                if self.lease_name != "" and self.lease_name != status.lease_name:
-                    self.lease_name = status.lease_name
-                    logger.info("Lease status changed, killing existing connections")
+                if self.lease_name != "" and self.lease_name != status.lease_name:
+                    old = self.lease_name
+                    self.lease_name = status.lease_name
+                    logger.info("Lease changed: %s -> %s; tearing down", old, self.lease_name)
                     self.stop()
                     break

191-194: Redundant stop() call before breaking when not leased

You’re already inside the same TaskGroup; just break and let the outer context exit. The extra cancel is harmless but noisy.

-                    if self._stop_requested:
-                        self.stop()
-                        break
+                    if self._stop_requested:
+                        break
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ffa54bc and f4754db.

📒 Files selected for processing (4)
  • packages/jumpstarter-cli/jumpstarter_cli/run.py (1 hunks)
  • packages/jumpstarter/jumpstarter/config/exporter.py (1 hunks)
  • packages/jumpstarter/jumpstarter/exporter/exporter.py (4 hunks)
  • packages/jumpstarter/jumpstarter/exporter/session.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/jumpstarter/jumpstarter/config/exporter.py
  • packages/jumpstarter-cli/jumpstarter_cli/run.py
  • packages/jumpstarter/jumpstarter/exporter/session.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#584
File: packages/jumpstarter/jumpstarter/client/grpc.py:38-54
Timestamp: 2025-08-18T07:13:37.619Z
Learning: User michalskrivanek prefers inline code over small helper function extractions when the abstraction doesn't significantly improve readability or reduce complexity. They value straightforward, readable code over reducing minor duplication.
📚 Learning: 2025-08-14T13:11:35.034Z
Learnt from: michalskrivanek
PR: jumpstarter-dev/jumpstarter#589
File: packages/jumpstarter/jumpstarter/exporter/exporter.py:147-155
Timestamp: 2025-08-14T13:11:35.034Z
Learning: In the jumpstarter fork-based architecture, when the exporter's serve() method exits (e.g., due to lease changes), the child process terminates and the parent process automatically restarts it, eliminating concerns about orphaned tasks since the entire process is restarted.

Applied to files:

  • packages/jumpstarter/jumpstarter/exporter/exporter.py
🧬 Code graph analysis (1)
packages/jumpstarter/jumpstarter/exporter/exporter.py (3)
packages/jumpstarter/jumpstarter/config/exporter.py (1)
  • channel_factory (167-174)
packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/jumpstarter_pb2_grpc.py (1)
  • ControllerServiceStub (9-68)
packages/jumpstarter/conftest.py (1)
  • Unregister (59-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Redirect rules - jumpstarter-docs
  • GitHub Check: Header rules - jumpstarter-docs
  • GitHub Check: Pages changed - jumpstarter-docs
  • GitHub Check: pytest-matrix (macos-15, 3.13)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.12)
  • GitHub Check: pytest-matrix (macos-15, 3.12)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.11)
  • GitHub Check: pytest-matrix (macos-15, 3.11)
  • GitHub Check: pytest-matrix (ubuntu-24.04, 3.13)
  • GitHub Check: e2e
🔇 Additional comments (6)
packages/jumpstarter/jumpstarter/exporter/exporter.py (6)

8-8: Using anyio.abc.TaskGroup is fine

Interface import keeps typing precise while still compatible with create_task_group() instances.


26-34: Lifecycle fields (_stop_requested, _started, _tg) look good

These enable explicit shutdown semantics; no issues spotted.


175-175: Storing tg for later cancellation is appropriate

This enables external stop() to cancel the running group.


184-186: Deferred start until lease known — LGTM

Starting handle after first non-empty lease name avoids premature listeners.


194-194: Resetting _tg to None on exit — LGTM

Prevents stale references after the TaskGroup context closes.


2-2: Async channel_factory usage verified – no missing await calls

I ran the suggested grep across the exporter and config modules to find any un-awaited channel_factory() invocations and confirmed there are none. The switch to Awaitable and the import from collections.abc is correct, and all call sites properly await the async factory. Code changes look good as-is.

@mangelajo mangelajo self-requested a review September 2, 2025 07:32
@mangelajo mangelajo merged commit efb9d58 into jumpstarter-dev:main Sep 2, 2025
18 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants