Skip to content

fix(node): handle SIGTERM for graceful shutdown on operator stop#3410

Open
barakeinav1 wants to merge 1 commit into
mainfrom
barak/3409-mpc-node-sigterm-handler
Open

fix(node): handle SIGTERM for graceful shutdown on operator stop#3410
barakeinav1 wants to merge 1 commit into
mainfrom
barak/3409-mpc-node-sigterm-handler

Conversation

@barakeinav1
Copy link
Copy Markdown
Contributor

Summary

  • Installs a SIGTERM handler in mpc-node that routes the signal into the existing internal shutdown channel — operators stopping the node via dstack CVM stop / docker stop / kubectl delete / systemctl stop now get graceful shutdown semantics, where before SIGTERM was effectively SIGKILL.
  • After the main select! exits, calls near_async::shutdown_all_actors() so nearcore's actor system can commit any in-flight RocksDB batches before the process exits.

Closes #3409. Does NOT close the upstream nearcore restart panic at streamer/mod.rs:207 — see docs/investigation/nearcore-indexer-sigkill-restart-panic.md. Test campaign data on the investigation branch showed this handler gets back-migration e2e from 0/5 pass to 1/5 — a real production improvement, but not a full fix for the upstream non-determinism.

Notable non-action: no RocksDB::block_until_all_instances_are_dropped()

neard's standalone binary follows shutdown_all_actors() with RocksDB::block_until_all_instances_are_dropped(). We deliberately don't, because in our embedding it hangs indefinitely: the indexer runs in a std::thread::spawn'd closure whose block_on never returns — the spawned monitor_*/indexer_logger tasks hold Arc<IndexerState>Arc<RocksDB> references that nothing currently cancels on shutdown. Calling block_until_all_instances_are_dropped would block the SIGTERM path past any reasonable grace period and the orchestrator would SIGKILL us anyway. A proper fix would wire a CancellationToken through the indexer thread; out of scope for this PR. The inline comment captures the rationale.

Test plan

  • CI green (cargo check / clippy / fmt on the new code).
  • Manual: run mpc-node locally against localnet, send kill -TERM <pid>, observe SIGTERM received, initiating graceful shutdown log followed by Stopping nearcore actor system. and a clean exit (typically <1s).
  • No regression: existing e2e tests pass (the should_handle_back_migration_a_to_b_to_a flake is pre-existing and tracks the upstream bug, not this handler).

mpc-node had no SIGTERM handler installed, so SIGTERM from
container/orchestrator stop (dstack CVM stop, docker stop, kubectl
delete, systemctl stop) was effectively SIGKILL — the OS terminated
the process immediately, leaving the embedded near-indexer thread to
be killed mid-write. The next start could then trip the nearcore
restart panic documented in docs/investigation/.

This installs a SIGTERM handler that routes into the existing
shutdown_signal channel, so the same select! arm that handles TEE
image-hash-initiated shutdowns also handles SIGTERM. After the main
loop exits we call near_async::shutdown_all_actors() so nearcore's
actor system can commit any in-flight RocksDB batches before we
return.

We deliberately do NOT call RocksDB::block_until_all_instances_are_dropped()
(which neard's standalone binary does next) — our embedded indexer
runs in a std::thread::spawn'd closure whose block_on never returns
because spawned monitor tasks hold Arc<IndexerState> → Arc<RocksDB>
refs that nothing cancels. Calling block_until_all_instances_are_dropped
would hang the SIGTERM path past any reasonable grace period and end
in the SIGKILL we were trying to avoid. RocksDB's WAL still
guarantees committed data survives kill; this just closes a smaller
flush window. See the inline comment.

Closes #3409. Does NOT close the upstream nearcore restart panic at
streamer/mod.rs:207 which fires non-deterministically regardless of
shutdown cleanliness — see docs/investigation/nearcore-indexer-sigkill-restart-panic.md.
Copilot AI review requested due to automatic review settings May 31, 2026 14:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Installs a Unix SIGTERM handler in mpc-node that forwards SIGTERM into the existing internal shutdown channel so operator-initiated stops (Docker/Kubernetes/systemd/dstack) get the same graceful path used by TEE image-hash-driven shutdowns, and calls near_async::shutdown_all_actors() after the main select! exits to give nearcore's actor system a chance to flush in-flight RocksDB batches.

Changes:

  • Spawn a task on root_runtime that installs a SIGTERM handler and sends () on shutdown_signal_sender when received.
  • After graceful shutdown, invoke near_async::shutdown_all_actors() (with an inline rationale for skipping RocksDB::block_until_all_instances_are_dropped()).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@claude
Copy link
Copy Markdown

claude Bot commented May 31, 2026

Pull request overview

Adds a SIGTERM handler in mpc-node that routes the signal into the existing internal shutdown channel, so operator-initiated stops (dstack CVM stop, docker stop, kubectl delete, systemctl stop) get the same graceful path as TEE image-hash-initiated shutdowns. After the main select! exits, the function now also calls near_async::shutdown_all_actors() so nearcore's actor system can commit in-flight RocksDB batches. The deliberate omission of RocksDB::block_until_all_instances_are_dropped() is documented inline with the underlying reason (embedded indexer thread's block_on never returns, so it would hang past any reasonable grace period).

Changes:

  • Install a SIGTERM listener via tokio::signal::unix::signal(SignalKind::terminate()) and forward to shutdown_signal_sender (size-1 mpsc shared with the image-hash watcher).
  • Append near_async::shutdown_all_actors() after the image hash watcher join, before returning the exit reason.

Reviewed changes

Per-file summary
File Description
crates/node/src/run.rs Spawns a SIGTERM-listener task into root_runtime that sends () into the existing shutdown channel; calls near_async::shutdown_all_actors() after image-hash-watcher shutdown.

Findings

Blocking (must fix before merge):

  • None.

Non-blocking (nits, follow-ups, suggestions):

  • crates/node/src/run.rs:195 — The inline comment points readers to docs/investigation/nearcore-indexer-sigkill-restart-panic.md, and the PR description links to that same file on main. Neither this PR nor main actually contains it (find docs -name 'nearcore-indexer-sigkill*' is empty, no commit touches docs/investigation/). Either add the doc in this PR or drop the reference so future readers don't chase a broken link.
  • crates/node/src/run.rs:184-210 — No test exercises the new path; engineering-standards.md asks that every change have a test that would fail under revert. Signal delivery is awkward to unit-test, but the routing logic ("SIGTERM source -> shutdown_signal_sender -> select! exits with the watcher-shutdown error") could be covered by extracting the listener into a small helper that takes a Stream of signal events and a Sender, then driving it from a test. Worth at least an e2e smoke check that kill -TERM produces a clean exit with the new log lines (the PR's manual test plan is a fine substitute for one round, but won't catch future regressions).
  • crates/node/src/run.rs:199 — Only SIGTERM is wired up. SIGINT (Ctrl-C) and SIGHUP still default-terminate. Probably fine for the immediate motivation (container/operator stop), but worth a follow-up since the cost of adding them here is one extra signal() call and a tokio::select!.
  • crates/node/src/run.rs:203let _ = shutdown_for_sigterm.send(()).await; is the right call (after the main select! triggers, the receiver drops and any subsequent sender errors are expected), but a one-line note explaining that would save the next reader a moment.
  • crates/node/src/run.rs:198 — Spawning the handler on root_runtime after the indexer thread is already spawned introduces a small race window where SIGTERM arriving early in run_mpc_node still default-terminates the process. Moving the handler install to the top of the function (before spawn_real_indexer) closes that window. Minor — startup is fast — but trivially fixed.

✅ Approved

barakeinav1 added a commit that referenced this pull request May 31, 2026
…tdown question

The earlier docs had to hedge: SIGTERM experiments couldn't distinguish
'graceful shutdown doesn't help' from 'we never had a SIGTERM handler.'
We've now closed that gap by landing a real SIGTERM handler (#3409 /
#3410) and re-running the back-migration campaign.

- 2121-back-migration-e2e-flake.md: new section 'Real SIGTERM handler in
  mpc-node — also does not fix it' covering iteration 1 (handler hangs
  on block_until_all_instances_are_dropped, 5/5 SIGKILL fallback) and
  iteration 2 (drop the block_until call, 1/5 pass with 100 ms graceful
  shutdowns in 5/5). Follow-up #5 updated — in flight via #3409 / #3410.
- nearcore-indexer-sigkill-restart-panic.md: new 'Strongest evidence:
  graceful shutdown doesn't help' subsection inside 'What we ruled out';
  table updated with the 5-run SIGTERM-handler row (4/5 fail at 100 ms);
  TL;DR + Workarounds + References updated; old SIGTERM-disclaimer
  removed since the question is now answered.

Together the docs now say: same panic, same rate, regardless of whether
shutdown was SIGKILL or a verified 100 ms graceful actor-system stop.
The fix has to be in nearcore.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mpc-node: handle SIGTERM for graceful shutdown on operator stop

2 participants