fix(ipc): Add timeout to AwaitMessage to prevent indefinite blocking #401
Aman-Cool wants to merge 1 commit into urunc-dev:main from
- Add IPCAcceptTimeout (60s) and IPCReadTimeout (10s) to prevent orphaned processes when counterpart never connects
- Fix closure bug in executeHooksConcurrently using wrong loop variable
- Fix isRunning() using annotType instead of annotHypervisor
- Add tests for timeout and wrong message handling
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
d84f485 to 247d9be
This adds reasonable IPC timeouts so urunc doesn’t hang indefinitely during create/start, making failures safer and easier to recover from.
Hello @Aman-Cool, thank you for this contribution. Please create an issue before opening a PR. Have you encountered the issue you describe? Are there any steps to reproduce it? Having the reexec process wait is a container runtime design choice. I am not against adding a timeout, but I think we need to look a bit more into how other container runtimes handle such cases and what a reasonable timeout would be.
Thanks @cmainas for the feedback.
Prevent IPC hangs during container startup
This PR fixes a long-standing reliability issue in urunc’s IPC handshake between create and start.
Previously, the IPC helper AwaitMessage() would block indefinitely while waiting for a Unix socket connection and message. If the peer process never connected (for example because containerd restarted, the urunc start process was OOM-killed, or the node was under heavy load), the waiting process would never exit. This resulted in orphaned urunc --reexec processes, containers stuck in ContainerCreating, and gradual resource leaks on the node, with no clear error reported.
The fix adds bounded timeouts to the IPC accept and read steps (IPCAcceptTimeout, 60s, and IPCReadTimeout, 10s). When the expected message is not received in time, the process now exits with a clear error instead of hanging forever. This makes failed container startups deterministic and observable, while leaving the normal, successful startup path unchanged.
In short: container creation now either succeeds, fails, or times out — but it no longer gets stuck silently.