Harden stream process restarts: exponential backoff + ALSA readiness gate by stamateviorel · Pull Request #1097 · micro-nova/AmpliPi

stamateviorel · 2026-06-10T12:35:28Z

What does this change intend to accomplish?

Two related robustness fixes for streams/process_monitor.py:

Exponential restart backoff. The monitor restarted a crashing child in a tight zero-delay loop: a process that died instantly (e.g. because its ALSA device could not be opened) was respawned as fast as the loop ran, flooding the logs and wearing the SD card. Consecutive fast failures now back off exponentially from 2s up to a 30s cap, and the backoff resets once a run survives 10 seconds.
ALSA readiness gate. When the monitored player exits because its loopback is still held (by a previous instance that has not yet released it, or by the bad dmix state described in #957, "Spotify controlled loopback getting into a bad state"), every respawn dies instantly with EINVAL and the stream stays silent; we measured a 5.5-hour outage from a single stream re-assign. If the monitored command plays to an ALSA loopback (-o lb*), the monitor now probes the device with a 1s silent aplay (the same open path the player uses) before each spawn and waits quietly until the device opens. On the first failed probe it logs the current /dev/snd holders via fuser so the journal explains the wedge; after four failed probes it kills a stale leftover player that targets the same device (strict match on binary name plus exact -o argument, never the monitor's own child). Commands without -o lb* (e.g. alsaloop) are completely unaffected.
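The readiness gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the callback hooks, and the exact aplay arguments are assumptions, the 2-second retry interval is inferred from the "2-second retry" mentioned in this description, and running the stale-player hook only once (on the fourth failure) is also a guess.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def loopback_device(cmd: Sequence[str]) -> Optional[str]:
    """Return the -o value if it names a loopback (lb*), otherwise None."""
    for flag, value in zip(cmd, cmd[1:]):
        if flag == "-o" and value.startswith("lb"):
            return value
    return None


def aplay_probe(device: str) -> bool:
    """Try to open the device by playing one second of silence.

    The argument list is an assumption; the PR only says "1s silent aplay".
    Raises FileNotFoundError if aplay is not installed.
    """
    result = subprocess.run(
        ["aplay", "-q", "-D", device, "-t", "raw", "-f", "cd", "-d", "1", "/dev/zero"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def wait_until_ready(
    device: str,
    probe: Callable[[str], bool] = aplay_probe,
    retry_s: float = 2.0,          # inferred retry interval
    stale_after: int = 4,          # "after four failed probes"
    on_first_failure: Optional[Callable[[str], None]] = None,  # e.g. log fuser output
    on_stale: Optional[Callable[[str], None]] = None,          # e.g. kill stale player
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until the probe succeeds; return the number of failed probes."""
    failures = 0
    while not probe(device):
        failures += 1
        if failures == 1 and on_first_failure is not None:
            on_first_failure(device)
        if failures == stale_after and on_stale is not None:
            on_stale(device)
        sleep(retry_s)
    return failures
```

A caller would compute `loopback_device(cmd)` once and call `wait_until_ready` before each spawn only when the result is not `None`, which is what keeps commands such as alsaloop out of the gate entirely. Passing `probe` and `sleep` as parameters is purely a testing convenience in this sketch.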

Both are running in production on a real AmpliPi; the gate has already fired twice (loopback briefly held at service start) and recovered in one 2-second retry instead of crash-looping.

Checklist

Have you tested your changes and ensured they work? (in production; gate exercised live by holding the loopback with aplay through a service restart)
Have you checked to ensure there aren't other open Pull Requests for the same update/change?
If applicable, have you updated the CHANGELOG?
Does your submission pass linting & tests? (python -m py_compile clean; happy to fix anything CI flags)

process_monitor restarted a crashing child in a tight zero-delay loop: a process that dies instantly (e.g. its ALSA output device cannot be opened) gets respawned as fast as the loop runs, flooding logs and wearing the SD card. Track consecutive fast failures and back off exponentially (2s..30s), resetting once a run survives 10 seconds.

Signed-off-by: Stamate Viorel <stamate.viorel@gmail.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
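As a rough sketch of the backoff this commit describes: the 2s floor, 30s cap, and 10-second reset threshold come from the commit message, while the class name, the doubling factor, and the choice to restart immediately after a healthy run are assumptions made for illustration.

```python
BASE_DELAY_S = 2.0     # first backoff delay (from the commit message)
MAX_DELAY_S = 30.0     # backoff ceiling (from the commit message)
HEALTHY_RUN_S = 10.0   # a run at least this long resets the backoff


class RestartBackoff:
    """Hypothetical helper: counts consecutive fast failures."""

    def __init__(self) -> None:
        self.fast_failures = 0

    def next_delay(self, run_seconds: float) -> float:
        """Return how long to wait before respawning a child that just exited."""
        if run_seconds >= HEALTHY_RUN_S:
            # The child survived long enough: forget earlier failures.
            # Restarting with no delay here is an assumption.
            self.fast_failures = 0
            return 0.0
        self.fast_failures += 1
        # Doubling per failure is an assumption; the source only says
        # "exponentially (2s..30s)".
        return min(BASE_DELAY_S * 2 ** (self.fast_failures - 1), MAX_DELAY_S)
```

With these constants, a child that keeps dying instantly would wait 2, 4, 8, 16, and then 30 seconds between spawns, and a single run of 10 seconds or more returns the schedule to its start.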

When the monitored player exits because its loopback is still held (a not-yet-released previous instance, or the dmix state from micro-nova#957), every respawn dies instantly with EINVAL and the stream stays silent - we saw a 5.5h outage from one stream re-assign. If the monitored command plays to an ALSA loopback (-o lb*), probe the device with a 1s silent aplay (the same open path the player uses) before each spawn and wait quietly until it opens. On the first failed probe, log the current /dev/snd holders via fuser so the journal explains the wedge; after four failed probes, kill a stale leftover player process that targets the same device (strict match on the binary name and exact -o argument, never the monitor's own child). Processes without an -o lb* argument are completely unaffected.

Signed-off-by: Stamate Viorel <stamate.viorel@gmail.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
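The strict stale-player match in this commit message might look something like the sketch below. It works on an already-collected list of `(pid, argv)` pairs so that the matching rule itself is easy to see; the function name, the binary name `player`, and the device name `lb0` are hypothetical, and how the real monitor enumerates processes is not shown in the source.

```python
import os
from typing import Iterable, List, Sequence, Tuple


def stale_players(
    procs: Iterable[Tuple[int, Sequence[str]]],
    binary: str,
    device: str,
    own_child_pid: int,
) -> List[int]:
    """Return PIDs of leftover players that target the same loopback.

    A process matches only if its executable's base name equals `binary`
    exactly, it carries an "-o" flag whose value equals `device` exactly,
    and it is not the monitor's own child.
    """
    stale = []
    for pid, argv in procs:
        if pid == own_child_pid or not argv:
            continue  # never touch our own child; skip empty command lines
        if os.path.basename(argv[0]) != binary:
            continue  # different binary, even if the name is similar
        if any(f == "-o" and v == device for f, v in zip(argv, argv[1:])):
            stale.append(pid)
    return stale


# Hypothetical process table for illustration only.
example = [
    (100, ["/usr/bin/player", "-o", "lb0"]),         # stale leftover
    (101, ["/usr/bin/player", "-o", "lb0"]),         # the monitor's own child
    (102, ["/usr/bin/player", "-o", "lb01"]),        # different device
    (103, ["/usr/bin/player-helper", "-o", "lb0"]),  # different binary
]
print(stale_players(example, "player", "lb0", own_child_pid=101))
```

Requiring an exact match on both the binary name and the `-o` value, rather than a substring match, is what keeps a player on a neighbouring loopback or an unrelated helper process from being killed by mistake.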

stamateviorel and others added 2 commits June 10, 2026 14:33

stamateviorel wants to merge 2 commits into micro-nova:main from stamateviorel:fix/process-monitor-restart-hardening
