Spotify-Control & mplayer Robustness — live-reload, decode-then-escape, auto-respawn by wowa1990 · Pull Request #147 · splitti/MuPiBox

wowa1990 · 2026-05-16T16:58:39Z

What it does

Live reload of mupiboxconfig.json via fs.watch (config changes no longer require a backend restart)
Watcher crash hardening (the watcher recovers instead of taking the backend down)
mplayer wrapper: decode-then-escape ordering (previously the order was reversed and one step cancelled out the other, which left local files with special characters in their names silent)
mplayer auto-respawn instead of a permanent error state after a crash
handleSpotifyError now also catches non-API errors (network, parse, etc.)
setVolume is serialised and reads the amixer state directly instead of from the cache (prevents volume drift)
SDK polling interval reduced (B11/M1/M2 pattern; no more eager subscribe to mediaService.current$, which otherwise breaks Spotify Connect)
AR5-5 I2C mutex and M6 SMBus reconnect for the mupihat integration
Log timestamp fix (AR5-4) and startup.wav double-sound fix (chromium-autostart)

Architecture (brief)

The watcher lives in the backend process; there is no separate daemon. mplayer respawn is driven by exit-code detection in the wrapper. The I2C mutex is a threading.Lock around the smbus calls, so that reads and writes do not interleave when HAT polling and a display update arrive at the same time.
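The respawn timing described in this PR's commit notes (capped exponential backoff of 1, 2, 4, 8, 16, 30, 30…s, with the counter reset once a process has stayed alive for 30 s) can be sketched as follows. This is a minimal Python illustration, not the wrapper's actual code: the class name `RespawnBackoff`, its parameters, and the choice to apply the reset at the moment the next crash is handled are assumptions made for the example.

```python
class RespawnBackoff:
    """Hypothetical helper: computes the wait before the next respawn attempt."""

    def __init__(self, base_s=1.0, cap_s=30.0, stable_after_s=30.0):
        self.base_s = base_s                  # first delay
        self.cap_s = cap_s                    # upper bound for any delay
        self.stable_after_s = stable_after_s  # uptime that counts as "healthy"
        self.attempt = 0                      # consecutive quick crashes so far

    def next_delay(self, uptime_s):
        """Return the delay in seconds, given how long the crashed process lived."""
        if uptime_s >= self.stable_after_s:
            # The process ran long enough: treat this crash as a fresh incident.
            self.attempt = 0
        delay = min(self.base_s * 2 ** self.attempt, self.cap_s)
        self.attempt += 1
        return delay


def delays_for(uptimes_s):
    """Feed a sequence of process uptimes through one backoff instance."""
    backoff = RespawnBackoff()
    return [backoff.next_delay(uptime) for uptime in uptimes_s]
```

Seven crashes in quick succession, `delays_for([0] * 7)`, give waits of 1, 2, 4, 8, 16, 30 and 30 seconds. A crash after 45 seconds of uptime starts again at 1 second.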

Testing in Codespaces / locally

Check out the branch and start the backend as usual
With the backend running, change mupiboxconfig.json (e.g. the default volume) → expected: log entry "config reloaded", new values active without a restart
Mock a Spotify error with an invalid token → expected: no backend crash, a clear log message

Note: mplayer auto-respawn, the I2C mutex, the SMBus reconnect and the double-sound fix can only be tested on a real box (mplayer binary, mupihat hardware, pm2 setup).

Open points

The watch debounce is set to 200 ms. If that seems too short to you (some fs.watch implementations fire twice on a single mv), feel free to raise it.
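To judge whether 200 ms is enough, it can help to model what a debounce does with a burst of watch events. The sketch below is a plain Python model, not the PR's watcher code. It assumes a trailing-edge debounce (the reload fires one window after the last event of a burst), and the function name `debounced_fire_times` is made up for the illustration.

```python
def debounced_fire_times(event_times_ms, window_ms=200):
    """Return the times (ms) at which a trailing-edge debounce would fire.

    Every event restarts the timer, so a burst of events closer together
    than window_ms collapses into a single fire after the last one.
    """
    fires = []
    pending = None  # time of the most recent event still waiting on its timer
    for t in sorted(event_times_ms):
        if pending is not None and t - pending >= window_ms:
            # The previous timer expired before this event arrived.
            fires.append(pending + window_ms)
        pending = t
    if pending is not None:
        fires.append(pending + window_ms)
    return fires
```

Under this model a double event 30 ms apart, as a single mv might produce, yields one reload at 230 ms. Two events 250 ms apart yield two reloads, and raising the window to 500 ms merges them into one.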

Two related-but-distinct issues in the slave-protocol shim that sits between backend-player and mplayer.

1. Slave-protocol injection (HIGH-7).

exec() built a command line and ran it through jsStringEscape() to escape `"`, newlines, and backslashes so a caller-controlled value (e.g. a track URL) could not break out of the quoted argument. Then the very next line did `str = decodeURIComponent(str)`, which UNDOES the escaping for any `%22`, `%0A` or `%5C` the caller wedged in there. Net effect: a crafted URL like `http://attacker/file%22%0Astop%0A` decoded back to a literal `"` and newline AFTER escaping, injecting `stop` into the slave socket.

Fix: keep decodeURIComponent (legitimate callers pass %xx-encoded paths from m3u resolvers, RSS feeds and the spotify-control playList helper; without decoding mplayer can't open them) but move it BEFORE jsStringEscape. Decoding-then-escaping preserves both halves: the caller still gets `Foo Bar/01.mp3` from `Foo%20Bar/01.mp3`, and a malicious `Foo%22%0Astop%0A` decodes to `Foo"\nstop\n` which jsStringEscape then neutralises into `Foo\"\\nstop\\n`. mplayer sees the escaped form, so there is no slave-protocol injection. Wrap the decode in a try/catch since decodeURIComponent throws URIError on malformed `%FF` sequences.

2. No process recovery (MED-4).

spawn() ran exactly once at module load, handled only the 'close' event, and ignored 'error' and stdin 'error'. So:
- if mplayer crashed (SIGSEGV / OOM / decoder bug), every subsequent exec() wrote to a closed stdin and crashed the entire backend-player with EPIPE
- if mplayer's binary went missing or fork() failed, the spawn error went uncaught and tore down the process

Wrap the spawn in spawnMplayer() and attach error / stdin-error listeners. On unexpected close, schedule a respawn with capped exponential backoff (1, 2, 4, 8, 16, 30, 30…s). Reset the counter once a fresh process has been alive for 30s.

The 'close' event still fires on intentional close() so existing teardown semantics are preserved (new `shutdown` flag distinguishes intentional from crash). Commands issued during the crash-respawn window are debug-logged and dropped instead of throwing; listeners can react to the new 'mplayer-crash' event.
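The ordering argument from the first issue can be reproduced in a few lines. The sketch below is a Python analogue, not the wrapper's JavaScript: `urllib.parse.unquote` stands in for decodeURIComponent, and `escape_for_slave` is a hypothetical stand-in for jsStringEscape whose exact escaping rules are assumed. Note that Python's `unquote` is more lenient than decodeURIComponent about malformed sequences, so only the invalid-UTF-8 case is guarded here.

```python
from urllib.parse import unquote


def escape_for_slave(value):
    """Hypothetical stand-in for jsStringEscape: escapes \\, ", CR and LF."""
    return (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def safe_decode(value):
    """Percent-decode, falling back to the raw value on invalid UTF-8 bytes."""
    try:
        return unquote(value, errors="strict")
    except UnicodeDecodeError:
        return value


def escape_then_decode(value):
    """Old, vulnerable order: the decode reintroduces raw quotes and newlines."""
    return safe_decode(escape_for_slave(value))


def decode_then_escape(value):
    """Fixed order: anything the decode produces still gets escaped."""
    return escape_for_slave(safe_decode(value))
```

With the crafted value `Foo%22%0Astop%0A`, `escape_then_decode` returns a string containing a raw newline, while `decode_then_escape` returns one in which the quote and both newlines are backslash-escaped. A legitimate `Foo%20Bar/01.mp3` comes out as `Foo Bar/01.mp3` in both orders.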

…stale meta (MED-1)

The volume-up handler had a TOCTOU race that let fast touchscreen taps push playback past muPiBoxConfig.mupibox.maxVolume, the Hörschutz (hearing-protection) cap parents rely on.

The bug shape:

    exec(cmdVolume, callback)  // async amixer-read, NOT awaited
    if (currentMeta.volume < muPiBoxConfig.mupibox.maxVolume) {
      cmdCall(volumeUp)
      currentMeta.volume += 5
    }

The exec was fired but the callback (which updates currentMeta.volume) ran later, while the if-check immediately used currentMeta.volume from whatever value the LAST setVolume left it at. With rapid taps:

    tap 1: cmd reads start, sync check sees stale volume X < cap → +5, currentMeta.volume = X+5 (locally, optimistic)
    tap 2 (before tap-1's read returns): same path, sees X+5 < cap → +5, currentMeta.volume = X+10
    tap 3: …

The actual amixer state climbs, but the local-optimistic view stays under the cap until reads finally catch up. Net: cap exceeded.

Fix:
1. Promisify the amixer-read so each setVolume call awaits the ACTUAL hardware volume before deciding.
2. Add a serial queue (`_volumeOpQueue`) so concurrent setVolume invocations chain rather than overlap. Each chained op reads, checks, writes atomically with respect to other queued ops.
3. Bound the local update with Math.min/Math.max so even if amixer misreports, currentMeta stays in [0, maxVolume].
4. Catch failures inside the chained .then so one bad op (e.g. amixer read failed during boot) doesn't poison the queue.
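The read-check-write serialisation can be illustrated with an asyncio analogue. This is a sketch under assumptions, not the backend's code: an `asyncio.Lock` plays the role of the promise chain `_volumeOpQueue`, `FakeMixer` is an invented stand-in for the amixer calls, and `VolumeControl`, `volume_up` and `rapid_taps` are hypothetical names.

```python
import asyncio


class FakeMixer:
    """Invented stand-in for amixer: reads and writes take a moment."""

    def __init__(self, volume):
        self.volume = volume

    async def read(self):
        await asyncio.sleep(0.005)
        return self.volume

    async def write(self, volume):
        await asyncio.sleep(0.005)
        self.volume = volume


class VolumeControl:
    def __init__(self, mixer, max_volume, step=5):
        self.mixer = mixer
        self.max_volume = max_volume
        self.step = step
        self.cached_volume = None
        self._lock = asyncio.Lock()  # serialises whole read-check-write ops

    async def volume_up(self):
        async with self._lock:
            try:
                current = await self.mixer.read()  # actual state, not the cache
                target = max(0, min(current + self.step, self.max_volume))
                if target > current:
                    await self.mixer.write(target)
                self.cached_volume = target  # always within [0, max_volume]
            except OSError:
                pass  # a failed op must not block the ops queued behind it


async def rapid_taps(taps, start, cap):
    """Fire several volume-up requests at once and return the final volume."""
    control = VolumeControl(FakeMixer(start), max_volume=cap)
    await asyncio.gather(*(control.volume_up() for _ in range(taps)))
    return control.mixer.volume
```

Running `asyncio.run(rapid_taps(10, start=60, cap=80))` returns 80: four taps raise the volume and the remaining six see the real value at the cap and do nothing.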

err.body was dereferenced unconditionally even though .error?.status had optional chaining; for any non-API error (DNS failure, connection drop, JSON parse error in the SDK, token-refresh HTTP failure) the handler itself crashed with TypeError, taking the whole backend-player process down via pm2 unhandledException. Add optional chaining on err and body.
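The same defensive idea, written as a Python analogue for illustration: every step of the lookup tolerates a missing or oddly shaped value, so the handler never fails on the error it is supposed to handle. The names `extract_status`, `handle_spotify_error` and `FakeApiError` are hypothetical and the returned messages are invented.

```python
class FakeApiError(Exception):
    """Invented error shaped like an API error: carries body.error.status."""

    def __init__(self, status):
        super().__init__("api error")
        self.body = {"error": {"status": status}}


def extract_status(err):
    """Return err.body.error.status if every level exists, else None."""
    body = getattr(err, "body", None)
    error = body.get("error") if isinstance(body, dict) else None
    status = error.get("status") if isinstance(error, dict) else None
    return status if isinstance(status, int) else None


def handle_spotify_error(err):
    """Classify an error without assuming it came from the API."""
    status = extract_status(err)
    if status is None:
        return f"non-API error: {err!r}"
    return f"Spotify API error {status}"
```

A `FakeApiError(401)` is reported with its status, while a `ConnectionError` or even `None` falls through to the non-API branch instead of raising.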

chromium-autostart.sh fires from two paths that race:
1) restart_kiosk.sh after the admin "Restart services" click
2) dietpi-login auto-respawn on tty2 once the previous chromium browser process dies

Without a guard each invocation runs its own `mplayer ${START_SOUND} &` in the background. Live monitoring during a single user click showed 2 chromium-autostart.sh instances and 3 mplayer-startup.wav instances running concurrently; the user reported the boot sound layered on top of itself.

pkill any prior mplayer that's still playing the same wav before launching a fresh one. The `${START_SOUND}` path (the wav) is unique enough that the regex never matches the backend-player's slave-mode mplayer (which is invoked without a media-file path on its argv).

The BQ25792 driver previously raised I2CError on any OSError from the underlying smbus2 call. On a Pi the typical OSError is errno 121 "Remote I/O error" which occurs when an I2C transaction gets interrupted mid-byte (transient bus glitch, interleaved transaction). After such an error the bus is in an indeterminate state: subsequent reads also fail until the SMBus handle is re-opened.

Without the auto-reconnect a single Remote-I/O-error knocks the mupihat daemon into a stale state: the periodic_json_dump cycle keeps logging errors, /tmp/mupihat.json goes stale until the watchdog eventually resets the service.

Add a new _reopen_bus() helper that closes-and-reopens the smbus2.SMBus handle. In safe_execute(), catch OSError specifically (not the broader Exception catch), call _reopen_bus(), and retry the operation exactly once. If the retry also fails the original I2CError escalation path runs as before.

The retry rebinds the bound method to the new bus handle so the original `self.bq.foo` reference doesn't dangle on the closed handle. Bound-method detection via hasattr(func, '__self__').

Other Exceptions (non-OSError) keep the original behaviour: log and re-raise as I2CError.
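The reopen-and-retry-once flow can be sketched without hardware. The example below is a simplified model, not the driver's code: `FakeBus` replaces smbus2.SMBus, the `Driver` class and `bus_sequence` helper are invented, the register arguments are arbitrary, and the rebinding is done by looking the method name up again on the new handle, which is only one plausible way to do it.

```python
class I2CError(Exception):
    """Raised when an operation still fails after one reconnect attempt."""


class FakeBus:
    """Invented stand-in for smbus2.SMBus; an unhealthy bus raises errno 121."""

    def __init__(self, healthy):
        self.healthy = healthy
        self.closed = False

    def read_byte_data(self, address, register):
        if not self.healthy:
            raise OSError(121, "Remote I/O error")
        return 0x42

    def close(self):
        self.closed = True


class Driver:
    def __init__(self, open_bus):
        self._open_bus = open_bus  # callable that returns a fresh bus handle
        self.bus = open_bus()

    def _reopen_bus(self):
        try:
            self.bus.close()
        except OSError:
            pass  # the handle may already be unusable
        self.bus = self._open_bus()

    def safe_execute(self, func, *args):
        try:
            return func(*args)
        except OSError:
            old_bus = self.bus
            self._reopen_bus()
            # Rebind a method bound to the closed handle onto the new one.
            if getattr(func, "__self__", None) is old_bus:
                func = getattr(self.bus, func.__name__)
            try:
                return func(*args)  # exactly one retry
            except OSError as exc:
                raise I2CError(f"I2C failed after reconnect: {exc}") from exc


def bus_sequence(*health):
    """Return a factory that yields buses with the given health states in order."""
    states = iter(health)
    return lambda: FakeBus(next(states))
```

With `Driver(bus_sequence(False, True))`, the first read fails, the handle is replaced and the retry succeeds. With `bus_sequence(False, False)` the retry fails as well and I2CError is raised.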

AR5-5: mupihat.py runs the BQ25792 driver in three threads: the periodic_json_dump daemon plus two Flask request workers (`/` and `/api/registers`). Without serialisation they can interleave I2C transactions on the same smbus2.SMBus handle, producing the "[Errno 121] Remote I/O error" bursts visible in the mupi_hat journal (observed live during charging on 2026-05-15). The Pi's i2c-bcm2835 driver doesn't fully recover from a half-completed sequence, so even benign collisions leave the bus in a state where read_all_register returns garbage or fails outright.

Fix: module-level threading.Lock in mupihat.py wrapping every hat.* call. periodic_json_dump now acquires the lock in three short phases around the watchdog reset, the bulk register read, and the to_json / log_register_values follow-up reads. Sleeps stay outside the lock so request workers can fit between phases (no pathological 5-second waits). The JSON file write also moved outside the lock: it doesn't touch I2C and can take 5-30 ms on SD.

M6: companion auto-reconnect in mupihat_bq25792.py safe_execute(). When a single I2C transaction throws OSError (errno 121 is the common case), the wrapper now closes the SMBus handle, re-opens it with smbus2.SMBus(self.i2c_device), and retries the operation once before escalating to I2CError. This clears the hung bus state that previously persisted until the next cycle (or until mupihat-daemon restart in worst-case watchdog scenarios). The retry rebinds the bound method to the new bus handle so the original `self.bq.foo` reference doesn't dangle on the closed handle.

Behaviour preserved: timing identical (sleep(0.1) + sleep(1) + sleep(3.9) per cycle), all original error logging kept, no API surface change. Only the locking and the OSError-recovery path are new.
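The phased locking can be modelled with a fake driver that records how many callers are inside a bus transaction at once. This is an illustrative sketch, not mupihat.py: `FakeHat`, `watchdog_reset`, `dump_cycle`, `request_worker` and `run_demo` are invented names, and the sleeps are shortened to keep the demo fast.

```python
import threading
import time

i2c_lock = threading.Lock()  # module-level lock shared by every thread


class FakeHat:
    """Invented driver stand-in that tracks overlapping bus transactions."""

    def __init__(self):
        self.active = 0
        self.max_active = 0

    def _transaction(self):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # pretend the bus transfer takes a moment
        self.active -= 1

    def watchdog_reset(self):
        self._transaction()

    def read_all_register(self):
        self._transaction()
        return {"REG00": 0}

    def to_json(self, registers):
        self._transaction()
        return dict(registers)


def dump_cycle(hat):
    """One periodic cycle: three short locked phases, sleeps outside the lock."""
    with i2c_lock:
        hat.watchdog_reset()
    time.sleep(0.001)  # request workers can run here
    with i2c_lock:
        registers = hat.read_all_register()
    time.sleep(0.001)
    with i2c_lock:
        payload = hat.to_json(registers)
    return payload  # a file write would happen here, without the lock


def request_worker(hat):
    with i2c_lock:
        return hat.read_all_register()


def run_demo(cycles=5, workers=2):
    """Run one dumper and several workers; return the peak overlap seen."""
    hat = FakeHat()
    threads = [threading.Thread(target=lambda: [dump_cycle(hat) for _ in range(cycles)])]
    for _ in range(workers):
        threads.append(
            threading.Thread(target=lambda: [request_worker(hat) for _ in range(cycles * 3)])
        )
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return hat.max_active
```

Because every driver call runs under `i2c_lock`, `run_demo()` reports a peak overlap of 1. Holding the lock only around the individual phases, rather than around the whole cycle, is what lets the request workers get in between them.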

wowa1990 added 6 commits May 16, 2026 16:03

wowa1990 mentioned this pull request Jun 6, 2026

Performance & Caching — pagination, cover-cache, in-mem-LRU, async writes #148

Open

wowa1990 wants to merge 6 commits into splitti:main from wowa1990:pr-spotify-mplayer-robustness
