Skip to content

Spotify-Control & mplayer Robustness — live-reload, decode-then-escape, auto-respawn#147

Open
wowa1990 wants to merge 6 commits into
splitti:mainfrom
wowa1990:pr-spotify-mplayer-robustness
Open

Spotify-Control & mplayer Robustness — live-reload, decode-then-escape, auto-respawn#147
wowa1990 wants to merge 6 commits into
splitti:mainfrom
wowa1990:pr-spotify-mplayer-robustness

Conversation

@wowa1990

Copy link
Copy Markdown

Was es macht

  • Live-Reload von mupiboxconfig.json via fs.watch (Config-Änderungen brauchen kein Backend-Restart mehr)
  • Watcher-Crash-Härtung (Recover statt Backend-Tot)
  • mplayer-Wrapper: decode-then-escape-Reihenfolge (vorher hob Escape den Decode auf, Folge waren stille Locals mit Sonderzeichen)
  • mplayer-Auto-Respawn statt Dauer-Fehler bei Crash
  • handleSpotifyError fängt jetzt auch Non-API-Errors (Network, parse, etc.)
  • setVolume serialisiert + liest amixer-State direkt statt aus Cache (verhindert Volume-Drift)
  • SDK-Polling-Intervall reduziert (B11/M1/M2-Pattern, kein eager subscribe auf mediaService.current$ mehr — sonst bricht Spotify Connect)
  • AR5-5 I2C-Mutex und M6 SMBus-Reconnect für die mupihat-Anbindung
  • Log-Timestamp-Fix (AR5-4) und startup.wav Doppelton-Fix (chromium-autostart)

Architektur (kurz)

Watcher hängt am Backend-Process — kein extra-Daemon. mplayer-Respawn via Exit-Code-Detection im Wrapper. I2C-Mutex ist ein threading.Lock um die smbus-Calls, damit Read+Write nicht ineinander laufen, wenn HAT-Polling und Display-Update gleichzeitig kommen.

Test in Codespaces / lokal

  1. Branch auschecken, Backend wie üblich starten
  2. Ändere in laufendem Backend mupiboxconfig.json (z.B. Volume-Default) → erwartet log-Eintrag „config reloaded", neue Werte aktiv ohne Restart
  3. Mock-Spotify-Error mit invalidem Token → erwartet kein Backend-Crash, klare log-Message

Hinweis: mplayer-Auto-Respawn, I2C-Mutex, SMBus-Reconnect und Doppelton-Fix sind nur auf realer Box testbar (mplayer-Binary, mupihat-Hardware, pm2-Setup).

Offene Punkte

  • Watch-Debounce auf 200ms — falls dir das zu kurz ist (manche fs-watch-Implementations feuern bei einem mv zweimal), gerne hochsetzen.

wowa1990 added 6 commits May 16, 2026 16:03
Two related-but-distinct issues in the slave-protocol shim that sits
between backend-player and mplayer.

1. Slave-protocol injection (HIGH-7). exec() built a command line and
   ran it through jsStringEscape() to escape `"`, newlines, and
   backslashes so a caller-controlled value (e.g. a track URL) could
   not break out of the quoted argument. Then the very next line did
   `str = decodeURIComponent(str)`, which UNDOES any `%22`, `%0A` or
   `%5C` the caller wedged in there. Net effect: a crafted URL like
   `http://attacker/file%22%0Astop%0A` decoded back to a literal `"`
   and newline AFTER escaping, injecting `stop` into the slave socket.

   Fix: keep decodeURIComponent (legitimate callers pass %xx-encoded
   paths from m3u resolvers, RSS feeds and the spotify-control
   playList helper — without decoding mplayer can't open them) but
   move it BEFORE jsStringEscape. Decoding-then-escaping preserves
   both halves: the caller still gets `Foo Bar/01.mp3` from `Foo%20Bar/
   01.mp3`, and a malicious `Foo%22%0Astop%0A` decodes to `Foo"\nstop\n`
   which jsStringEscape then neutralises into `Foo\"\\nstop\\n` —
   mplayer sees the escaped form, no slave-protocol injection. Wrap
   the decode in a try/catch since decodeURIComponent throws URIError
   on malformed `%FF` sequences.

2. No process recovery (MED-4). spawn() ran exactly once at module
   load, handled only the 'close' event, and ignored 'error' and
   stdin 'error'. So:
     - if mplayer crashed (SIGSEGV / OOM / decoder bug), every
       subsequent exec() wrote to a closed stdin and crashed the
       entire backend-player with EPIPE
     - if mplayer's binary went missing or fork() failed, the spawn
       error went uncaught and tore down the process

   Wrap the spawn in spawnMplayer() and attach error / stdin-error
   listeners. On unexpected close, schedule a respawn with capped
   exponential backoff (1, 2, 4, 8, 16, 30, 30…s). Reset the counter
   once a fresh process has been alive for 30s. The 'close' event
   still fires on intentional close() so existing teardown semantics
   are preserved (new `shutdown` flag distinguishes intentional from
   crash). Commands issued during the crash-respawn window are
   debug-logged and dropped instead of throwing — listeners can react
   to the new 'mplayer-crash' event.
…stale meta (MED-1)

The volume-up handler had a TOCTOU race that let fast touchscreen
taps push playback past muPiBoxConfig.mupibox.maxVolume — the
Hörschutz cap parents rely on. The bug shape:

  exec(cmdVolume, callback) // async amixer-read, NOT awaited
  if (currentMeta.volume < muPiBoxConfig.mupibox.maxVolume) {
    cmdCall(volumeUp)
    currentMeta.volume += 5
  }

The exec was fired but the callback (which updates currentMeta.volume)
ran later, while the if-check immediately used currentMeta.volume from
whatever value the LAST setVolume left it at. With rapid taps:
  tap 1: cmd reads start, sync check sees stale volume X < cap → +5,
         currentMeta.volume = X+5 (locally, optimistic)
  tap 2 (before tap-1's read returns): same path, sees X+5 < cap → +5,
         currentMeta.volume = X+10
  tap 3: …
The actual amixer state climbs, but the local-optimistic view stays
under the cap until reads finally catch up. Net: cap exceeded.

Fix:

1. Promisify the amixer-read so each setVolume call awaits the
   ACTUAL hardware volume before deciding.
2. Add a serial queue (`_volumeOpQueue`) so concurrent setVolume
   invocations chain rather than overlap. Each chained op reads,
   checks, writes atomically with respect to other queued ops.
3. Bound the local update with Math.min/Math.max so even if amixer
   misreports, currentMeta stays in [0, maxVolume].
4. Catch failures inside the chained .then so one bad op (e.g.
   amixer read failed during boot) doesn't poison the queue.
err.body was dereferenced unconditionally even though .error?.status had
optional chaining; for any non-API error (DNS failure, connection drop,
JSON parse error in the SDK, token-refresh HTTP failure) the handler
itself crashed with TypeError, taking the whole backend-player process
down via pm2 unhandledException. Add optional chaining on err and body.
chromium-autostart.sh fires from two paths that race:
  1) restart_kiosk.sh after the admin "Restart services" click
  2) dietpi-login auto-respawn on tty2 once the previous chromium
     browser process dies
Without a guard each invocation runs its own `mplayer ${START_SOUND} &`
in the background. Live monitoring during a single user click showed
2 chromium-autostart.sh instances and 3 mplayer-startup.wav instances
running concurrently — the user reported the boot sound layered on
top of itself.

pkill any prior mplayer that's still playing the same wav before
launching a fresh one. The `${START_SOUND}` path (the wav) is unique
enough that the regex never matches the backend-player's slave-mode
mplayer (which is invoked without a media-file path on its argv).
The BQ25792 driver previously raised I2CError on any OSError from
the underlying smbus2 call. On a Pi the typical OSError is errno
121 "Remote I/O error" which occurs when an I2C transaction gets
interrupted mid-byte (transient bus glitch, interleaved
transaction). After such an error the bus is in an indeterminate
state — subsequent reads also fail until the SMBus handle is
re-opened.

Without the auto-reconnect a single Remote-I/O-error knocks the
mupihat daemon into a stale state: the periodic_json_dump cycle
keeps logging errors, /tmp/mupihat.json goes stale until the
watchdog eventually resets the service.

Add a new _reopen_bus() helper that closes-and-reopens the
smbus2.SMBus handle. In safe_execute(), catch OSError specifically
(not the broader Exception catch), call _reopen_bus(), and retry
the operation exactly once. If the retry also fails the original
I2CError escalation path runs as before.

The retry rebinds the bound method to the new bus handle so the
original `self.bq.foo` reference doesn't dangle on the closed
handle. Bound-method detection via hasattr(func, '__self__').

Other Exceptions (non-OSError) keep the original behaviour: log
and re-raise as I2CError.
AR5-5: mupihat.py runs the BQ25792 driver in three threads — the
periodic_json_dump daemon plus two Flask request workers (`/` and
`/api/registers`). Without serialisation they can interleave I2C
transactions on the same smbus2.SMBus handle, producing the
"[Errno 121] Remote I/O error" bursts visible in the mupi_hat journal
(observed live during charging on 2026-05-15). The Pi's i2c-bcm2835
driver doesn't fully recover from a half-completed sequence, so even
benign collisions leave the bus in a state where read_all_register
returns garbage or fails outright.

Fix: module-level threading.Lock in mupihat.py wrapping every hat.*
call. periodic_json_dump now acquires the lock in three short phases
around the watchdog reset, the bulk register read, and the to_json /
log_register_values follow-up reads — sleeps stay outside the lock so
request workers can fit between phases (no pathological 5-second
waits). The JSON file write also moved outside the lock: it doesn't
touch I2C and can take 5-30 ms on SD.

M6: companion auto-reconnect in mupihat_bq25792.py safe_execute().
When a single I2C transaction throws OSError (errno 121 is the
common case), the wrapper now closes the SMBus handle, re-opens it
with smbus2.SMBus(self.i2c_device), and retries the operation once
before escalating to I2CError. This clears the hung bus state that
previously persisted until the next cycle (or until mupihat-daemon
restart in worst-case watchdog scenarios). The retry rebinds the
bound method to the new bus handle so the original `self.bq.foo`
reference doesn't dangle on the closed handle.

Behaviour preserved: timing identical (sleep(0.1) + sleep(1) +
sleep(3.9) per cycle), all original error logging kept, no API
surface change. Only the locking and the OSError-recovery path are
new.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant