Feat/refacto and telop#9
Conversation
…ures - Implemented RGB LED chenillard that runs on startup when connected to rosbridge. - Updated README to include instructions for standalone LED testing and configuration options. - Enhanced telemetry service to monitor linear velocity and critical battery levels, with corresponding audit events. - Updated adapter factory to support new RGB functionality and ensure proper configuration handling. - Added tests for new features, including telemetry checks for battery and velocity thresholds.
Review Summary by Qodo(Agentic_describe updated until commit d39fded)Harden teleop reliability with deadman watchdog, fix rosbridge concurrency, add RGB LED effects
WalkthroughsDescription• Refactored rosbridge WebSocket to eliminate concurrent reads during ping measurement, ensuring reliable telemetry delivery • Added teleop deadman watchdog that auto-stops robot if control messages cease arriving (frontend crash, network interruption) • Implemented RGB LED chenillard animation on /rgb topic with configurable hue sweep and LED indexing • Enhanced telemetry pipeline with battery EMA filtering, critical threshold alerts, and linear velocity monitoring • Extended teleop command parsing with axis clamping, legacy key support, and comprehensive validation • Added health endpoint and structured command acknowledgment/error responses for WebSocket protocol Diagramflowchart LR
A["WebSocket Client"] -->|teleop.move| B["WebSocketHandler"]
B -->|parse_teleop_command| C["TeleopCommand"]
C -->|send_velocity| D["RosbridgeAdapter"]
D -->|notify_command| E["TeleopWatchdog"]
E -->|timeout check| F["send_velocity ZERO"]
D -->|publish| G["RosbridgeClient"]
G -->|single consumer| H["WebSocket Listener"]
H -->|resolve pending| I["Ping Futures"]
D -->|start| J["RgbChenillard"]
J -->|publish HSV→RGB| G
File Changes1. src/robocoop_backend/robocoop_backend/adapters/rosbridge_client.py
|
Code Review by Qodo
1. Subscribe logs false success
|
| def _check_velocity(self, velocity: float) -> None: | ||
| if velocity > self.velocity_warning_threshold: | ||
| if self.audit_service and not self._velocity_alert_sent: | ||
| self._velocity_alert_sent = True | ||
| self.audit_service.record(AuditEvent( | ||
| action="velocity.high", | ||
| actor="system", | ||
| payload={ | ||
| "velocity": round(velocity, 2), | ||
| "threshold": self.velocity_warning_threshold, | ||
| }, | ||
| )) | ||
| else: | ||
| self._velocity_alert_sent = False |
There was a problem hiding this comment.
1. Reverse speed not alerted 🐞 Bug ≡ Correctness
TelemetryService._check_velocity() only triggers velocity.high when velocity > threshold, so reverse motion (negative velocity) above the threshold magnitude will never emit the audit event. This contradicts the documented behavior that the alert is based on linear speed exceeding a threshold.
Agent Prompt
### Issue description
`TelemetryService._check_velocity()` uses a signed comparison (`velocity > threshold`). Negative velocities (reverse motion) will not trigger `velocity.high`, even though the feature is documented as a speed threshold.
### Issue Context
Audit docs describe `velocity.high` as “Linear speed exceeds threshold”, which implies magnitude rather than sign.
### Fix Focus Areas
- src/robocoop_backend/robocoop_backend/modules/robot/telemetry_service.py[81-94]
### Suggested fix
- Compare `abs(velocity)` to `velocity_warning_threshold`.
- Consider storing/reporting the magnitude in the audit payload (or include both `velocity` and `abs_velocity`), but at minimum ensure reverse motion triggers the event.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| async def _run(self) -> None: | ||
| hue = 0.0 | ||
| try: | ||
| while True: | ||
| for led in range(self.led_min, self.led_max + 1): | ||
| r, g, b = self._hsv_to_rgb255(hue) | ||
| await self._publish_color(r, g, b, float(led)) | ||
| hue = (hue + self.hue_step) % 360.0 | ||
| await asyncio.sleep(self.frame_delay) |
There was a problem hiding this comment.
2. Chenillard can busy-spin 🐞 Bug ➹ Performance
RgbChenillard._run() only awaits inside the for led in range(...) loop; if led_min > led_max due to config, the range is empty and the surrounding while True spins without any await. This can peg a CPU core and block the asyncio event loop.
Agent Prompt
### Issue description
`RgbChenillard._run()` awaits only inside the LED loop. If the configured LED range is invalid (`led_min > led_max`), the `for` loop does zero iterations and the `while True` loop runs without yielding, causing a tight busy-loop.
### Issue Context
`led_min` / `led_max` are loaded directly from config and passed into `RgbChenillard`.
### Fix Focus Areas
- src/robocoop_backend/robocoop_backend/modules/robot/rgb_chenillard.py[52-60]
- src/robocoop_backend/robocoop_backend/adapters/factory.py[68-75]
### Suggested fix
- Validate config in `RgbChenillard.__init__` (or at startup) and raise a clear error if `led_min > led_max`, **or** swap/clamp them.
- Additionally (defensive), ensure the `while True` loop always yields at least once per iteration, e.g. add an `await asyncio.sleep(self.frame_delay)` after the `for` loop when it produced no frames.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
|
@copilot resolve the merge conflicts in this pull request |
|
Persistent review updated to latest commit d39fded |
| async def subscribe(self, topic: str, msg_type: str, callback: Callable) -> None: | ||
| if not self._websocket: | ||
| logger.error("Cannot subscribe: not connected") | ||
| return | ||
|
|
||
| sub_id = f"sub_{topic.replace('/', '_')}" | ||
| self._subscribers[topic] = callback | ||
| self._subscription_ids[topic] = sub_id | ||
| try: | ||
| await self._websocket.send(json.dumps({ | ||
| "op": "subscribe", "topic": topic, "type": msg_type, "id": sub_id, | ||
| })) | ||
| logger.info(f"Subscribed to {topic}") | ||
| except Exception as e: | ||
| logger.error(f"Subscribe failed for {topic}: {e}") | ||
|
|
||
| await self._send_json({ | ||
| "op": "subscribe", | ||
| "topic": topic, | ||
| "type": msg_type, | ||
| "id": sub_id, | ||
| }) | ||
|
|
||
| logger.info(f"Subscribed to {topic}") |
There was a problem hiding this comment.
1. Subscribe logs false success 🐞 Bug ☼ Reliability
RosbridgeClient.subscribe() logs "Subscribed" unconditionally even if the WebSocket send fails, because _send_json() swallows exceptions and doesn’t signal failure. This can leave the backend believing critical telemetry subscriptions are active while rosbridge never received them, with misleading logs.
Agent Prompt
## Issue description
`RosbridgeClient.subscribe()` logs success even if the underlying `send()` failed, because `_send_json()` catches and logs exceptions and then returns normally. This creates a false-success state where callbacks are registered and logs say “Subscribed”, but rosbridge never received the subscribe frame.
## Issue Context
Subscriptions are the backbone of battery/odom telemetry; misleading success logs make outages hard to detect and can look like downstream parsing bugs.
## Fix Focus Areas
- src/robocoop_backend/robocoop_backend/adapters/rosbridge_client.py[83-145]
### Suggested implementation direction
- Make `_send_json()` either:
- raise on send failure, OR
- return `bool` success/failure.
- In `subscribe()`:
- only log “Subscribed to …” when send succeeded,
- and consider only storing `_subscribers/_subscription_ids` after a successful send (or remove them if send fails).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| async def stop(self) -> None: | ||
| self._running = False | ||
|
|
||
| if self._task: | ||
| self._task.cancel() | ||
| try: | ||
| await self._task | ||
| except asyncio.CancelledError: | ||
| pass | ||
|
|
||
| self._task = None | ||
| logger.info("Teleop watchdog stopped") |
There was a problem hiding this comment.
2. Watchdog stop doesn’t halt 🐞 Bug ☼ Reliability
TeleopWatchdog.stop() cancels the watchdog loop without sending a zero-velocity command even if the last teleop command was non-zero, and BackendContext.disconnect() stops the watchdog before disconnecting the adapter. This breaks the deadman safety guarantee during graceful shutdown/restart because the last commanded velocity is never actively cleared.
Agent Prompt
## Issue description
Stopping the watchdog currently only stops the monitoring task; it does not publish a final stop command when the last command was non-zero. Since `BackendContext.disconnect()` stops the watchdog as part of shutdown, the system can shut down without ever sending a final zero velocity.
## Issue Context
The PR’s stated purpose is teleop safety via a deadman watchdog. A graceful shutdown/restart is a common operational path and should be safe too.
## Fix Focus Areas
- src/robocoop_backend/robocoop_backend/modules/robot/teleop_watchdog.py[50-79]
- src/robocoop_backend/robocoop_backend/app/backend_context.py[72-82]
### Suggested implementation direction
- In `TeleopWatchdog.stop()` (or in `BackendContext.disconnect()` before stopping/cancelling tasks), if the last command was non-zero, call `adapter.send_velocity(ZERO_TELEOP_COMMAND)` once.
- Wrap the send in a try/except and log if it fails (don’t block shutdown).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Summary
Harden robot teleoperation reliability by introducing a deadman safety watchdog, fixing rosbridge WebSocket concurrency issues, and improving real robot teleop configuration for Yahboom M3 Pro.
Description
This PR strengthens the backend robot control pipeline for real-world operation by improving teleoperation safety and rosbridge communication reliability.
A teleop deadman watchdog was introduced to automatically stop the robot if control messages stop arriving (e.g. frontend freeze, browser crash, network interruption), preventing uncontrolled movement.
Additionally, the rosbridge client was refactored to eliminate concurrent WebSocket reads during latency measurement, ensuring stable telemetry delivery and preventing race conditions affecting battery, odometry, and ping diagnostics.
The real robot configuration was also updated with safer initial teleop limits for Yahboom M3 Pro testing.
What's Changed
Affected areas
Key files
src/robocoop_backend/robocoop_backend/adapters/rosbridge_client.pyFixed concurrent WebSocket read issue by enforcing a single message consumer and pending ping future dispatch.
src/robocoop_backend/robocoop_backend/modules/robot/teleop.pyAdded
TeleopCommand.is_zero()helper for watchdog integration.src/robocoop_backend/robocoop_backend/modules/robot/teleop_watchdog.pyAdded deadman switch auto-stop watchdog for teleoperation safety.
src/robocoop_backend/robocoop_backend/app/backend_context.pyIntegrated watchdog lifecycle management (startup / shutdown).
src/robocoop_backend/robocoop_backend/app/websocket_handler.pyAdded watchdog notification on every valid teleop command.
src/robocoop_backend/robocoop_backend/tests/unit/adapters/test_rosbridge_client.pyAdded rosbridge reliability and ping handling tests.
src/robocoop_backend/robocoop_backend/tests/unit/modules/test_teleop_watchdog.pyAdded deadman watchdog unit tests.
src/robocoop_backend/robocoop_backend/tests/unit/app/test_websocket_handler.pyAdded watchdog notification coverage.
src/robocoop_backend/robocoop_backend/tests/integration/test_context_lifecycle.pyUpdated integration config expectations.
src/robocoop_bringup/config/common.params.yamlAdded deadman switch config defaults.
src/robocoop_bringup/config/real.params.yamlAdded safer Yahboom M3 Pro teleop limits.
Details
Rosbridge reliability hardening
Fixed a critical architecture issue where multiple coroutines could read from the same rosbridge WebSocket simultaneously.
Before:
measure_ping()also read directly from the same socketAfter:
This improves reliability for:
Teleop deadman safety
Added an automatic emergency stop safeguard for teleoperation.
Behavior:
This protects against:
Default configuration: