Gateway becomes permanently unresponsive when configurations endpoint blocks spin_mutex #332
Labels
bug (Something isn't working)
Description
Bug report
Steps to reproduce
- Start the gateway with SSE enabled (default) on a system with nodes that accept parameter service requests but respond slowly (e.g., autostart_node on ROSMASTER M3 Pro)
- Connect the web UI. It opens an SSE stream, loads the entity tree, and fetches configurations for each entity
- One configurations request hits a slow node, acquires spin_mutex_, and blocks inside spin_node_until_future_complete
- All subsequent configurations requests queue on spin_mutex_
- httplib thread pool (8 workers) fills up with blocked threads
- Gateway stops responding to ALL HTTP requests permanently
Expected behavior
Gateway should remain responsive even when some nodes are slow to respond to parameter service calls. Other endpoints (health, data, faults, entity tree) should not be affected.
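One possible way to keep the other endpoints reachable is to cap how many HTTP workers may be inside the configurations path at once and reject the rest immediately (for example with a 503). The sketch below is only an illustration of that idea, not the gateway's code: `AdmissionLimiter` and `AdmissionTicket` are hypothetical names, and the limit of 2 in the usage note is an arbitrary assumption.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: counts requests currently inside a guarded code path
// and refuses new ones once the cap is reached, instead of letting them block.
class AdmissionLimiter {
public:
  explicit AdmissionLimiter(int max_in_flight) : max_(max_in_flight) {}

  // Non-blocking: returns true if the caller was admitted.
  bool try_enter() {
    int cur = in_flight_.load();
    while (cur < max_) {
      // On failure, cur is reloaded and the cap is re-checked.
      if (in_flight_.compare_exchange_weak(cur, cur + 1)) {
        return true;
      }
    }
    return false;
  }

  void leave() { in_flight_.fetch_sub(1); }
  int in_flight() const { return in_flight_.load(); }

private:
  const int max_;
  std::atomic<int> in_flight_{0};
};

// RAII wrapper so an admitted request always releases its slot.
class AdmissionTicket {
public:
  explicit AdmissionTicket(AdmissionLimiter& limiter)
      : limiter_(limiter), admitted_(limiter.try_enter()) {}
  ~AdmissionTicket() {
    if (admitted_) {
      limiter_.leave();
    }
  }
  AdmissionTicket(const AdmissionTicket&) = delete;
  AdmissionTicket& operator=(const AdmissionTicket&) = delete;

  bool admitted() const { return admitted_; }

private:
  AdmissionLimiter& limiter_;
  bool admitted_;
};
```

With a cap of 2 on a pool of 8 workers, a handler would create an `AdmissionTicket` first and return an error response when `admitted()` is false, so at least 6 workers stay free for health, data, faults, and entity tree requests even if both admitted calls hang.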
Actual behavior
The gateway process stays alive (discovery continues), but the httplib server is completely unresponsive. gdb shows 7+ threads blocked on the same futex (spin_mutex_). Only kill -9 recovers the gateway.
Confirmed via gdb thread dump:
Thread 30: futex_wait -> lll_lock_wait -> pthread_mutex_lock (spin_mutex_)
Thread 29: futex_wait -> lll_lock_wait -> pthread_mutex_lock (spin_mutex_)
Thread 28: futex_wait -> lll_lock_wait -> pthread_mutex_lock (spin_mutex_)
...
Thread 23: futex_abstimed_wait -> pthread_cond_wait (holding spin_mutex_, waiting on spin_node_until_future_complete)
Root cause
spin_mutex_ in ConfigurationManager serializes ALL parameter service IPC calls. If one SyncParametersClient::list_parameters() call hangs (node accepts service but never responds), spin_mutex_ is held indefinitely. All other configuration requests block waiting for the mutex, exhausting the httplib thread pool.
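The failure mode described here comes from two unbounded waits: one on the mutex and one on the reply. A minimal sketch of bounding both is shown below. It is a library-free model under stated assumptions, not a patch for ConfigurationManager: `bounded_get` is a hypothetical helper, the `std::future<std::string>` stands in for the pending parameter service reply, and the `std::timed_mutex` stands in for spin_mutex_.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical helper modelling a serialized call with bounded waits.
// Returns std::nullopt if either the lock or the reply is not available in time,
// so the calling worker thread is always released after a known upper bound.
std::optional<std::string> bounded_get(std::timed_mutex& serialize_mutex,
                                       std::future<std::string>& reply,
                                       std::chrono::milliseconds lock_timeout,
                                       std::chrono::milliseconds reply_timeout) {
  // This constructor calls try_lock_for(lock_timeout) instead of blocking forever.
  std::unique_lock<std::timed_mutex> lock(serialize_mutex, lock_timeout);
  if (!lock.owns_lock()) {
    return std::nullopt;  // another request still holds the mutex
  }

  // Bounded wait on the reply; the mutex is released when this function returns.
  if (reply.wait_for(reply_timeout) != std::future_status::ready) {
    return std::nullopt;  // the node accepted the request but did not answer in time
  }
  return reply.get();
}
```

In this model, a request that hits a silent node gives up after `reply_timeout` and frees the mutex, and a request queued behind it gives up after `lock_timeout`, so neither can pin an httplib worker indefinitely. How the timeouts map onto the real parameter client calls is left open here.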
Environment
- ros2_medkit version: main (after the merge of #318, "Gateway deadlocks when querying configurations on nodes without parameter service")
- ROS 2 distro: Humble
- OS: Ubuntu 22.04 on Jetson Orin Nano (aarch64)
- Robot: Yahboom ROSMASTER M3 Pro with micro-ROS nodes