Issue Description
I ran into this while modifying the driver to add a custom service. I wanted to check whether I needed to lock anything when touching pending_requests_, saw request_mutex_ declared in client.hpp, and then realised it is never actually locked anywhere. It is declared and constructed, but nothing references it.
|
std::unordered_map<std::string, std::deque<std::promise<CommandResponse>>> pending_requests_; |
|
std::mutex request_mutex_; |
So pending_requests_ (the unordered_map<string, deque<promise>>) is mutated from two threads with no synchronisation:
- caller thread in
send_command: pending_requests_[...].emplace_back(...) (client.cpp#L203)
- polling thread in
process_json_object: contains / operator[] / front / set_value / pop_front (client.cpp#L290-L292)
Furthermore, send() at client.cpp#L196 runs before the promise is registered at client.cpp#L203, so a very fast response could be dropped and leave f.get() blocked forever -- not sure if I should file this as a separate issue (or if it even constitutes one).
Steps to Reproduce
|
if (pending_requests_.contains(response.response_to) && !pending_requests_[response.response_to].empty()) { |
|
pending_requests_[response.response_to].front().set_value(response); |
|
pending_requests_[response.response_to].pop_front(); |
set_value can immediately unblock a caller's f.get(), which returns and lets the next command run emplace_back on the same deque (from a service callback scheduled by the ROS executor, for example) while the polling thread is still about to run pop_front. enable_acoustics, enable_dark_mode, enable_periodic_cycling, and set_configuration all send the command name set_config, so two back-to-back config calls land on the same deque. Concurrent emplace_back + pop_front on a std::deque is UB. A brand new command key in emplace_back can also rehash the map while the polling thread reads it (also UB I think?).
I found this by reading the code rather than from a crash, so I do not have a deterministic repro. It is probably timing dependent and could show up intermittently; I'm not sure if this is being pedantic.
Expected Behavior
None observed. This was found by reading the code, not from a runtime failure. If it does trigger, the likely symptoms are intermittent memory corruption, a crash, or a hang.
Error Message
Runtime Environment
Not environment specific -- the race lives in libwaterlinked.
Additional Context
No response
Issue Description
I ran into this while modifying the driver to add a custom service. I wanted to check whether I needed to lock anything when touching
pending_requests_, sawrequest_mutex_declared inclient.hpp, and then realised it is never actually locked anywhere. It is declared and constructed, but nothing references it.waterlinked_dvl/libwaterlinked/include/libwaterlinked/client.hpp
Lines 130 to 131 in 78eb226
So
pending_requests_(theunordered_map<string, deque<promise>>) is mutated from two threads with no synchronisation:send_command:pending_requests_[...].emplace_back(...)(client.cpp#L203)process_json_object:contains/operator[]/front/set_value/pop_front(client.cpp#L290-L292)Furthermore,
send()at client.cpp#L196 runs before the promise is registered at client.cpp#L203, so a very fast response could be dropped and leavef.get()blocked forever -- not sure if I should file this as a separate issue (or if it even constitutes one).Steps to Reproduce
waterlinked_dvl/libwaterlinked/src/client.cpp
Lines 290 to 292 in 78eb226
set_valuecan immediately unblock a caller'sf.get(), which returns and lets the next command runemplace_backon the same deque (from a service callback scheduled by the ROS executor, for example) while the polling thread is still about to runpop_front.enable_acoustics,enable_dark_mode,enable_periodic_cycling, andset_configurationall send the command nameset_config, so two back-to-back config calls land on the same deque. Concurrentemplace_back+pop_fronton astd::dequeis UB. A brand new command key inemplace_backcan also rehash the map while the polling thread reads it (also UB I think?).I found this by reading the code rather than from a crash, so I do not have a deterministic repro. It is probably timing dependent and could show up intermittently; I'm not sure if this is being pedantic.
Expected Behavior
None observed. This was found by reading the code, not from a runtime failure. If it does trigger, the likely symptoms are intermittent memory corruption, a crash, or a hang.
Error Message
Runtime Environment
Not environment specific -- the race lives in
libwaterlinked.Additional Context
No response