Skip to content

[BUG]: Data race on pending_requests_ #30

Description

@advaypakhale

Issue Description

I ran into this while modifying the driver to add a custom service. I wanted to check whether I needed to lock anything when touching pending_requests_, saw request_mutex_ declared in client.hpp, and then realised it is never actually locked anywhere. It is declared and constructed, but nothing references it.

std::unordered_map<std::string, std::deque<std::promise<CommandResponse>>> pending_requests_;
std::mutex request_mutex_;

So pending_requests_ (the unordered_map<string, deque<promise>>) is mutated from two threads with no synchronisation:

  • caller thread in send_command: pending_requests_[...].emplace_back(...) (client.cpp#L203)
  • polling thread in process_json_object: contains / operator[] / front / set_value / pop_front (client.cpp#L290-L292)

Furthermore, send() at client.cpp#L196 runs before the promise is registered at client.cpp#L203, so a very fast response could be dropped and leave f.get() blocked forever -- not sure if I should file this as a separate issue (or if it even constitutes one).

Steps to Reproduce

if (pending_requests_.contains(response.response_to) && !pending_requests_[response.response_to].empty()) {
pending_requests_[response.response_to].front().set_value(response);
pending_requests_[response.response_to].pop_front();

set_value can immediately unblock a caller's f.get(), which returns and lets the next command run emplace_back on the same deque (from a service callback scheduled by the ROS executor, for example) while the polling thread is still about to run pop_front. enable_acoustics, enable_dark_mode, enable_periodic_cycling, and set_configuration all send the command name set_config, so two back-to-back config calls land on the same deque. Concurrent emplace_back + pop_front on a std::deque is UB. A brand new command key in emplace_back can also rehash the map while the polling thread reads it (also UB I think?).

I found this by reading the code rather than from a crash, so I do not have a deterministic repro. It is probably timing dependent and could show up intermittently; I'm not sure if this is being pedantic.

Expected Behavior

None observed. This was found by reading the code, not from a runtime failure. If it does trigger, the likely symptoms are intermittent memory corruption, a crash, or a hang.

Error Message

Runtime Environment

Not environment specific -- the race lives in libwaterlinked.

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions