[Bug]: Undefined behavior with LockFreeQueue #656

@VzdornovNA88

Explanation

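The undefined behavior comes from how LockFreeQueue is used: it is shared between several producers, so it runs in at least an MPSC (multi-producer, single-consumer) scenario.
I am giving only one example here. After finding it, I stopped my manual code review of this library until the problem is either corrected or I am proven wrong.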

bool CANNetworkManager::send_can_message: the architectural error is that this API method cannot be restricted to SPSC (single-producer, single-consumer) use, because it is part of the public API and not an implementation detail. Backing it with such a limited queue implementation will therefore lead to undefined behavior sooner or later. Users of this method never assume that the implementation implicitly imposes such a restriction on them, and even if the restriction were described in the API documentation, it would severely limit the library's users.

As a result, the implementation itself falls into this trap.
Inside the library we already have at least 2 producers:

can hardware thread
void CANNetworkManager::update()
void periodic_update_from_hardware() { CANNetworkManager::CANNetwork.update(); }
void CANHardwareInterface::update()

task controller client thread
void TaskControllerClient::update()

So now we have at least an MPSC case.
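To make the failure mode concrete, here is a minimal sketch of what can go wrong when two producers share a queue whose push only loads and stores the write index. SpscStyleRing, its member names, and poppedAfterInterleavedPushes are hypothetical and are not taken from the library's LockFreeQueue; push is split into two steps so that one unlucky interleaving can be replayed deterministically on a single thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical SPSC-style ring buffer; NOT the library's LockFreeQueue.
// push is split into its two steps so an interleaving of two producers
// can be replayed by hand. The "queue full" check is omitted for brevity.
class SpscStyleRing {
public:
    // Step 1 of push: read the current write index. Nothing is reserved.
    std::size_t loadWriteIndex() const {
        return writeIndex.load(std::memory_order_relaxed);
    }

    // Step 2 of push: fill the slot, then publish the next write index.
    void storeAndPublish(std::size_t index, int value) {
        slots[index % slots.size()] = value;
        writeIndex.store(index + 1, std::memory_order_release);
    }

    // Single-consumer pop: returns false when the queue is empty.
    bool pop(int& out) {
        const std::size_t r = readIndex.load(std::memory_order_relaxed);
        if (r == writeIndex.load(std::memory_order_acquire)) return false;
        out = slots[r % slots.size()];
        readIndex.store(r + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<int, 8> slots{};
    std::atomic<std::size_t> writeIndex{0};
    std::atomic<std::size_t> readIndex{0};
};

// Replays "A loads, B loads, A stores, B stores" and counts how many
// items the consumer can still pop afterwards.
inline std::size_t poppedAfterInterleavedPushes() {
    SpscStyleRing ring;
    const std::size_t a = ring.loadWriteIndex();  // producer A sees index 0
    const std::size_t b = ring.loadWriteIndex();  // producer B also sees index 0
    ring.storeAndPublish(a, 1);                   // A fills slot 0, publishes 1
    ring.storeAndPublish(b, 2);                   // B overwrites slot 0, publishes 1 again
    std::size_t popped = 0;
    int value = 0;
    while (ring.pop(value)) ++popped;
    return popped;
}
```

Two pushes were issued, but only one item can be popped, which matches the "consumed < produced" symptom. With real threads the unsynchronized writes to the same slot would additionally be a data race, which is undefined behavior in C++.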

A simpler case of the same problem arises with the CAN message receiver threads of the various CANHardware instances: several threads push, while the same single consumer pops in void CANHardwareInterface::update()

LockFreeQueue should be named SPSCLockFreeQueue, since it does not use a CAS (compare-and-swap) to protect all of its invariants; I haven't heard of a multi-producer implementation that works without one. One of the best-known and most efficient implementations for the MPMC case uses at least one CAS to safely advance the index (https://github.com/khizmax/libcds/blob/9985d2a87feaa3e92532e28f4ab762a82855a49c/cds/container/vyukov_mpmc_cycle_queue.h#L255). Although the CAS operations there are relaxed, the queue is still protected by a dedicated atomic sequence counter with memory_order_acquire/release semantics.
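The idea described above can be sketched roughly as follows: a bounded queue where each cell carries its own sequence counter, the shared positions are advanced with a relaxed CAS, and the data is published through acquire/release operations on the per-cell counter. This is a simplified illustration in the spirit of the linked Vyukov-style queue, not a copy of the libcds code and not a proposed patch; BoundedMpmcQueue and all member names are hypothetical, the capacity is assumed to be a power of two, and T is assumed to be default-constructible and copyable.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>

// Hypothetical sketch of a bounded MPMC queue with per-cell sequence counters.
// Assumption: capacityPow2 is a power of two and at least 2.
template <typename T>
class BoundedMpmcQueue {
public:
    explicit BoundedMpmcQueue(std::size_t capacityPow2)
        : cells(new Cell[capacityPow2]), mask(capacityPow2 - 1) {
        for (std::size_t i = 0; i < capacityPow2; ++i)
            cells[i].seq.store(i, std::memory_order_relaxed);
    }

    bool push(const T& value) {
        std::size_t pos = enqueuePos.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells[pos & mask];
            const std::size_t seq = cell.seq.load(std::memory_order_acquire);
            const auto diff = static_cast<std::ptrdiff_t>(seq) -
                              static_cast<std::ptrdiff_t>(pos);
            if (diff == 0) {
                // Cell is free for this position: claim it with a CAS.
                // On failure, pos is refreshed and the loop retries.
                if (enqueuePos.compare_exchange_weak(pos, pos + 1,
                                                     std::memory_order_relaxed)) {
                    cell.data = value;
                    cell.seq.store(pos + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // queue is full
            } else {
                pos = enqueuePos.load(std::memory_order_relaxed);
            }
        }
    }

    bool pop(T& out) {
        std::size_t pos = dequeuePos.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells[pos & mask];
            const std::size_t seq = cell.seq.load(std::memory_order_acquire);
            const auto diff = static_cast<std::ptrdiff_t>(seq) -
                              static_cast<std::ptrdiff_t>(pos + 1);
            if (diff == 0) {
                // Cell holds data for this position: claim it with a CAS.
                if (dequeuePos.compare_exchange_weak(pos, pos + 1,
                                                     std::memory_order_relaxed)) {
                    out = cell.data;
                    // Mark the cell as free for the producer one lap ahead.
                    cell.seq.store(pos + mask + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // queue is empty
            } else {
                pos = dequeuePos.load(std::memory_order_relaxed);
            }
        }
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq;
        T data;
    };
    std::unique_ptr<Cell[]> cells;
    const std::size_t mask;
    std::atomic<std::size_t> enqueuePos{0};
    std::atomic<std::size_t> dequeuePos{0};
};
```

The CAS is what prevents two producers from claiming the same cell, and the release store on the cell's counter is what makes the written data visible to the consumer that later acquires it. A queue that only loads and stores its indices has neither guarantee once a second producer appears.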

Environment:

  • OS: Ubuntu 20.04
  • Compiler: GCC 13.1.0

Test
I used a dedicated test to check LockFreeQueue in the MPMC (MPSC) case. The test code will be provided in the pull request for this issue; the test name will be more generic because I abandoned implementing a new lock-free queue.

[----------] 1 test from LockFreeQueueRaceTest
[ RUN ] LockFreeQueueRaceTest.MultipleProducersMultipleConsumers
Produced: 20000000, Consumed: 16345815
../test/lockfree_queue_race_test.cpp:97: Failure
Expected equality of these values:
consumed_count.load()
Which is: 16345815
TOTAL_ITEMS
Which is: 20000000
[ FAILED ] LockFreeQueueRaceTest.MultipleProducersMultipleConsumers (2457 ms)
[----------] 1 test from LockFreeQueueRaceTest (2457 ms total)
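Until the actual test code is available in the pull request, the general shape of such a produced-versus-consumed check could look like the sketch below. It is not the test from this issue: countConsumed, MutexQueue, and the assumed push/pop interface returning bool are all hypothetical, and the mutex-guarded stand-in is only there to keep the example self-contained.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in queue guarded by a mutex, used only so the sketch compiles on its own.
// Assumption: the queue under test exposes bool push(int) and bool pop(int&).
class MutexQueue {
public:
    bool push(int value) {
        std::lock_guard<std::mutex> lock(mutex);
        items.push(value);
        return true;
    }
    bool pop(int& out) {
        std::lock_guard<std::mutex> lock(mutex);
        if (items.empty()) return false;
        out = items.front();
        items.pop();
        return true;
    }

private:
    std::mutex mutex;
    std::queue<int> items;
};

// Runs the given number of producer and consumer threads against the queue
// and returns how many items the consumers managed to pop in total.
template <typename Queue>
std::size_t countConsumed(Queue& queue, std::size_t producers,
                          std::size_t consumers, std::size_t itemsPerProducer) {
    std::atomic<std::size_t> consumed{0};
    std::atomic<std::size_t> producersLeft{producers};
    std::vector<std::thread> threads;

    for (std::size_t p = 0; p < producers; ++p) {
        threads.emplace_back([&queue, &producersLeft, itemsPerProducer] {
            for (std::size_t i = 0; i < itemsPerProducer; ++i) {
                // Retry while the queue reports "full".
                while (!queue.push(static_cast<int>(i))) std::this_thread::yield();
            }
            producersLeft.fetch_sub(1, std::memory_order_release);
        });
    }
    for (std::size_t c = 0; c < consumers; ++c) {
        threads.emplace_back([&queue, &consumed, &producersLeft] {
            int value = 0;
            for (;;) {
                if (queue.pop(value)) {
                    consumed.fetch_add(1, std::memory_order_relaxed);
                } else if (producersLeft.load(std::memory_order_acquire) == 0) {
                    // All producers finished: drain what is left, then stop.
                    while (queue.pop(value))
                        consumed.fetch_add(1, std::memory_order_relaxed);
                    break;
                } else {
                    std::this_thread::yield();
                }
            }
        });
    }
    for (auto& thread : threads) thread.join();
    return consumed.load();
}
```

For a queue that is safe with multiple producers and consumers, the returned count equals producers * itemsPerProducer. A mismatch like the one in the log above indicates that items were lost or duplicated.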
