Description
Currently, translating word_id -> word_str (done for each key in each selected row, potentially millions of times per select) requires taking a read lock on the global dictionary shard.
This incurs significant overhead just for the locks themselves.
And in cases where contention might be high (when per-repacker caching is inefficient, e.g. nginx urls), it might also slow down the processing of new packets.
The lock is only needed because we use std::deque to find a word by offset, and it might be inserted into while we're reading (changing its structure).
The proposed idea is to rework the dictionary shard to use just a contiguous mmap()-ed memory region, enabling a fully lockless read-at-offset path (as the mmap()-ed region pointer never changes).
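To make the idea concrete, here is a rough sketch of what such a shard could look like. Everything in it is an assumption for illustration, not the existing code: the class name `MmapDictShard`, the length-prefixed record layout, a fixed capacity reserved up front, and a single writer (or a writer that still takes the existing write lock). It also assumes a POSIX system with anonymous `mmap()`.

```cpp
#include <sys/mman.h>

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string_view>

// Hypothetical shard: one writer appends, any number of readers look up by offset.
class MmapDictShard {
public:
    explicit MmapDictShard(std::size_t capacity_bytes) : capacity_(capacity_bytes) {
        // Map the whole region once; base_ never changes afterwards,
        // so readers can always compute base_ + offset without a lock.
        void* p = ::mmap(nullptr, capacity_, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<char*>(p);
    }
    ~MmapDictShard() { ::munmap(base_, capacity_); }
    MmapDictShard(const MmapDictShard&) = delete;
    MmapDictShard& operator=(const MmapDictShard&) = delete;

    // Writer side: must not be called concurrently with another append().
    // Stores [uint32 length][bytes] and returns the record's offset.
    std::uint64_t append(std::string_view word) {
        const std::uint64_t off = used_.load(std::memory_order_relaxed);
        const std::uint32_t len = static_cast<std::uint32_t>(word.size());
        const std::uint64_t need = sizeof(len) + word.size();
        if (off + need > capacity_) throw std::length_error("shard full");
        std::memcpy(base_ + off, &len, sizeof(len));
        std::memcpy(base_ + off + sizeof(len), word.data(), word.size());
        // Publish the record only after its bytes are fully written.
        used_.store(off + need, std::memory_order_release);
        return off;
    }

    // Reader side: no lock taken. The offset must be one returned by append().
    // Returns an empty view if the offset has not been published yet.
    std::string_view read_at(std::uint64_t off) const {
        // The acquire load pairs with the release store in append().
        if (off >= used_.load(std::memory_order_acquire)) return {};
        std::uint32_t len;
        std::memcpy(&len, base_ + off, sizeof(len));
        return {base_ + off + sizeof(len), len};
    }

private:
    char* base_ = nullptr;
    std::size_t capacity_;
    std::atomic<std::uint64_t> used_{0};  // bytes published so far
};
```

A reader would call `shard.read_at(offset)` with an offset obtained from a previous `append()`. The returned `std::string_view` stays valid for the lifetime of the shard, since records are never moved or freed in this sketch. Growing past the initial capacity is deliberately left out: it would need either a large up-front reservation or a scheme that keeps old addresses valid.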
The word_t size is less than a modern x86 CPU's cache line size, but due to the strong x86 cache-coherence model this might only incur a performance penalty, not compromise correctness.
As far as I understand it, anyway :)