perf(inlet): speed up pull_chunk with bulk buffer slicing #111
Merged
cboulay merged 4 commits intoJun 17, 2026
cboulay reviewed Jun 16, 2026
if dest_obj is None:
    # Convert the whole ctypes buffer to a Python list in a single
    # bulk slice (far faster than indexing the array element by
    # element), then split it into one list per sample.
Contributor
No need to include the verbose comment. The reason for the change is in the git history and not needed in the code.
Contributor
Author
Makes sense. I removed it.
Convert the ctypes data and timestamp buffers to Python lists with a single bulk slice instead of indexing element-by-element inside a nested comprehension. This is ~3-4x faster at extracting multi-channel chunks (measured on the extraction step alone) and produces byte-identical output. Also use integer floor division for the sample count instead of float division plus repeated int() truncation.
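A minimal sketch of the before/after extraction, using plain ctypes arrays as stand-ins for the buffers `pull_chunk` fills. The function names `extract_old`/`extract_new`, the float32 data and float64 timestamp buffer types, and the exact shape of the old nested comprehension are assumptions for illustration, not pylsl's actual source.

```python
import ctypes

# Hypothetical stand-ins for the buffers pull_chunk allocates: a flat,
# row-major float32 data buffer and a float64 timestamp buffer.
num_channels = 4
max_samples = 5
data_buff = (ctypes.c_float * (max_samples * num_channels))(
    *[float(i) for i in range(max_samples * num_channels)]
)
ts_buff = (ctypes.c_double * max_samples)(*[0.5 * i for i in range(max_samples)])
num_elements = max_samples * num_channels  # number of values actually written


def extract_old(data_buff, ts_buff, num_elements, num_channels):
    # Element by element: every data_buff[i] is a separate ctypes access.
    num_samples = num_elements / num_channels  # float division
    samples = [
        [data_buff[s * num_channels + c] for c in range(num_channels)]
        for s in range(int(num_samples))  # repeated int() truncation
    ]
    timestamps = [ts_buff[s] for s in range(int(num_samples))]
    return samples, timestamps


def extract_new(data_buff, ts_buff, num_elements, num_channels):
    # Integer floor division gives the sample count directly.
    num_samples = num_elements // num_channels
    # One bulk slice converts the filled region to a Python list...
    flat = data_buff[: num_samples * num_channels]
    # ...which is then split into one list per sample in pure Python.
    samples = [
        flat[s * num_channels : (s + 1) * num_channels]
        for s in range(num_samples)
    ]
    timestamps = ts_buff[:num_samples]  # slicing a ctypes array returns a list
    return samples, timestamps
```

Both functions return a `list[list]` of floats plus a plain `list` of timestamps, so `extract_old(...) == extract_new(...)` holds for any fill level, including a partially filled buffer (for example `num_elements=8`).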
The rationale lives in the commit history; the code itself does not need it. Addresses review feedback on labstreaminglayer#111.
Cover the two paths the bulk-slice extraction must preserve: a multi-channel numeric chunk and a variable-length string chunk. Pushes a known chunk and pulls it back, asserting identical values, list[list] shape, and timestamp list type. The string case (empty and multi-byte values) exercises the cf_string decode path that previously lacked coverage.
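The string case behaves differently because each element is a `char*`. A rough sketch, assuming UTF-8 encoded values and using a ctypes `c_char_p` array as a stand-in for what liblsl hands back (`extract_strings` is a hypothetical name): the bulk slice removes the per-index ctypes access, but every value still needs its own `.decode()`. The real code also releases the strings with `free_char_p_array_memory`; this sketch skips that step because ctypes owns the memory here.

```python
import ctypes

# Hypothetical cf_string chunk: 2 samples x 2 channels, including an empty
# string and multi-byte UTF-8 values, stored as an array of char* pointers.
values = ["", "héllo", "数据", "plain"]
num_channels = 2
str_buff = (ctypes.c_char_p * len(values))(*[v.encode("utf-8") for v in values])


def extract_strings(str_buff, num_elements, num_channels):
    num_samples = num_elements // num_channels
    # Bulk slice yields a list of bytes objects in one pass...
    raw = str_buff[: num_samples * num_channels]
    # ...but each value still pays for its own .decode(), which is why the
    # string path gains less than the numeric one.
    flat = [b.decode("utf-8") for b in raw]
    return [
        flat[s * num_channels : (s + 1) * num_channels]
        for s in range(num_samples)
    ]
```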
.claude/ and CLAUDE.md are local tooling artifacts that should not be tracked.
665a42f to 7948d36
cboulay approved these changes Jun 17, 2026
(PR and content generated with the help of Claude)
What
Speed up `StreamInlet.pull_chunk`, the main data-receive hot path, without changing its behavior or output.
Why
The current implementation reads the ctypes data/timestamp buffers one element at a time inside a nested list comprehension. Each `data_buff[i]` access crosses the ctypes boundary individually, which is slow. A single bulk slice (`data_buff[:n]`) converts the whole buffer in one C-level pass, and we then split it into per-sample lists in Python.
It also replaces `num_elements / num_channels` (float division) plus repeated `int(...)` truncation with integer floor division.
Impact
Measured on the extraction step alone (the changed code), with output verified identical via `assert old() == new()`: ~3-4x faster for typical multi-channel chunks, less for single-channel streams. This covers the Python-side extraction only; the end-to-end gain depends on how extraction-bound the receive loop is. The `cf_string` path still pays a per-element `.decode()`, so its relative gain is smaller.
Behavior
Byte-identical output: the same `list[list]` of values, the same timestamp list type, unchanged `cf_string` decoding and `free_char_p_array_memory` call, and an unchanged `dest_obj` path. Verified with a live localhost round-trip across the numeric, string, and `dest_obj` paths; the existing test suite passes.
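The Impact numbers come from timing the extraction step in isolation, with `assert old() == new()` as the equivalence check. A harness in that spirit might look like the following; the buffer size (32 channels, 1024 samples), the `old`/`new` bodies, and the `timeit` settings are assumptions, and the printed speedup will vary by machine and Python version.

```python
import ctypes
import timeit

# Hypothetical multi-channel chunk held in a flat float32 ctypes buffer.
num_channels = 32
num_samples = 1024
n = num_samples * num_channels
data_buff = (ctypes.c_float * n)(*[float(i % 1000) for i in range(n)])


def old():
    # Per-element indexing inside a nested comprehension.
    return [
        [data_buff[s * num_channels + c] for c in range(num_channels)]
        for s in range(int(n / num_channels))
    ]


def new():
    # One bulk slice, then split into per-sample lists in pure Python.
    flat = data_buff[:n]
    k = n // num_channels
    return [flat[s * num_channels : (s + 1) * num_channels] for s in range(k)]


# Equivalence first: a faster extraction is only useful if output matches.
assert old() == new()

# Best-of-3 timings of 10 calls each, for the extraction step only.
t_old = min(timeit.repeat(old, number=10, repeat=3))
t_new = min(timeit.repeat(new, number=10, repeat=3))
print(f"old: {t_old:.4f}s  new: {t_new:.4f}s  speedup: {t_old / t_new:.1f}x")
```

Taking the minimum of several repeats reduces noise from other processes; lowering `num_channels` to 1 is a quick way to see the smaller single-channel gain the PR mentions.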