Fix chunk indexing for streams with differing start times#12
Merged
Conversation
…bounds check at the top of the per-stream loop.
…art times NWBIteratorState built each regular stream's chunk_offsets table from the stream's own first sample and computed first_chunk only to truncate the table, never to delay emission. _build_chunk_messages_static then indexed those tables with the global chunk_ix, so every stream's sample 0 landed in global chunk 0 regardless of when it actually started. A stream starting later than the file origin (e.g. CereLink NPLAY vs Hub2) was thus emitted several chunks too early, with time offsets missing its start time, and its shorter offset table crashed with IndexError once chunk_ix passed its end. - _preload now builds regular-stream offsets on the shared global chunk grid (matching the event-stream branch), clamped to [0, n_samples]. Tables are uniform length n_chunks and late/early streams contribute empty slices outside their span instead of being shifted. - _build_chunk_messages_static adds info.t0 to the rate-only time offset so a late-starting stream is timed against its true start. - Keep a defensive bounds guard against a short offset table. Tests: add late-start alignment regression and a defensive short-table test.
read_by_index reported a rate-only stream's time offset as ts_off+gain*idx, omitting the stream's starting_time. Sample indices are 0-based from the stream start, so a stream with non-zero starting_time was mis-timed — and the value disagreed with NWBClockDrivenProducer's own file_t = t0 + idx/fs bookkeeping. Add info.t0, mirroring the iterator's chunk-offset fix. The timestamped branch already uses absolute times and is unchanged.
There was a problem hiding this comment.
Pull request overview
This PR fixes misalignment and incorrect time offsets when reading NWB files containing multiple continuous streams that start at different times, and prevents an IndexError crash when per-stream chunk tables are shorter than the file-wide chunk grid.
Changes:
- Build rate-only stream
chunk_offsetson the shared (file-wide) chunk grid so late-starting streams emit empty slices before their start rather than being shifted to chunk 0. - Include each stream’s
t0(starting_time) in rate-only time-offset calculations in bothNWBAxisArrayIteratorchunk emission andNWBSlicer.read_by_index. - Add regression tests covering late-starting stream alignment, ragged offset-table bounds guarding, and slicer offsets including
starting_time.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/ezmsg/nwb/iterator.py |
Fixes chunk offset table construction and time-offset computation; adds a defensive guard against short offset tables. |
src/ezmsg/nwb/slicer.py |
Fixes read_by_index time offset for rate-only streams to include starting_time (t0). |
tests/test_iterator.py |
Adds regression tests for late-start alignment and for avoiding crashes with truncated chunk_offsets. |
tests/test_slicer.py |
Adds regression test ensuring read_by_index offset includes a non-zero starting_time. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Building regular-stream chunk offsets with np.round assigned each global chunk boundary to the nearest sample. When chunk_dur is not an integer multiple of the sample period, that pulls a pre-boundary sample into the chunk and disagrees with the event/timestamped paths, which use searchsorted(side="left"). Use ceil with a small epsilon to match first-sample-at-or-after-boundary semantics and tolerate float drift. Add a regression test asserting the offsets equal searchsorted(side="left") for a non-integer samples-per-chunk stream (256 Hz, 0.1 s chunks).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This branch fixes a crash and a silent data-correctness bug that affect NWB files containing streams that do not all start at the same time (for example CereLink
Hub2andNPLAY, where one begins several seconds after the other). Such streams were misaligned against each other on read, and in some cases the iterator crashed outright. The root cause was the same in two places: per-stream chunk offset tables were built relative to each stream's own first sample rather than the shared, file-wide chunk grid, and the rate-only time offset omitted the stream's start time.The bugs
When a regular (rate-only) stream starts later than the file's overall start,
NWBIteratorbuilt itschunk_offsetstable from the stream's own first sample and used the computedfirst_chunkvalue only to truncate the table, never to delay emission._build_chunk_messages_staticthen indexed every stream's table with the globalchunk_ixstarting at 0, so each stream's sample 0 was dumped into global chunk 0 regardless of when it actually started. This produced two failures. First, a temporal-placement bug: a late-starting stream was emitted several chunks too early and presented as simultaneous with streams that genuinely start at the file origin. Second, an offset bug: the rate-only branch computedts_off + gain * start_idx, omitting the stream'st0, so its message time offsets were low by exactly the stream's start time. Because the truncated table was shorter than the file-widen_chunks, iterating past the shorter stream's last entry also raised anIndexErrorand crashed the run.The same
t0omission existed independently inNWBSlicer.read_by_index. Its rate-only branch reportedts_off + gain * start_idx, which disagrees withNWBClockDrivenProducer's own playback bookkeeping (file_t = t0 + idx / fs). Any stream with a non-zerostarting_timeread through that path was mis-timed by its start offset.The fix
_preloadnow builds regular-stream chunk offsets on the shared global chunk grid, the same way the event-stream branch already did, by computing the sample index at each global chunk boundary and clamping to[0, n_samples]. Offset tables are therefore uniform lengthn_chunksfor every stream, late-starting or early-ending streams contribute empty slices outside their span instead of being shifted, and chunkjnow covers the same wall-clock window for all streams._build_chunk_messages_staticaddsinfo.t0to the rate-only time offset so a late-starting stream is timed against its true start, and retains a defensive bounds guard so a short offset table degrades gracefully instead of crashing the whole chunk.NWBSlicer.read_by_indexgets the matchinginfo.t0correction in its rate-only branch; the timestamped branch already used absolute times and is unchanged.Tests
Added
test_late_starting_stream_stays_aligned, which builds a two-stream file where the second stream starts two seconds in and asserts it is emitted in the chunks matching its real time with correct offsets. Addedtest_ragged_stream_lengths_do_not_crash, which exercises the defensive bounds guard directly. Addedtest_read_by_index_offset_includes_starting_time, which checks the slicer offset for a stream with non-zerostarting_time. Each new test was confirmed to fail on the pre-fix code and pass after the fix. The full suite passes (134 tests).