
Incorrect bounds causing read to return NaN-padded signal data #60

@ingararntzen


Hi.

I’m using digital-rf (version 2.6.8) to store and retrieve radar data. I’m writing a simple unit test that writes 2 hours’ worth of samples to storage and reads them back to verify correctness. In doing so, I’m seeing some surprising behavior with respect to bounds. Here is the essential reading code:

idx_first, idx_last = reader.get_bounds("chnl")
data = reader.read(idx_first, idx_last, "chnl")

What I get back is an array that is longer than the one I wrote, e.g.

[NaN, NaN, NaN, sample, sample, …, sample, NaN, NaN, NaN]

The signal itself is complete and correct, but it is padded with NaNs at both ends. Similarly, for integer signals the padding is the minimum value of the integer type.

This is at the very least inconvenient, since programmers need to do extra work to separate signal from non-signal. If the signal itself can contain NaN or minimum-integer values, it may even be impossible to tell padding from genuine samples. Moreover, I don't think this behavior is mentioned in the docs (though I've been looking at a somewhat dated PDF).
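For reference, the extra reading-side work looks roughly like the sketch below, which strips leading and trailing fill values from a 1-D NumPy array. The helper name `trim_fill` is hypothetical and not part of digital_rf; it assumes the fill convention observed in this issue (NaN for float/complex data, the dtype's minimum for integers).

```python
import numpy as np


def trim_fill(data, fill=None):
    """Return `data` without leading/trailing fill values (1-D arrays only).

    Assumed fill convention (observed in this issue, not taken from the
    digital_rf docs): NaN for float/complex data, dtype minimum for integers.
    Pass `fill` explicitly only for non-NaN fill values.
    """
    data = np.asarray(data)
    if fill is None and np.issubdtype(data.dtype, np.integer):
        fill = np.iinfo(data.dtype).min
    # Boolean mask marking every sample that looks like padding.
    is_fill = np.isnan(data) if fill is None else data == fill
    valid = np.flatnonzero(~is_fill)
    if valid.size == 0:
        return data[:0]  # nothing but padding
    # Keep everything between the first and last real sample, inclusive.
    return data[valid[0]:valid[-1] + 1]
```

As noted above, this heuristic cannot tell padding apart from genuine NaN or minimum-integer samples at the edges of the signal, so it would silently drop those as well.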

After some digging, it appears to me that this behavior depends on two writer settings: file_cadence_millis, which sets the time span covered by each .h5 file, and start_global_index, which is set on the writer object and defines the sample index (i.e. the time) at which recording starts.

Specifically, if start_global_index is set to an instant that coincides with a boundary between .h5 files, there is no padding, as expected. However, if an arbitrary timestamp is used, get_bounds() appears to return the index bounds of all written files, not just those of the signal. For example, with a file_cadence_millis of 2*1000 ms, you can expect 2 seconds' worth of padding in total (for a recording whose length is a whole number of file durations, such as 2 hours), split between the start and the end of the recording. Continuous blocks also appear to ignore this distinction, treating the padding as part of the signal. Furthermore, if two channels are recorded in parallel with different file_cadence_millis values, each channel reports different bounds, which adds to the confusion.
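The size of the padding can be worked out from the writer settings. The sketch below assumes, consistent with the observations above but not checked against the digital_rf source, that every file spans exactly file_cadence_millis and that file boundaries fall on whole multiples of that span counted from the epoch. `expected_padding` is a hypothetical helper, not a digital_rf function.

```python
from fractions import Fraction


def expected_padding(start_global_index, n_samples, sample_rate, file_cadence_millis):
    """Return (leading, trailing) padding in samples implied by file alignment.

    Assumptions (hypothetical model, not part of digital_rf):
    * each file covers exactly `file_cadence_millis` milliseconds,
    * file boundaries sit on whole multiples of that span since the epoch,
    * one file spans an integer number of samples.
    """
    samples_per_file = Fraction(sample_rate) * file_cadence_millis / 1000
    if samples_per_file.denominator != 1:
        raise ValueError("file cadence must span an integer number of samples")
    spf = int(samples_per_file)
    last_index = start_global_index + n_samples - 1  # last sample actually written
    # Start of the file containing the first sample, end of the file containing the last.
    first_file_start = (start_global_index // spf) * spf
    last_file_end = (last_index // spf + 1) * spf - 1
    return start_global_index - first_file_start, last_file_end - last_index
```

For example, at a hypothetical 100 samples/s with 2000 ms files, a 2-hour recording (720000 samples) starting at index 1050 gives `(50, 150)`, i.e. 200 samples or 2 seconds of padding in total, while a start index of 1000 (on a file boundary) gives `(0, 0)`.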

It is not difficult, though, to fix get_bounds() so that it returns the correct indices for the signal. However, this requires that start_global_index, which is set on the writer object, is known at read time. Unfortunately, as far as I can see, the framework does not persist this value, so storing it and making it available at read time requires extra work from the programmer, which is again inconvenient.
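One such extra measure is a small sidecar file written alongside the data. In the sketch below, the file name `recording_bounds.json` and both helper functions are hypothetical, not a digital_rf convention: the writer side records start_global_index and the number of samples written, and the reader side turns them back into inclusive (first, last) indices that can be used instead of the result of get_bounds().

```python
import json
from pathlib import Path

# Hypothetical file name; digital_rf itself does not define or read this file.
SIDECAR_NAME = "recording_bounds.json"


def save_recording_bounds(directory, start_global_index, n_samples):
    """Writer side: persist the start index and sample count for later reads.

    `directory` is any location the reader can find; placing it inside the
    channel directory assumes digital_rf ignores unrelated files there.
    """
    info = {"start_global_index": int(start_global_index), "n_samples": int(n_samples)}
    (Path(directory) / SIDECAR_NAME).write_text(json.dumps(info))


def load_recording_bounds(directory):
    """Reader side: return the (first, last) sample indices actually written, inclusive."""
    info = json.loads((Path(directory) / SIDECAR_NAME).read_text())
    first = info["start_global_index"]
    return first, first + info["n_samples"] - 1
```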

I would suggest that the framework handle this. For example, start_global_index could be exposed as a property on the reader object, and get_bounds() could be updated to take it into account. Interestingly, the reader properties already include an epoch field, which I expected to hold the start_global_index. Surprisingly, though, epoch is just the static string '1970-01-01T00:00:00Z', so it says nothing about when the recording actually started.
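Until something like this exists in the framework, a reader-side approximation is a thin wrapper that intersects the bounds reported by get_bounds() with writer-side bounds stored by the application. `BoundedReader` is hypothetical; it only assumes the wrapped object has get_bounds(channel) returning a (first, last) pair, as in the reading code above. Because parallel channels can start at different indices, the start index is looked up per channel through a method rather than exposed as a single property.

```python
class BoundedReader:
    """Hypothetical wrapper that trims get_bounds() to the samples actually written.

    `reader` is any object with get_bounds(channel), e.g. a digital_rf reader.
    `written` maps channel name -> (first, last) inclusive sample indices that
    the application recorded on the writer side.
    """

    def __init__(self, reader, written):
        self._reader = reader
        self._written = dict(written)

    def start_global_index(self, channel):
        """Writer-side start index for `channel`."""
        return self._written[channel][0]

    def get_bounds(self, channel):
        """Intersect the file-level bounds with the written signal bounds."""
        file_first, file_last = self._reader.get_bounds(channel)
        first, last = self._written[channel]
        return max(file_first, first), min(file_last, last)

    def __getattr__(self, name):
        # Delegate everything not defined here (e.g. read) to the wrapped reader.
        return getattr(self._reader, name)
```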
