Incorrect bounds causing read to return NaN-padded signal data

Hi. 

I’m using digital-rf (version 2.6.8) for storage and retrieval of radar data. I’m writing a simple unit test that writes 2 hours' worth of samples to storage and reads them back to verify correctness. In doing so, I’m seeing some perhaps surprising behavior with respect to bounds. Here is the essential reading code:

```
idx_first, idx_last = reader.get_bounds("chnl")
data = reader.read(idx_first, idx_last, "chnl")
```

What I get back is an array that is longer than the one I wrote, e.g.

`[NaN, NaN, NaN, sample, sample, …, sample, NaN, NaN, NaN]`  

The signal itself is complete and correct, but it’s padded with NaNs on both ends. Similarly, for integer signals the padding is the minimum value of the integer type.

This is at least inconvenient, as programmers need to do extra work to separate signal from non-signal. If the signal itself can contain NaN or min_int values, it could even be problematic. Moreover, I don't think this behavior is mentioned in the docs (I've been looking through a somewhat dated PDF).
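To make the ambiguity concrete, here is a minimal sketch in plain Python (no digital-rf calls). The sample values and the position of the genuine NaN are made up for illustration. It shows that stripping padding by value also discards a legitimate NaN sample:

```python
import math

# Hypothetical read result: indices 0, 1 and 5 are padding,
# index 3 is a genuine NaN that was part of the written signal.
padded = [math.nan, math.nan, 1.0, math.nan, 3.0, math.nan]
true_signal_length = 3  # 1.0, NaN, 3.0

# Naive cleanup: drop everything that is NaN.
stripped = [x for x in padded if not math.isnan(x)]

# The genuine NaN is gone, so the recovered signal is one sample short.
lost_samples = true_signal_length - len(stripped)
```

Trimming by index rather than by value avoids this, but it requires knowing where the signal actually starts and ends.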

After some digging, it appears that this behavior is linked to the `file_cadence_millis` setting, which sets the time duration of each .h5 file, and to `start_global_index`, which is set on the writer object and defines the start time of the recording.

Specifically, if `start_global_index` is set to a moment in time that lines up with a time boundary between .h5 files, there is no padding, as expected. However, if an arbitrary timestamp is used, it appears that `get_bounds()` returns the index boundaries corresponding to all written files, not just the signal boundaries. For example, given a `file_cadence_millis` of 2*1000 ms, you can expect 2 seconds' worth of padding, divided between the start and end of the recording. The concept of continuous blocks also appears to be insensitive to this padding, treating it as part of the signal. Furthermore, if two channels are recorded in parallel with different `file_cadence_millis` values, the boundaries for each channel will be different, adding to the confusion.
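The arithmetic behind this observation can be sketched as follows. This is only a model of the behavior described above (bounds rounded outward to file boundaries), not digital-rf's actual implementation. The function name `file_aligned_bounds`, the 100 Hz sample rate, and the start time are assumptions chosen for illustration.

```python
def file_aligned_bounds(start_index, num_samples, sample_rate_hz, file_cadence_millis):
    """Round the signal bounds outward to file boundaries (hypothetical model)."""
    # Assumes the cadence corresponds to a whole number of samples per file.
    samples_per_file = sample_rate_hz * file_cadence_millis // 1000
    last_index = start_index + num_samples - 1
    aligned_first = (start_index // samples_per_file) * samples_per_file
    aligned_last = (last_index // samples_per_file + 1) * samples_per_file - 1
    return aligned_first, aligned_last


sample_rate_hz = 100                    # assumed sample rate
num_samples = 2 * 3600 * sample_rate_hz  # 2 hours of samples

# Start half a second (50 samples) past a 2-second file boundary.
start_index = 1_700_000_000 * sample_rate_hz + 50
first, last = file_aligned_bounds(start_index, num_samples, sample_rate_hz, 2000)
pad_before = start_index - first                     # 50 samples
pad_after = last - (start_index + num_samples - 1)   # 150 samples

# Starting exactly on a file boundary gives no padding in this model.
aligned_start = 1_700_000_000 * sample_rate_hz
first_a, last_a = file_aligned_bounds(aligned_start, num_samples, sample_rate_hz, 2000)
```

In this model the total padding is 200 samples, i.e. 2 seconds at 100 Hz, split unevenly between the two ends, which matches the pattern reported above.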

It is not difficult, though, to fix the `get_bounds()` method so that it returns the correct indexes for the signal. However, this requires that `start_global_index`, which is set by the writer object, is known at read time. Unfortunately, as far as I can see, this value is not persisted by the framework, so storing it and making it available at read time requires additional measures by the programmer, which is again inconvenient.
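As a rough sketch of such a workaround, assuming the caller has persisted `start_global_index` and the number of samples written by some means of their own: the helper below (`trim_to_signal`, a hypothetical name, not part of digital-rf) cuts the padded read result down to the signal by index rather than by value.

```python
import math


def trim_to_signal(padded, padded_first_index, signal_first_index, num_samples):
    """Return only the samples that were actually written.

    padded_first_index: first index of the padded read (e.g. from get_bounds()).
    signal_first_index: the start_global_index used by the writer, stored
                        separately by the caller (an assumption of this sketch).
    """
    offset = signal_first_index - padded_first_index
    if offset < 0 or offset + num_samples > len(padded):
        raise ValueError("signal range lies outside the padded data")
    return padded[offset:offset + num_samples]


# Hypothetical padded read: 3 padding values, 4 samples, 2 padding values.
padded = [math.nan] * 3 + [0.5, 1.5, 2.5, 3.5] + [math.nan] * 2
signal = trim_to_signal(padded, padded_first_index=1000,
                        signal_first_index=1003, num_samples=4)
```

Because the slice is index-based, it also works unchanged on a NumPy array and preserves any NaN or min_int values that are genuinely part of the signal.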

I would suggest that this should be handled by the framework. For example, `start_global_index` could be made available as a property on the reader object, and the implementation of `get_bounds()` could be updated to take this into account. Interestingly, the reader properties already have one field `epoch`, which I expected to be the `start_global_index`. Surprisingly though, the `epoch` field is only the static string `'1970-01-01T00:00:00Z'`, so it provides no useful information.
