feat: Add regrid functionality based on match geometry #1597
eliascapriles-NOAA wants to merge 29 commits into OSOceanAcoustics:main from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1597 +/- ##
===========================================
- Coverage 85.58% 60.63% -24.95%
===========================================
Files 79 79
Lines 6998 7139 +141
===========================================
- Hits 5989 4329 -1660
- Misses 1009 2810 +1801

Hey @eliascapriles-NOAA: Thanks for the PR! Below are some comments based on a quick look:

Sounds good. I will get started on implementing the changes!

Hi Wu-Jung, sorry for the delay! The tests uncovered a couple of bugs that I wanted to fix before submitting for a new PR. I have added unit tests for my helper function and an integration test using Sv data for the regridding function. Let me know if there are any changes!

Merged changes in main to regrid for PR
for more information, see https://pre-commit.ci

Thanks @eliascapriles-NOAA! I'm going to ask @LOCEANlloydizard to help review this since you had most of the discussions with him. @LOCEANlloydizard - Could you please take a look? Feel free to ping me for discussions. Thanks!

Sounds good! Thanks @leewujung

LOCEANlloydizard left a comment
Hey @eliascapriles-NOAA, thx for the PR!
Following our discussion, a small recap (you probably noted more!):
- the errors from CI are now gone with the merge of pandas < 3, just need to address the AssertionError
- remove the +20 padding at the end of the function
- double-check that pings are actually aligned before regridding (there’s already a helper in echopype that does this, see getting_started notebook)
- update the notebook to call the function from your PR, so I can run it on my side too
- we can go from there for other points !
One thing I wanted to flag more generally: right now the regridding is done by looping over channels in Python, and inside that loop running an apply_ufunc that vectorises over ping_time × range_sample.
This is more a design question than a bug: are we happy with looping over channels in Python and vectorising over pings with apply_ufunc, or should we try to make the whole regridding run as one channel-aware vectorised operation? I know you looked into it, and I could have a look as well! Maybe @leewujung would have recommendations for this?
Cheers!
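
Note on the design question above: a minimal sketch of what a single channel-aware call could look like. It is not the code from this PR. The kernel `_interp_one_ping` is a hypothetical stand-in that uses plain linear interpolation (`np.interp`) rather than the Echoview-based algorithm discussed here, and the dimension name `target_sample` and the toy data are assumptions made for illustration. With `vectorize=True`, `xr.apply_ufunc` loops the 1-D kernel over every non-core dimension, so `channel` and `ping_time` are both handled without an explicit Python loop.

```python
import numpy as np
import xarray as xr


def _interp_one_ping(values, src_range, dst_range):
    """Hypothetical 1-D kernel: linearly interpolate one ping onto dst_range."""
    valid = ~np.isnan(src_range) & ~np.isnan(values)
    if valid.sum() < 2:
        # Not enough valid samples to interpolate
        return np.full(dst_range.shape, np.nan)
    return np.interp(
        dst_range, src_range[valid], values[valid], left=np.nan, right=np.nan
    )


def regrid_sketch(sv_linear, echo_range, target_range):
    """Apply the 1-D kernel to every (channel, ping_time) pair in one call."""
    return xr.apply_ufunc(
        _interp_one_ping,
        sv_linear,
        echo_range,
        target_range,
        input_core_dims=[["range_sample"], ["range_sample"], ["target_sample"]],
        output_core_dims=[["target_sample"]],
        vectorize=True,  # loops the kernel over channel and ping_time
    )


# Toy data: 2 channels, 3 pings, 5 samples; channel 1 has a coarser range grid
base = np.arange(5, dtype=float)
rng = np.stack([np.tile(base, (3, 1)), np.tile(2 * base, (3, 1))])
echo_range = xr.DataArray(rng, dims=("channel", "ping_time", "range_sample"))
sv_linear = 2.0 * echo_range  # values vary linearly with range
target_range = xr.DataArray(base, dims=("target_sample",))

out = regrid_sketch(sv_linear, echo_range, target_range)
```

In this toy case both channels end up on the same five-sample target grid. Whether this is faster than the per-channel loop would need to be measured, since `vectorize=True` relies on `np.vectorize`, which is itself a Python-level loop.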
This error, "TypeError: only integer scalar arrays can be converted to a scalar index", is due to the pandas update to version 3.0. Could you merge the latest version of the main echopype repo? (pandas is pinned there!)

Hey @eliascapriles-NOAA @LOCEANlloydizard : Not sure if I am really making a good suggestion, but since

Hi @leewujung, Lloyd brought this up yesterday. The reason I currently have the channels in a loop is that my apply_ufunc parallelizes the function across ping_time, as specified by the Echoview algorithm. However, I will try to rework my function to parallelize across the channel dimension as well.
Fixture is moved to the correct spot
Added a check to make sure that only Sv values will be converted from log
Fixed test to fit with new structure
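
Note on the log-conversion check mentioned in the commit messages above: Sv is stored in dB, so averaging or resampling it is normally done in the linear domain and converted back afterwards. The sketch below illustrates that round trip. The helper names and the name-based guard `maybe_to_linear` are hypothetical and are not taken from this PR.

```python
import numpy as np


def db_to_linear(values_db):
    """Convert dB values to the linear domain."""
    return 10.0 ** (np.asarray(values_db, dtype=float) / 10.0)


def linear_to_db(values_linear):
    """Convert linear values back to dB."""
    return 10.0 * np.log10(values_linear)


def maybe_to_linear(var_name, values):
    """Hypothetical guard: only the log-domain variable "Sv" is converted."""
    if var_name == "Sv":
        return db_to_linear(values)
    return np.asarray(values, dtype=float)


sv_db = np.array([-60.0, -70.0])

# Averaging in the linear domain, then converting back to dB
mean_db = float(linear_to_db(maybe_to_linear("Sv", sv_db).mean()))

# A non-Sv variable (here a range in metres) passes through unchanged
echo_range = maybe_to_linear("echo_range", [1.0, 2.0])
```

The linear-domain mean here is about -62.6 dB, which differs from the -65 dB obtained by averaging the dB values directly.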
Returns
-------
deepest_ping: int
    index of the "deepest" sample
something like
"Index of the ping containing the deepest valid range sample."
How about:
deepest_ping: int
Index of the ping containing the most data before trailing NaN values.
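
Note: to make the proposed docstring wording concrete, here is a minimal NumPy sketch of "the ping with the most data before trailing NaN values". It is an illustration only, not the implementation of `get_valid_max_depth_ping` in this PR. The function name `deepest_valid_ping` and the assumed 2-D layout `(ping_time, range_sample)` are hypothetical.

```python
import numpy as np


def deepest_valid_ping(echo_range):
    """Return the index of the ping whose last non-NaN sample is deepest.

    echo_range is assumed to be a 2-D array (ping_time, range_sample)
    that is NaN-padded at the end of each ping.
    """
    valid = ~np.isnan(np.asarray(echo_range, dtype=float))
    n_samples = valid.shape[1]
    # Position of the last valid sample in each ping (search from the end)
    last_valid = n_samples - 1 - np.argmax(valid[:, ::-1], axis=1)
    # Pings that are entirely NaN get -1 so they are never selected
    last_valid = np.where(valid.any(axis=1), last_valid, -1)
    # On ties, np.argmax returns the first such ping
    return int(np.argmax(last_valid))


example = np.array(
    [
        [0.0, 1.0, np.nan, np.nan],
        [0.0, 1.0, 2.0, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
    ]
)
deepest = deepest_valid_ping(example)
```

In the example, the second ping (index 1) has three valid samples before its trailing NaN, so it is the one selected.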
target_range_da = target_grid.copy()

deepest_ping_index = get_valid_max_depth_ping(ds_Sv, target_grid=target_grid)
valid_range_sample = np.argmax(target_grid.isel(ping_time=deepest_ping_index).values)
I feel we could be more concise here if 1) we don't trim (to discuss) and 2) we drop the copy()?
Something like:
    if target_channel is not None:
        target_range_da = ds_Sv["echo_range"].sel(channel=target_channel)
    else:
        target_range_da = target_grid
The trimming was originally there to save memory and to give a visual way to confirm that the regridding worked, by printing the dimension size. However, the helper function already masks NaN pings so that only the valid pings are taken into account. I am going to test the output without the trimming to make sure everything still works, and get back to you.
I agree that copy() does not contribute anything. I will delete those calls.
After testing the regrid without any trimming, everything still works and the function simply regrids. @leewujung and @LOCEANlloydizard, which of these approaches do you think better reflects echopype's current functions?
Hey @eliascapriles-NOAA, thanks for the modifications! I’ve left a few comments above, and also added some broader thoughts below to check first =>
We now return a new xarray object, which is nice, but we might want to align it more with other regridding functions like compute_MVBS?
- propagate key variables:
- add provenance:
- add variable-level attrs on Sv:
does not seem necessary here since nothing is mutated afterwards? (but could be missing the obvious!)
It's the final adjustments! Thank you very much for this!
Implements the regrid_all_channel function, which allows users to match the sampling rates of the channels in a ds_Sv dataset to that of a specific channel