Skip to content

QC data filtering and non-standard product loading#79

Merged
thomasteisberg merged 16 commits into
mainfrom
thomas/qc_and_extras
Apr 24, 2026
Merged

QC data filtering and non-standard product loading#79
thomasteisberg merged 16 commits into
mainfrom
thomas/qc_and_extras

Conversation

@thomasteisberg
Copy link
Copy Markdown
Member

This PR is intended to address two feature requests:

  1. Add xopr QC module after loading echograms and picks #76 - Adding an easy way to filter data by various "QC" criterion such as minimum ice thickness
  2. xopr compatability with CSARP_qlook_coh data product #75 - Adding the ability to load data products that are publicly available from the OPR servers but are not (yet) indexed by the STAC catalog

This is a work in progress, but I wanted to share something so that we could get feedback from everyone on it.

The best thing is to look at the two new docs notebooks added. They are called qc_demo.ipynb and nonstandard_data_products.ipynb and should be available in the rendered docs preview shortly.

A few specific questions:

  1. What should the "QC" module be called? @espg points out that "QC" doesn't feel quite right because the data that's being filtered out isn't necessarily "bad" data, it just isn't suitable for whatever the users' current purpose. "Filtering" is too overloaded in this context, though, so I also don't want to call it that.
  2. What do you (any of you) think of the runner approach for the QC checks? Is this the right balance on modularity and making it easy on the user?
  3. Some checks may require more data loaded than others. Right now, we force the full dataset to be loaded before running any QC checks, but you could imagine allowing some checks to run with only layers loaded and potentially avoiding loading the radar data at all if a threshold of "good" data was not met.

And the most important question: For @elizadawson, does this meet the needs of your workflow?

@thomasteisberg thomasteisberg added enhancement New feature or request preview-docs labels Mar 10, 2026
@thomasteisberg thomasteisberg marked this pull request as draft March 10, 2026 01:51
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 10, 2026

📚 Documentation Preview Ready!

🌐 Live Preview

URL: https://xopr-pr-79.surge.sh/xopr/

Preview updates automatically with new commits while the preview-docs label is present.

📦 Download Artifact

You can also download the docs for local viewing.

Commit: b4bea15


To trigger a preview, add the preview-docs label or use the /preview command.

@thomasteisberg
Copy link
Copy Markdown
Member Author

Direct links to relevant notebooks to look at:

@elizadawson
Copy link
Copy Markdown

elizadawson commented Mar 10, 2026

The qc_heading_change doesn't seem to be working correctly (at least not for 2008_Antarctica_ BaslerJKB). Heading is in radians (and is for a lot of seasons).... might be the problem

Edit: this actually looks like it's coming from a problem with the heading field in a lot of the data

@thomasteisberg
Copy link
Copy Markdown
Member Author

@elizadawson I added a notebook (https://xopr-pr-79.surge.sh/xopr/repicking/) showing a couple of ways of doing surface and bed repicking. After looking at this a bit, I'm not sure we need to actually add anything to xOPR. I think it can be expressed pretty cleanly with just a helper function expressing the logic you want (local_max and find_peaks_local as examples in the notebook) and a call to apply_ufunc. Let us know if this matches your use case or not.

@thomasteisberg
Copy link
Copy Markdown
Member Author

I rebased this off main since there have been a lot of changes there.

@elizadawson How has the QC workflow been going for you? Should we get this merged into main?

@elizadawson
Copy link
Copy Markdown

Yes, merging this branch into main would be great!

@thomasteisberg thomasteisberg requested a review from espg April 22, 2026 23:10
@espg espg marked this pull request as ready for review April 23, 2026 21:29
@espg
Copy link
Copy Markdown
Collaborator

espg commented Apr 23, 2026

/preview

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

✅ Documentation Preview Ready!

🌐 Live Preview

URL: https://xopr-pr-79.surge.sh/xopr/

Commit: db41c4f


Preview deployed via /preview command by @espg

Copy link
Copy Markdown
Collaborator

@espg espg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good-- some minor comments / suggestions on a few lines and notebook outputs.

Out of scope for this PR, but the repicking notebook could be made cooler if we were to fold in the tool we did for Michael as a way to manually repick the bed...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The :::{info} block directive isn't valid and doesn't render properly -- we need either:

tip, warning, important, caution, danger, error, hint, attention, seealso, or the generic admonition (with a user-supplied title). Guessing either {note} or {tip}...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/xopr/qc/checks.py
noise_floor = np.nanmean(data[:, -noise_region_samples:], axis=1)

with np.errstate(divide="ignore", invalid="ignore"):
snr_db = 10.0 * np.log10(bed_power / noise_floor)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

claude seems to think that we're calculating amplitude here rather than bed power-- no idea if this is correct, but worth double checking...

snr_bed_pick dB scaling (line 225): data is np.abs(Data) → amplitude, not power. For amplitude the convention is 20·log10(...); for power it's 10·log10(...). Current code labels the intermediate bed_power but feeds amplitude through the 10× formula, so reported SNRs are half of what users probably expect. Worth reconciling (either square first, or change the factor).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is an "Nt by Nx single or double array containing the radar echogram in linear power units" (Echogram File Guide), so I think Claude is wrong here. It's assuming that Data is amplitude but it's actually power.

Comment thread src/xopr/qc/checks.py Outdated

n_traces = data.shape[0]
bed_power = np.full(n_traces, np.nan)
for i in range(n_traces):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be vectorized:

idx = np.clip(np.searchsorted(twtt, bottom_twtt), 0, data.shape[1] - 1)
bed_power = np.where(np.isnan(bottom_twtt), np.nan, data[np.arange(n_traces), idx])

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Fixed

@thomasteisberg thomasteisberg changed the title [WIP] QC data filtering and non-standard product loading QC data filtering and non-standard product loading Apr 24, 2026
@thomasteisberg thomasteisberg merged commit 1ca8887 into main Apr 24, 2026
9 checks passed
@thomasteisberg thomasteisberg deleted the thomas/qc_and_extras branch April 24, 2026 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request preview-docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants