Bedmap2 and OPR fuzzy line matching script by weiji14 · Pull Request #72 · englacial/xopr

weiji14 · 2026-02-10T20:07:26Z

What I am changing

Find closely matching Bedmap2 and OPR radar lines on a segment by segment basis

How I did it

~~Using the Hausdorff distance function (geopandas.GeoSeries.hausdorff_distance) as the basis for fuzzy matching of close-enough linestrings~~
Edit: Now doing a more exact match of dense BedMAP2 and OPR points using custom logic.

TODO:

- Initial implementation
- Use non-simplified Bedmap linestrings for more exact matching
- Set threshold to determine whether a Bedmap linestring matches with an OPR linestring
- Report Bedmap IDs which overlap or are duplicates of the OPR lines
- Create a utility function to skip reading Bedmap IDs that are known to be duplicates?

How you can test it

See scripts/bedmap_overlap_opr.py

Related Issues

#69

Fun fact: I used this Hausdorff distance metric in a previous GIS job (10 years ago) to match road segments when new data came in from city councils and we needed to find where the new roads were relative to the old ones 😃

Find closely matching Bedmap2 and OPR radar lines on a segment by segment basis. Uses hausdorff distance as the fuzzy matching metric. Intended outputs are the Bedmap IDs that have a low hausdorff distance (based on some threshold) to the OPR line segments, which indicate a close enough fuzzy match.

weiji14 · 2026-02-10T20:15:31Z

+    # Report Bedmap IDs and their hausdorff distance to nearest OPR line segment
+    print(gdf_bedmap_[["id", "hausdorff_dist"]].sort_values(by="hausdorff_dist"))


Example output from year 2002, sorted by ascending hausdorff distance:

                          id  hausdorff_dist
    52  Data_20021126_01_002    3.026301e+02
    50  Data_20021126_01_001    1.088522e+03
    37  Data_20021210_01_003    1.501179e+04
    36  Data_20021210_01_002    1.588442e+04
    48  Data_20021210_01_014    2.650424e+04
    ..                   ...             ...
    4   Data_20021212_01_005    1.499097e+06
    54  Data_20021126_01_004    1.619117e+06
    15  Data_20021206_01_001    1.772265e+06
    44  Data_20021210_01_010    2.720349e+06
    46  Data_20021210_01_012    2.828567e+06

    [64 rows x 2 columns]

We'll need to set a threshold somewhere to determine whether there is an overlap/match or not. Ideally the Bedmap2 linestrings would not be simplified, so that a low threshold (<100 km or so?) can be set.
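To illustrate the thresholding idea, here is a minimal, dependency-free sketch. It computes a simplified vertex-to-vertex Hausdorff distance and compares it against a cutoff. This is a stand-in, not the geopandas method used in the script, and the names `discrete_hausdorff`, `is_match` and the 500 m cutoff are hypothetical.

```python
import math


def discrete_hausdorff(line_a, line_b):
    """Vertex-to-vertex Hausdorff distance between two lists of (x, y) points."""

    def directed(src, dst):
        # Largest distance from any vertex in src to its nearest vertex in dst
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(line_a, line_b), directed(line_b, line_a))


def is_match(line_a, line_b, threshold_m):
    """Treat two lines as a fuzzy match when their distance is within the cutoff."""
    return discrete_hausdorff(line_a, line_b) <= threshold_m


# Made-up coordinates in metres (projected CRS assumed)
opr_line = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
bedmap_near = [(0.0, 30.0), (1000.0, 40.0), (2000.0, 30.0)]
bedmap_far = [(0.0, 30.0), (1000.0, 40.0), (9000.0, 30.0)]

print(is_match(opr_line, bedmap_near, threshold_m=500.0))
print(is_match(opr_line, bedmap_far, threshold_m=500.0))
```

A single stray vertex dominates the Hausdorff distance, which is why the simplified (sparse) linestrings make a low threshold hard to choose.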

Accidentally got confused when naming the bedmap catalog opr and vice versa, so renaming them to be correct.

Convenience function to read parquet files from remote object storage using obstore and geopandas.

Replace the previous Hausdorff distance based fuzzy matching algorithm with a more exact method that relies on access to the dense BedMAP XY points. The algorithm loops through each OPR line segment and finds the corresponding series of points in the BedMAP database using some distance tolerance. There are quite a few hardcoded heuristics/assumptions, so this needs to be checked more thoroughly on all years.

Refine algorithm to work on years besides 2002 that are less clean. The BedMap points now need to be sorted by timestamp in addition to the OPR lines, and we can't assume trajectory_id is unique, so just use the row index. Added an extra check to skip trying a different tolerance when most points are already too far (>200m) away from the line segment. Also skipping cases where only one point matches, since at least 2 points are needed to form a line.
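The distance-tolerance step described above could look roughly like the sketch below. It is not the script's actual code: the function names are hypothetical, the 200 m default just borrows the figure mentioned in the commit message, and coordinates are assumed to be in a projected CRS with metre units.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b, all (x, y) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both ends coincide
        return math.dist(p, a)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def match_points(points, seg_start, seg_end, tolerance_m=200.0):
    """Row indexes of points within tolerance; empty if fewer than 2 match."""
    idx = [
        i
        for i, p in enumerate(points)
        if point_segment_distance(p, seg_start, seg_end) <= tolerance_m
    ]
    # A single point cannot form a line, so treat it as no match
    return idx if len(idx) >= 2 else []


bedmap_points = [(0, 50), (500, 100), (1000, -150), (1500, 900), (3000, 0)]
print(match_points(bedmap_points, seg_start=(0, 0), seg_end=(2000, 0)))
```

Returning row indexes rather than an ID column mirrors the note that trajectory_id cannot be assumed unique.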

Do a fast check of the dense BedMap points against sparse OPR points first, and if they match great! If not, download the dense OPR points (in .mat format) from CReSIS and redo the distance-based matching.

Helps to label some cases where flight lines are parallel and quite close to each other spatially, and occur in close enough sequence IDs.

Running backwards in time from 2019 to 2002 now, starting with the BedMap3 archive.

Replacing the slow exact distance match (which requires pulling in the CReSIS .mat files) with a temporal match instead that is a bit faster. Still missing lots of cases where initial distance match didn't work (because sparse OPR lines are too far from the dense BedMap points). Partially reverts 6c4e57e

Use row-based delta in terms of distance or time to refine head and tail indexes for matching OPR segments to BedMAP points. Specifically, using pd.Series.pct_change for spatial, and pd.Series.diff for temporal large diff finding between subsequent rows. Still need to fix some off-by-one errors though.
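One way to picture the row-delta idea is the plain-Python sketch below, which stands in for a `pd.Series.diff`-style calculation. It looks for large jumps between subsequent rows and keeps the longest run between them as the head and tail indexes. The function name, the 200-unit jump and the "longest run" rule are assumptions for illustration, not the script's logic.

```python
def longest_smooth_run(values, max_jump):
    """Return inclusive (head, tail) indexes of the longest run of rows whose
    consecutive absolute differences stay at or below max_jump."""
    if not values:
        return None
    # Row indexes where the jump from the previous row exceeds max_jump
    breaks = [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_jump
    ]
    # Each run starts at 0 or at a break, and ends just before the next break
    starts = [0] + breaks
    ends = [b - 1 for b in breaks] + [len(values) - 1]
    return max(zip(starts, ends), key=lambda se: se[1] - se[0])


# Made-up per-row distances (metres) from BedMAP points to one OPR segment
dist_to_segment = [5000, 4800, 120, 100, 90, 110, 3000, 3100]
print(longest_smooth_run(dist_to_segment, max_jump=200))
```

When no jump exceeds `max_jump`, the list of breaks is empty and the whole range is returned, so there is no indexing into an empty result.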

Somewhat reverting 7dd38a9, because temporal method is not so good for the super tricky cases where flightlines may be a day or so apart. Now using h5py instead of scipy.io to read the .mat files. Changed some of the tolerance values and variable names to fit recent code changes.

Catch edge case when df_dist_delta > 200 returns no results, causing an IndexError in the head/tail shift logic. Let things jump straight to the slow dense point checking fallback.

Mostly needed for the 2016 campaign where some segments are trying to match with 2 million + points, making the distance calculations super slow. Using a +/-2 day temporal filter as in 7dd38a9 but not applying the inflexion point method from 24aa9c3.
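A +/-2 day temporal pre-filter can be sketched with the standard library alone. The function name and sample timestamps are hypothetical; the point is only that cheap timestamp comparisons shrink the candidate set before any distance calculation runs.

```python
from datetime import datetime, timedelta


def temporal_filter(timestamps, seg_start, seg_end, pad=timedelta(days=2)):
    """Indexes of timestamps within [seg_start - pad, seg_end + pad]."""
    lo, hi = seg_start - pad, seg_end + pad
    return [i for i, t in enumerate(timestamps) if lo <= t <= hi]


# Made-up BedMAP point timestamps and one OPR segment time window
point_times = [
    datetime(2016, 11, 7),
    datetime(2016, 11, 9),
    datetime(2016, 11, 10, 13),
    datetime(2016, 11, 12, 13),
    datetime(2016, 11, 13),
]
segment_start = datetime(2016, 11, 10, 12)
segment_end = datetime(2016, 11, 10, 14)

print(temporal_filter(point_times, segment_start, segment_end))
```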

Manual match-case to match OPR collections to the relevant BedMAP campaign.

weiji14 · 2026-04-20T23:16:38Z

+    with tempfile.TemporaryDirectory() as tmpdir:
+        mat_fpath = os.path.basename(p := url)  # e.g. Data_20101026_01_001.mat
+        file_name = os.path.join(tmpdir, mat_fpath)
+        urllib.request.urlretrieve(url=p, filename=file_name)  # download to tempfile
+        dat = h5py.File(name=file_name)


Running into some `OSError: Unable to synchronously open file (file signature not found)` errors when trying to open some of the *.mat files using h5py (e.g. https://data.cresis.ku.edu/data/rds/2013_Antarctica_Basler/CSARP_standard/20131219_02/Data_20131219_02_005.mat). ~~Might be some corrupted files on the CReSIS servers?~~

Ah ok, so that file is a MATLAB 5.0 MAT-file, which h5py can't read because it is not an HDF5 file (only MATLAB 7.3 and above are HDF5 according to https://www.mathworks.com/help/matlab/import_export/mat-file-versions.html). Need to use scipy.io.loadmat instead as a fallback.

Fixed in 7c12268

Older CReSIS files (<=2014?) were written as MATLAB 5, while newer ones are MATLAB 7.3+ (HDF5-based), so need to use either scipy.io.loadmat or h5py.File to read the arrays. Combine 6c4e57e and f6bf08f.
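As an alternative to catching the `OSError` and falling back, the reader could be chosen up front by sniffing the file header. The sketch below is not what the commit does; it assumes MATLAB 7.3 files place the HDF5 signature after a 512-byte text header and that older files start with the text `MATLAB 5.0 MAT-file`. The helper names are hypothetical, and `h5py` and `scipy` are imported lazily so the sniffing part runs without them.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def mat_kind_from_header(header: bytes) -> str:
    """Classify the first bytes of a .mat file as 'hdf5', 'mat5' or 'unknown'."""
    # MATLAB 7.3 keeps a 512-byte text header in front of the HDF5 signature;
    # a plain HDF5 file has the signature at offset 0.
    if header[512:520] == HDF5_SIGNATURE or header[:8] == HDF5_SIGNATURE:
        return "hdf5"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "mat5"
    return "unknown"


def load_mat(path: str):
    """Open a .mat file with whichever reader suits its on-disk format."""
    with open(path, "rb") as f:
        kind = mat_kind_from_header(f.read(520))
    if kind == "hdf5":
        import h5py  # third-party, imported lazily

        return h5py.File(path, mode="r")
    if kind == "mat5":
        import scipy.io  # third-party, imported lazily

        return scipy.io.loadmat(path)
    raise ValueError(f"Unrecognised .mat format: {path}")
```

Note that the two readers return different objects (an open HDF5 file versus a dict of arrays), so downstream code still has to handle both layouts.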

The 2009_Antarctica_TO_Gambit campaign over the Gamburtsev Mountains doesn't overlap with the BedMap3 lines, so removing it. 2009_Antarctica_TO over Thwaites and Pine Island Glacier overlaps with both CRESIS_2009_AntarcticaTO_AIR_BM3 & CRESIS_2009_Thwaites_AIR_BM3 though.

espg · 2026-04-22T04:59:14Z

FYI, all the xopr stac catalogs have the mbox polygon coverage now (if that's helpful)-- also, we've merged #73 and updated to 0.5.0 >.>

Fix `SyntaxError: 'await' outside function` by properly wrapping things in asyncio.run. Also added some extra print statements to indicate main sections of the script.
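For reference, the general shape of this fix is shown below. `fetch_row_count` is a hypothetical stand-in for an awaitable read; the point is that `await` only appears inside `async def`, and the script entry point calls `asyncio.run` once.

```python
import asyncio


async def fetch_row_count(name: str) -> int:
    """Hypothetical stand-in for an awaitable I/O call."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return len(name)


async def main() -> dict[str, int]:
    print("Reading catalogs")  # section marker, as in the commit message
    names = ["bedmap", "opr"]
    counts = await asyncio.gather(*(fetch_row_count(n) for n in names))
    return dict(zip(names, counts))


if __name__ == "__main__":
    print(asyncio.run(main()))
```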

Timestamps are only valid for BedMap3 and two BedMap2 seasons (2011/2012). The other BedMap2 seasons (2002, 2004, 2009, 2010) use interpolated timestamps so can't use the temporal filter reliably.
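The gating rule can be expressed as a small predicate. This is a sketch only: the function name and the `"BedMap2"`/`"BedMap3"` labels are hypothetical, and the season years come straight from the commit message above.

```python
def timestamps_are_reliable(archive: str, season_year: int) -> bool:
    """True when a temporal filter can be trusted for this archive and season."""
    if archive == "BedMap3":
        return True
    # Only two BedMap2 seasons carry real (non-interpolated) timestamps
    return archive == "BedMap2" and season_year in {2011, 2012}


for archive, year in [("BedMap3", 2016), ("BedMap2", 2011), ("BedMap2", 2002)]:
    print(archive, year, timestamps_are_reliable(archive, year))
```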

Tsutaki et al. 2022, https://doi.org/10.5194/tc-16-2967-2022. Unsure why this Japanese expedition data is hosted under CReSIS, but gotta handle it I suppose.

codecov · 2026-04-29T22:33:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Calculate percentage and count of classified (Bedmap matches OPR) vs unclassified (no matches) points in the Geopackage files.
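A minimal sketch of such a summary is given below, assuming each Bedmap point carries either a matched OPR ID or `None`. The function name and the label layout are hypothetical; the actual script reads these from Geopackage files.

```python
from collections import Counter


def overlap_stats(labels):
    """Count and percentage of classified vs unclassified points.

    labels: one entry per Bedmap point, the matched OPR ID or None.
    """
    counts = Counter(
        "classified" if label is not None else "unclassified" for label in labels
    )
    total = len(labels)
    return {
        key: (counts[key], 100.0 * counts[key] / total if total else 0.0)
        for key in ("classified", "unclassified")
    }


point_labels = ["Data_20021126_01_002", None, "Data_20021126_01_001", "Data_20021126_01_002"]
print(overlap_stats(point_labels))
```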

weiji14 mentioned this pull request Feb 10, 2026

Determine overlap between Bedmap and OPR data #69

Open

weiji14 added 3 commits February 12, 2026 08:09

Fix Bedmap and OPR variable switchup

822a2d7

Create aio_read_parquet function

9a261ed

weiji14 changed the title ~~Bedmap2 and OPR fuzzy line matching script using hausdorff distance~~ Bedmap2 and OPR fuzzy line matching script Feb 18, 2026

weiji14 added 5 commits February 18, 2026 21:28

New logic to check against actual dense OPR lines

6c4e57e

Do one more nested attempt checking against dense BedMap/OPR points

40bb503

Run on BedMap3 and Bedmap2

a791eb5

espg mentioned this pull request Mar 9, 2026

Reprocess STAC catalogs with morton indices (mbox/mpolygon) #78

Closed

57 tasks

weiji14 added 4 commits March 19, 2026 15:42

Merge branch 'main' into bedmap_overlap_opr

6703c21

Handle case when no consecutive points with distance diff > 200m

de5a417

weiji14 force-pushed the bedmap_overlap_opr branch from 0c24633 to de5a417 on March 27, 2026 04:32

espg mentioned this pull request Apr 20, 2026

STAC catalog: parquet schema fixes, morton indexing, spatial matching, and full season YAML coverage #73

Merged

8 tasks

Handle multiple campaigns in year 2013 and 2009

65b4ebf

weiji14 added 2 commits April 21, 2026 12:20

Use scipy.io.loadmat for MATLAB 5 files, h5py.File for MATLAB 7.3+ files

7c12268

Make script runnable by putting await in async def

7511e88

weiji14 force-pushed the bedmap_overlap_opr branch from 005df6b to 7511e88 on April 27, 2026 08:55

weiji14 added 3 commits April 27, 2026 21:16

Merge branch 'main' into bedmap_overlap_opr

50aae29

Only apply temporal filter for BM3 or 2011/2012 BM2 seasons

d0ea18a

Handle 2018_Antarctica_Ground season around Dome Fuji

a9aae60

Report overlap statistics for each BedMap campaign matched against OPR

8bcb984

Bedmap2 and OPR fuzzy line matching script #72
weiji14 wants to merge 22 commits into englacial:main from weiji14:bedmap_overlap_opr
