Skip to content

Ch/ex dqry postprocessing#49

Merged
cleherny merged 27 commits into
masterfrom
ch/ex-dqry_postprocessing
Sep 30, 2025
Merged

Ch/ex dqry postprocessing#49
cleherny merged 27 commits into
masterfrom
ch/ex-dqry_postprocessing

Conversation

@cleherny
Copy link
Copy Markdown
Contributor

Update of the mineral extraction site example with improved merging and filtering approach, following the one of proj-dqry.

@cleherny cleherny self-assigned this Jul 18, 2025
@stdl-s-sonarqube

This comment has been minimized.

Copy link
Copy Markdown
Member

@GwenaelleSa GwenaelleSa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the files except the new filter, because it was close to the one on the main project DQRY, so the same comments apply.
I don't agree with your addition to the data preparation, because I feel like you copy-pasted some code from elsewhere without taking into account the latest modification. I did a new branch with a PR on this branch: #51. It rollbacks some of your modifications and factorize the post-processing more.
I kept your new function merge_polygons and the post-processing in merge_adjacent_detections is a mix of your code and of the comments I left on this PR. I kept the previous version of the year control, because assigning random year to empty tiles seemed like a risky business and because I was lazy.

Comment thread examples/mineral-extract-sites-detection/filter_detections.py
Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment on lines +179 to +242
# Prepare the tiles

## Convert datasets shapefiles into geojson format
logger.info('Convert the label shapefiles into GeoJSON format (EPSG:4326)...')
labels_4326_gdf, written_files = prepare_labels(SHPFILE, written_files)
gt_labels_4326_gdf = labels_4326_gdf[['geometry', 'CATEGORY', 'SUPERCATEGORY']].copy()

# Add FP labels if it exists
if FP_SHPFILE:
logger.info('Convert the FP label shapefiles into GeoJSON format (EPSG:4326)...')
fp_labels_4326_gdf, written_files = prepare_labels(FP_SHPFILE, written_files, prefix='FP_')
labels_4326_gdf = pd.concat([labels_4326_gdf, fp_labels_4326_gdf], ignore_index=True)

# Tiling of the AoI
logger.info("- Get the label boundaries")
boundaries_df = labels_4326_gdf.bounds
logger.info("- Tiling of the AoI")
tiles_4326_aoi_gdf = aoi_tiling(boundaries_df)
tiles_4326_labels_gdf = gpd.sjoin(tiles_4326_aoi_gdf, labels_4326_gdf, how='inner', predicate='intersects')

# Tiling of the AoI from which empty tiles will be selected
if EPT_SHPFILE:
EPT_aoi_gdf = gpd.read_file(EPT_SHPFILE)
EPT_aoi_4326_gdf = EPT_aoi_gdf.to_crs(epsg=4326)
assert_year(labels_4326_gdf, EPT_aoi_4326_gdf, 'empty_tiles', EPT_YEAR)

if EPT_TYPE == 'aoi':
logger.info("- Get AoI boundaries")
EPT_aoi_boundaries_df = EPT_aoi_4326_gdf.bounds

# Get tile coordinates and shapes
logger.info("- Tiling of the empty tiles AoI")
empty_tiles_4326_all_gdf = aoi_tiling(EPT_aoi_boundaries_df)
# Delete tiles outside of the AoI limits
empty_tiles_4326_aoi_gdf = gpd.sjoin(empty_tiles_4326_all_gdf, EPT_aoi_4326_gdf, how='inner', lsuffix='ept_tiles', rsuffix='ept_aoi')
# Attribute a year to empty tiles if necessary
if 'year' in labels_4326_gdf.keys():
if isinstance(EPT_YEAR, int):
empty_tiles_4326_aoi_gdf['year'] = int(EPT_YEAR)
else:
empty_tiles_4326_aoi_gdf['year'] = np.random.randint(low=EPT_YEAR[0], high=EPT_YEAR[1], size=(len(empty_tiles_4326_aoi_gdf)))
elif EPT_TYPE == 'shp':
if EPT_YEAR:
logger.warning("A shapefile of selected empty tiles are provided. The year set for the empty tiles in the configuration file will be ignored")
EPT_YEAR = None
empty_tiles_4326_aoi_gdf = EPT_aoi_4326_gdf.copy()

# Get all the tiles in one gdf
logger.info("- Concatenate label tiles and empty AoI tiles")
tiles_4326_all_gdf = pd.concat([tiles_4326_labels_gdf, empty_tiles_4326_aoi_gdf])
else:
tiles_4326_all_gdf = tiles_4326_labels_gdf.copy()

# - Remove useless columns, reset feature id and redefine it according to xyz format
logger.info('- Add tile IDs and reorganise the data set')
tiles_4326_all_gdf = tiles_4326_all_gdf[['geometry', 'title', 'year'] if 'year' in tiles_4326_all_gdf.keys() else ['geometry', 'title']].copy()
tiles_4326_all_gdf.reset_index(drop=True, inplace=True)
tiles_4326_all_gdf = tiles_4326_all_gdf.apply(add_tile_id, axis=1)

# - Remove duplicated tiles
tiles_4326_all_gdf.drop_duplicates(['id'], inplace=True)

nb_tiles = len(tiles_4326_all_gdf)
logger.info(f"There were {nb_tiles} tiles created")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sur to what I have to look at here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread helpers/metrics.py
detections_merge_gdf['geometry'] = detections_merge_gdf.geometry.buffer(-1, join_style='mitre')

# Merge adjacent polygons within the provided thd distance
detections_merge_gdf['geometry'] = detections_merge_gdf.geometry.buffer(DISTANCE, join_style='mitre')
Copy link
Copy Markdown
Member

@GwenaelleSa GwenaelleSa Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detections are already buffered based on ln. 104, except with 1 instead of DISTANCE. The geometry passed down is the buffered one.
Is it really what you want? Wouldn't it be simpler to always buffer with DISTANCE?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked, and indeed, it didn't change the input geometry much.
image
I had to dig a bit to remember why I did that... I think it's because I still want the adjacent tiles to be merged even if the distance threshold is below the distance to needed to merge adjacent polygons (which is unlikely to happen in this case) or if the value is set to 0 but may be we can add a condition for this case.

detections_overlap_tiles_gdf = misc.merge_polygons(detections_overlap_tiles_gdf)

# Concat polygons contained within a tile and the merged ones
detections_merge_gdf = pd.concat([detections_overlap_tiles_gdf, detections_within_tiles_gdf], axis=0, ignore_index=True)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since detections_within_tiles_gdf are within tiles, you don't need to include them in all the stuff about merging detections and getting the right attributes after. You could just use the initial unbuffered geometry and add them at the end (ln 156).

Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment thread examples/mineral-extract-sites-detection/config_det.yaml
Copy link
Copy Markdown
Contributor Author

@cleherny cleherny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Gwenaëlle for your review. I've answered to your comments and wait for #51 to make the last changes.

detections_merge_gdf['geometry'] = detections_merge_gdf.geometry.buffer(-1, join_style='mitre')

# Merge adjacent polygons within the provided thd distance
detections_merge_gdf['geometry'] = detections_merge_gdf.geometry.buffer(DISTANCE, join_style='mitre')
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked, and indeed, it didn't change the input geometry much.
image
I had to dig a bit to remember why I did that... I think it's because I still want the adjacent tiles to be merged even if the distance threshold is below the distance to needed to merge adjacent polygons (which is unlikely to happen in this case) or if the value is set to 0 but may be we can add a condition for this case.

Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment thread examples/mineral-extract-sites-detection/prepare_data.py Outdated
Comment on lines +179 to +242
# Prepare the tiles

## Convert datasets shapefiles into geojson format
logger.info('Convert the label shapefiles into GeoJSON format (EPSG:4326)...')
labels_4326_gdf, written_files = prepare_labels(SHPFILE, written_files)
gt_labels_4326_gdf = labels_4326_gdf[['geometry', 'CATEGORY', 'SUPERCATEGORY']].copy()

# Add FP labels if it exists
if FP_SHPFILE:
logger.info('Convert the FP label shapefiles into GeoJSON format (EPSG:4326)...')
fp_labels_4326_gdf, written_files = prepare_labels(FP_SHPFILE, written_files, prefix='FP_')
labels_4326_gdf = pd.concat([labels_4326_gdf, fp_labels_4326_gdf], ignore_index=True)

# Tiling of the AoI
logger.info("- Get the label boundaries")
boundaries_df = labels_4326_gdf.bounds
logger.info("- Tiling of the AoI")
tiles_4326_aoi_gdf = aoi_tiling(boundaries_df)
tiles_4326_labels_gdf = gpd.sjoin(tiles_4326_aoi_gdf, labels_4326_gdf, how='inner', predicate='intersects')

# Tiling of the AoI from which empty tiles will be selected
if EPT_SHPFILE:
EPT_aoi_gdf = gpd.read_file(EPT_SHPFILE)
EPT_aoi_4326_gdf = EPT_aoi_gdf.to_crs(epsg=4326)
assert_year(labels_4326_gdf, EPT_aoi_4326_gdf, 'empty_tiles', EPT_YEAR)

if EPT_TYPE == 'aoi':
logger.info("- Get AoI boundaries")
EPT_aoi_boundaries_df = EPT_aoi_4326_gdf.bounds

# Get tile coordinates and shapes
logger.info("- Tiling of the empty tiles AoI")
empty_tiles_4326_all_gdf = aoi_tiling(EPT_aoi_boundaries_df)
# Delete tiles outside of the AoI limits
empty_tiles_4326_aoi_gdf = gpd.sjoin(empty_tiles_4326_all_gdf, EPT_aoi_4326_gdf, how='inner', lsuffix='ept_tiles', rsuffix='ept_aoi')
# Attribute a year to empty tiles if necessary
if 'year' in labels_4326_gdf.keys():
if isinstance(EPT_YEAR, int):
empty_tiles_4326_aoi_gdf['year'] = int(EPT_YEAR)
else:
empty_tiles_4326_aoi_gdf['year'] = np.random.randint(low=EPT_YEAR[0], high=EPT_YEAR[1], size=(len(empty_tiles_4326_aoi_gdf)))
elif EPT_TYPE == 'shp':
if EPT_YEAR:
logger.warning("A shapefile of selected empty tiles are provided. The year set for the empty tiles in the configuration file will be ignored")
EPT_YEAR = None
empty_tiles_4326_aoi_gdf = EPT_aoi_4326_gdf.copy()

# Get all the tiles in one gdf
logger.info("- Concatenate label tiles and empty AoI tiles")
tiles_4326_all_gdf = pd.concat([tiles_4326_labels_gdf, empty_tiles_4326_aoi_gdf])
else:
tiles_4326_all_gdf = tiles_4326_labels_gdf.copy()

# - Remove useless columns, reset feature id and redefine it according to xyz format
logger.info('- Add tile IDs and reorganise the data set')
tiles_4326_all_gdf = tiles_4326_all_gdf[['geometry', 'title', 'year'] if 'year' in tiles_4326_all_gdf.keys() else ['geometry', 'title']].copy()
tiles_4326_all_gdf.reset_index(drop=True, inplace=True)
tiles_4326_all_gdf = tiles_4326_all_gdf.apply(add_tile_id, axis=1)

# - Remove duplicated tiles
tiles_4326_all_gdf.drop_duplicates(['id'], inplace=True)

nb_tiles = len(tiles_4326_all_gdf)
logger.info(f"There were {nb_tiles} tiles created")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sur to what I have to look at here.

Comment thread examples/mineral-extract-sites-detection/filter_detections.py
Comment thread examples/mineral-extract-sites-detection/config_det.yaml
Comment thread examples/mineral-extract-sites-detection/filter_detections.py
@stdl-s-sonarqube

This comment has been minimized.

@stdl-s-sonarqube

This comment has been minimized.

@stdl-s-sonarqube
Copy link
Copy Markdown

Failed

  • 4.72% Duplicated Lines (%) on New Code (is greater than 3.00%)

Analysis Details

2 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 2 Code Smells

Coverage and Duplications

  • Coverage No coverage information (0.00% Estimated after merge)
  • Duplications 4.72% Duplicated Code (0.70% Estimated after merge)

Project ID: swiss-territorial-data-lab_object-detector_AYZ4zWIzr7JdaaSXwexc

View in SonarQube

@cleherny cleherny merged commit ed23ba7 into master Sep 30, 2025
1 check passed
@cleherny cleherny deleted the ch/ex-dqry_postprocessing branch September 30, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants