feat(parser): implement relational mapping and optimize GeoJSON filtering#31
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughGeoJSONParser separates non-geometry definitions from geometry-bearing annotations, adds safe JSON parsing, introduces solve_relations(join_key) to merge definition properties into matching annotations (removing definitions), and updates get_filtered_geodataframe to return a GeoDataFrame; tests for relation resolution were added. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @vojtech-cifka, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces relational mapping and optimizes GeoJSON filtering. The changes enhance the GeoJSONParser by allowing metadata merging from definition features to spatial annotations and improving memory usage during filtering. The code looks good, with clear intentions and appropriate use of geopandas and pandas functionalities. The is_json_dict function has been updated to handle both string and dict inputs, which is a good improvement for robustness.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 131-156: In solve_relations, dropna(axis=1, how="all") can remove
the join_key from definitions and cause a KeyError on merge; ensure the join_key
is preserved (or return early) before dropping empty columns by checking and
keeping join_key in definitions (or returning if join_key is entirely null) so
annotations.merge(definitions, on=join_key, ...) always has the join column;
update references to self.gdf, definitions, annotations, and merged_df
accordingly.
There was a problem hiding this comment.
Pull request overview
This PR adds relational metadata mapping capabilities to the GeoJSONParser and optimizes its geometry handling. The changes enable merging metadata from geometry-less definition features into spatial annotation features via a join key, while also improving the handling of MultiPolygon/MultiPoint geometries through decomposition.
Changes:
- Added
solve_relationsmethod to merge properties from definition features (geometry: null) into annotation features via a join key - Modified
__init__to separate definitions from annotations and selectively explode only geometries with theindex_parts=Trueparameter - Updated
get_filtered_geodataframeto handle dict objects directly (not just JSON strings) and added guards against empty DataFrame column loss
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ratiopath/parsers/geojson_parser.py (1)
146-148:dropna(axis=1, how="all")is guarded — but consider protectingjoin_keyexplicitly for clarity.A prior review flagged that
dropna(axis=1, how="all")could removejoin_key. The early-return guard at line 139 (definitions[join_key].isna().all()) prevents this in practice. The concern is resolved, but an explicit exclusion ofjoin_keyfromdropnawould make the invariant self-documenting and robust against future refactors.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/parsers/geojson_parser.py` around lines 146 - 148, The current dropna call on definitions may accidentally remove the join_key column during future changes; explicitly exclude join_key when dropping all-NaN columns instead of relying on the earlier guard. Update the logic around definitions.drop(...).dropna(...) to compute or select the subset of columns that excludes join_key and apply dropna only to that subset (keep definitions[join_key] intact), then reassemble or assign back to definitions; reference the definitions variable and join_key identifier and the existing drop(columns=["geometry"], errors="ignore") call to locate where to make this change.
🧹 Nitpick comments (1)
tests/test_parsers.py (1)
201-210: Missing join key test is correct.When the join key is absent,
solve_relationsstrips definitions and returns only geometry rows. Checking"category_def" not in parser.gdf.columnscorrectly verifies no merge occurred.One minor note: you may also want to assert that the raw
"category"column is either absent or all-NaN in the result, to fully confirm no leakage from definition rows. This is optional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_parsers.py` around lines 201 - 210, The test test_solve_relations_missing_join_key should also assert that no leftover data from definition rows remains: after calling GeoJSONParser.solve_relations(join_key="invalid_key") add an assertion on parser.gdf that the raw "category" column is either not present or contains only NaN values (e.g., check "category" not in parser.gdf.columns or parser.gdf["category"].isna().all()), while keeping the existing checks on row count, geometry presence, and absence of "category_def".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 37-51: When reconstructing self.gdf you drop its CRS; capture the
original crs (crs = self.gdf.crs) before you manipulate/explode annotations and
then pass that crs into the new GeoDataFrame construction (gpd.GeoDataFrame(...,
geometry="geometry", crs=crs)) so self.gdf.crs is preserved (this fixes the None
CRS that propagates into solve_relations).
---
Duplicate comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 146-148: The current dropna call on definitions may accidentally
remove the join_key column during future changes; explicitly exclude join_key
when dropping all-NaN columns instead of relying on the earlier guard. Update
the logic around definitions.drop(...).dropna(...) to compute or select the
subset of columns that excludes join_key and apply dropna only to that subset
(keep definitions[join_key] intact), then reassemble or assign back to
definitions; reference the definitions variable and join_key identifier and the
existing drop(columns=["geometry"], errors="ignore") call to locate where to
make this change.
---
Nitpick comments:
In `@tests/test_parsers.py`:
- Around line 201-210: The test test_solve_relations_missing_join_key should
also assert that no leftover data from definition rows remains: after calling
GeoJSONParser.solve_relations(join_key="invalid_key") add an assertion on
parser.gdf that the raw "category" column is either not present or contains only
NaN values (e.g., check "category" not in parser.gdf.columns or
parser.gdf["category"].isna().all()), while keeping the existing checks on row
count, geometry presence, and absence of "category_def".
There was a problem hiding this comment.
♻️ Duplicate comments (1)
ratiopath/parsers/geojson_parser.py (1)
38-52:⚠️ Potential issue | 🟠 MajorPreserve CRS when rebuilding the GeoDataFrame.
Reconstructingself.gdfwithoutcrsdrops spatial reference info, which can break downstream geospatial operations and makesself.gdf.crsNone.🛠️ Proposed fix
- # Isolate definitions (no geometry) from physical annotations + # Isolate definitions (no geometry) from physical annotations + original_crs = self.gdf.crs has_null_geometry = self.gdf.geometry.isna() | self.gdf.geometry.is_empty definitions = self.gdf[has_null_geometry] annotations = self.gdf[~has_null_geometry] @@ self.gdf = gpd.GeoDataFrame( pd.concat([annotations, definitions], ignore_index=True), geometry="geometry", + crs=original_crs, )#!/bin/bash # Verify geopandas version pinning and locations of dependency metadata rg -n "geopandas" -g "requirements*.txt" -g "pyproject.toml" -g "setup.cfg" -g "setup.py"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/parsers/geojson_parser.py` around lines 38 - 52, Rebuilding self.gdf with gpd.GeoDataFrame(pd.concat([...]), geometry="geometry") drops the CRS; preserve spatial reference by carrying over the original CRS when reconstructing. Update the block that creates the new GeoDataFrame (around variables self.gdf, annotations, definitions and the explode call) to pass the original CRS (self.gdf.crs) to the new GeoDataFrame so the resulting frame retains the same CRS as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 38-52: Rebuilding self.gdf with gpd.GeoDataFrame(pd.concat([...]),
geometry="geometry") drops the CRS; preserve spatial reference by carrying over
the original CRS when reconstructing. Update the block that creates the new
GeoDataFrame (around variables self.gdf, annotations, definitions and the
explode call) to pass the original CRS (self.gdf.crs) to the new GeoDataFrame so
the resulting frame retains the same CRS as before.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ratiopath/parsers/geojson_parser.py (1)
163-170: Consider simplifying the redundant type hint.The type hint
str | Anyis redundant sinceAnyalready encompasses all types includingstr. This could be simplified to justx: Any, though the current form does communicate the expected primary input type.✏️ Suggested simplification
-def safe_to_dict(x: str | Any) -> Any: +def safe_to_dict(x: Any) -> Any:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/parsers/geojson_parser.py` around lines 163 - 170, The function signature for safe_to_dict currently uses the redundant type hint `x: str | Any`; change it to a single `x: Any` parameter (and adjust the return type if desired) so the hint is not redundant, leaving the body logic unchanged (the function still checks `isinstance(x, str)` and uses `json.loads` with the existing `except (json.JSONDecodeError, TypeError)` behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 163-170: The function signature for safe_to_dict currently uses
the redundant type hint `x: str | Any`; change it to a single `x: Any` parameter
(and adjust the return type if desired) so the hint is not redundant, leaving
the body logic unchanged (the function still checks `isinstance(x, str)` and
uses `json.loads` with the existing `except (json.JSONDecodeError, TypeError)`
behavior).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ratiopath/parsers/geojson_parser.pytests/test_parsers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_parsers.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_parsers.py`:
- Around line 216-218: Replace the fragile assertion that checks for
"category_def" with merge-specific assertions on parser.gdf: verify there are no
columns ending with the merge suffixes (e.g., "_orig" and "_def") after the
missing-join-key path; specifically, in the test using parser.gdf, assert that
no column name matches the patterns r".*_orig$" or r".*_def$" (and keep the
existing length and geometry checks). This ensures the test looks for absence of
suffix-based merged columns rather than a single column name.
I agree with @matejpekar, and I think |
Goal
Enable relational mapping between spatial and definition entities in
GeoJSONParserand optimize memory usage during filtering.Changes
solve_relations): Merge metadata from definition features (geometry: null) to spatial annotations viajoin_keyannotations.explode(index_parts=True)Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests
Documentation