
feat(parser): implement relational mapping and optimize GeoJSON filtering #31

Merged
Adames4 merged 29 commits into RationAI:main from vojtech-cifka:feature/geojson-relational-parsing
Mar 4, 2026

Conversation

@vojtech-cifka
Contributor

@vojtech-cifka vojtech-cifka commented Feb 23, 2026

Goal

Enable relational mapping between spatial and definition entities in GeoJSONParser and optimize memory usage during filtering.

Changes

  • Relational mapping (solve_relations): Merge metadata from definition features (geometry: null) to spatial annotations via join_key
  • Geometry decomposition: Safely unpack MultiPolygon/MultiPoint entities using annotations.explode(index_parts=True)
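
The relational mapping described above can be illustrated without geopandas. The following is a minimal sketch on raw GeoJSON feature dictionaries, not the implementation in this PR: the function name `merge_definitions`, the sample features, the join key `class_id`, and the rule that annotation values win on conflict are all assumptions made for illustration.

```python
from typing import Any

Feature = dict[str, Any]


def merge_definitions(features: list[Feature], join_key: str) -> list[Feature]:
    """Copy properties from definition features (geometry is None) into
    annotation features that carry the same join_key value."""
    definitions: dict[Any, dict[str, Any]] = {}
    annotations: list[Feature] = []
    for feature in features:
        props = feature.get("properties") or {}
        if feature.get("geometry") is None:
            key = props.get(join_key)
            if key is not None:
                definitions[key] = props
        else:
            annotations.append(feature)

    merged = []
    for feature in annotations:
        props = dict(feature.get("properties") or {})
        extra = definitions.get(props.get(join_key), {})
        # Assumption of this sketch: annotation values win on conflict.
        merged.append({**feature, "properties": {**extra, **props}})
    return merged


# Hypothetical input: one definition and two annotations.
features = [
    {"type": "Feature", "geometry": None,
     "properties": {"class_id": 1, "label": "tumor"}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]},
     "properties": {"class_id": 1}},
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [1, 1]},
     "properties": {"class_id": 2}},
]
result = merge_definitions(features, "class_id")
```

Only the geometry-bearing features survive, and the first one picks up `label` from its definition; the second has no matching definition and is left unchanged.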

Summary by CodeRabbit

  • New Features

    • Merge non-geometry property definitions into matching geometry annotations via a customizable join key (new relation resolution).
  • Bug Fixes

    • Safer handling of empty/null and multi-part geometries during filtering.
    • More robust nested-property filtering with JSON-safe parsing to avoid parsing errors and preserve CRS.
  • Refactor

    • Initialization reworked to separate definitions from annotations before processing.
  • Tests

    • Added tests covering successful merges and missing-join-key scenarios.
  • Documentation

    • Extended class documentation describing relational behavior and usage.

@vojtech-cifka vojtech-cifka requested a review from vejtek February 23, 2026 22:05
@vojtech-cifka vojtech-cifka self-assigned this Feb 23, 2026
@vojtech-cifka vojtech-cifka requested review from a team and Copilot February 23, 2026 22:05
@vojtech-cifka vojtech-cifka added the enhancement New feature or request label Feb 23, 2026
@coderabbitai

coderabbitai bot commented Feb 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

GeoJSONParser separates non-geometry definitions from geometry-bearing annotations, adds safe JSON parsing, introduces solve_relations(join_key) to merge definition properties into matching annotations (removing definitions), and updates get_filtered_geodataframe to return a GeoDataFrame; tests for relation resolution were added.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core parser & JSON handling: ratiopath/parsers/geojson_parser.py | Reworked initialization to separate null/empty-geometry definitions from annotation features and rebuild the GeoDataFrame; removed is_json_dict; added safe_to_dict for robust JSON-string parsing; adjusted filtering to use safe conversions; get_filtered_geodataframe() now carries the return annotation -> GeoDataFrame. |
| Relational merge logic: ratiopath/parsers/geojson_parser.py | Added solve_relations(self, join_key: str) -> None to validate the join key, join properties from non-geometry definition features into matching geometry-bearing annotations, remove merged definitions, detect duplicates/missing keys, and preserve CRS. |
| Tests: tests/test_parsers.py | Added geojson_with_relations_content fixture and tests test_solve_relations_successful_merge and test_solve_relations_missing_join_key to verify relation merging behavior; added pandas as pd import. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through GeoJSON lands today,
I split the defs and shapes along the way,
I parsed the strings both safe and neat,
I stitched their props so rows could meet,
Now features hop together — hooray! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: implementing relational mapping (solve_relations method) and optimizing GeoJSON filtering with safe_to_dict approach. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%. |


@gemini-code-assist

Summary of Changes

Hello @vojtech-cifka, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the GeoJSONParser by introducing relational mapping capabilities, allowing for the integration of metadata from definition features into spatial annotations. It also refines the handling of multi-part geometries for better memory management and improves the robustness of data filtering, particularly when dealing with empty datasets or varied JSON structures. These changes collectively make the parser more versatile and efficient for complex GeoJSON data processing.

Highlights

  • Relational Mapping for GeoJSON: Implemented a new solve_relations method to merge metadata from geometry-less definition features into spatial annotation features using a specified join key. This enhances the parser's ability to handle complex GeoJSON structures where metadata is separated from spatial data.
  • Optimized Geometry Decomposition: Refactored the __init__ method to selectively explode multi-part geometries (like MultiPolygon/MultiPoint) only for actual spatial annotations, leaving geometry-less definition features untouched. This improves memory efficiency and data integrity.
  • Robust Filtering for Empty DataFrames: Enhanced the get_filtered_geodataframe method with early exit conditions when filtering results in an empty GeoDataFrame. This prevents unnecessary processing and potential errors, making the filtering process more efficient and stable.
  • Improved JSON Handling: Updated the is_json_dict utility function and the filtering logic to correctly handle cases where data might already be a dictionary or where JSON decoding errors occur, providing more resilient data processing.
Changelog
  • ratiopath/parsers/geojson_parser.py
    • Added Any and pandas imports for broader type hinting and data manipulation.
    • Updated the class docstring to detail new relational metadata integration capabilities and schema expectations.
    • Modified the __init__ method to separate geometry-less definitions from spatial annotations and apply geometry explosion only to annotations.
    • Introduced a new solve_relations method to merge properties from definition features into annotation features based on a join_key.
    • Adjusted get_filtered_geodataframe to return a gpd.GeoDataFrame with a geometry column when an empty frame is returned.
    • Implemented early return checks in get_filtered_geodataframe to handle empty filtered_gdf instances, preventing further processing on empty sets.
    • Refined JSON parsing within get_filtered_geodataframe to handle objects that are already dictionaries.
    • Updated the is_json_dict function to accept Any type, directly check for dictionary instances, and catch json.JSONDecodeError.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces relational mapping and optimizes GeoJSON filtering. The changes enhance the GeoJSONParser by allowing metadata merging from definition features to spatial annotations and improving memory usage during filtering. The code looks good, with clear intentions and appropriate use of geopandas and pandas functionalities. The is_json_dict function has been updated to handle both string and dict inputs, which is a good improvement for robustness.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 131-156: In solve_relations, dropna(axis=1, how="all") can remove
the join_key from definitions and cause a KeyError on merge; ensure the join_key
is preserved (or return early) before dropping empty columns by checking and
keeping join_key in definitions (or returning if join_key is entirely null) so
annotations.merge(definitions, on=join_key, ...) always has the join column;
update references to self.gdf, definitions, annotations, and merged_df
accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 546f393 and 7170bd7.

📒 Files selected for processing (1)
  • ratiopath/parsers/geojson_parser.py

Contributor

Copilot AI left a comment

Pull request overview

This PR adds relational metadata mapping capabilities to the GeoJSONParser and optimizes its geometry handling. The changes enable merging metadata from geometry-less definition features into spatial annotation features via a join key, while also improving the handling of MultiPolygon/MultiPoint geometries through decomposition.

Changes:

  • Added solve_relations method to merge properties from definition features (geometry: null) into annotation features via a join key
  • Modified __init__ to separate definitions from annotations and selectively explode only geometries with the index_parts=True parameter
  • Updated get_filtered_geodataframe to handle dict objects directly (not just JSON strings) and added guards against empty DataFrame column loss

@vejtek vejtek requested a review from matejpekar February 24, 2026 07:48
@Adames4 Adames4 self-requested a review February 24, 2026 09:13

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
ratiopath/parsers/geojson_parser.py (1)

146-148: dropna(axis=1, how="all") is guarded — but consider protecting join_key explicitly for clarity.

A prior review flagged that dropna(axis=1, how="all") could remove join_key. The early-return guard at line 139 (definitions[join_key].isna().all()) prevents this in practice. The concern is resolved, but an explicit exclusion of join_key from dropna would make the invariant self-documenting and robust against future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/geojson_parser.py` around lines 146 - 148, The current
dropna call on definitions may accidentally remove the join_key column during
future changes; explicitly exclude join_key when dropping all-NaN columns
instead of relying on the earlier guard. Update the logic around
definitions.drop(...).dropna(...) to compute or select the subset of columns
that excludes join_key and apply dropna only to that subset (keep
definitions[join_key] intact), then reassemble or assign back to definitions;
reference the definitions variable and join_key identifier and the existing
drop(columns=["geometry"], errors="ignore") call to locate where to make this
change.
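
The suggestion above can be sketched with plain pandas. This is an illustration under stated assumptions, not the code in `geojson_parser.py`: the helper name `drop_empty_columns_keep_key`, the column names, and the sample data are hypothetical.

```python
import pandas as pd


def drop_empty_columns_keep_key(definitions: pd.DataFrame, join_key: str) -> pd.DataFrame:
    """Drop all-NaN columns from definitions, but never the join column."""
    # Apply dropna only to the columns other than join_key (and geometry).
    other = definitions.drop(columns=[join_key, "geometry"], errors="ignore")
    kept = other.dropna(axis=1, how="all")
    # Reassemble with the join column intact, even if it is entirely null.
    return pd.concat([definitions[[join_key]], kept], axis=1)


# Hypothetical definitions frame whose join column happens to be all null.
definitions = pd.DataFrame(
    {
        "class_id": [None, None],
        "label": ["tumor", "stroma"],
        "unused": [None, None],
        "geometry": [None, None],
    }
)

cleaned = drop_empty_columns_keep_key(definitions, "class_id")
naive = definitions.drop(columns=["geometry"], errors="ignore").dropna(axis=1, how="all")
```

Here `naive` loses `class_id`, which is the situation that would make a later `merge(..., on=join_key)` raise a KeyError, while `cleaned` keeps it and drops only `unused`.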
🧹 Nitpick comments (1)
tests/test_parsers.py (1)

201-210: Missing join key test is correct.

When the join key is absent, solve_relations strips definitions and returns only geometry rows. Checking "category_def" not in parser.gdf.columns correctly verifies no merge occurred.

One minor note: you may also want to assert that the raw "category" column is either absent or all-NaN in the result, to fully confirm no leakage from definition rows. This is optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_parsers.py` around lines 201 - 210, The test
test_solve_relations_missing_join_key should also assert that no leftover data
from definition rows remains: after calling
GeoJSONParser.solve_relations(join_key="invalid_key") add an assertion on
parser.gdf that the raw "category" column is either not present or contains only
NaN values (e.g., check "category" not in parser.gdf.columns or
parser.gdf["category"].isna().all()), while keeping the existing checks on row
count, geometry presence, and absence of "category_def".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ratiopath/parsers/geojson_parser.py`:
- Around line 37-51: When reconstructing self.gdf you drop its CRS; capture the
original crs (crs = self.gdf.crs) before you manipulate/explode annotations and
then pass that crs into the new GeoDataFrame construction (gpd.GeoDataFrame(...,
geometry="geometry", crs=crs)) so self.gdf.crs is preserved (this fixes the None
CRS that propagates into solve_relations).

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 7170bd7 and d091b79.

📒 Files selected for processing (2)
  • ratiopath/parsers/geojson_parser.py
  • tests/test_parsers.py


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
ratiopath/parsers/geojson_parser.py (1)

38-52: ⚠️ Potential issue | 🟠 Major

Preserve CRS when rebuilding the GeoDataFrame.
Reconstructing self.gdf without crs drops spatial reference info, which can break downstream geospatial operations and makes self.gdf.crs None.

🛠️ Proposed fix
-            # Isolate definitions (no geometry) from physical annotations
+            # Isolate definitions (no geometry) from physical annotations
+            original_crs = self.gdf.crs
             has_null_geometry = self.gdf.geometry.isna() | self.gdf.geometry.is_empty
             definitions = self.gdf[has_null_geometry]
             annotations = self.gdf[~has_null_geometry]
@@
             self.gdf = gpd.GeoDataFrame(
                 pd.concat([annotations, definitions], ignore_index=True),
                 geometry="geometry",
+                crs=original_crs,
             )
#!/bin/bash
# Verify geopandas version pinning and locations of dependency metadata
rg -n "geopandas" -g "requirements*.txt" -g "pyproject.toml" -g "setup.cfg" -g "setup.py"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/geojson_parser.py` around lines 38 - 52, Rebuilding
self.gdf with gpd.GeoDataFrame(pd.concat([...]), geometry="geometry") drops the
CRS; preserve spatial reference by carrying over the original CRS when
reconstructing. Update the block that creates the new GeoDataFrame (around
variables self.gdf, annotations, definitions and the explode call) to pass the
original CRS (self.gdf.crs) to the new GeoDataFrame so the resulting frame
retains the same CRS as before.
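
A self-contained sketch of the pattern this comment asks for, assuming geopandas and shapely are installed. The sample data and the `EPSG:4326` CRS are hypothetical; only the split, `explode(index_parts=True)`, and the `crs=` argument mirror the proposed fix.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPoint

# Hypothetical input: one multi-part annotation and one geometry-less definition.
gdf = gpd.GeoDataFrame(
    {"name": ["cells", "cell-definition"]},
    geometry=[MultiPoint([(0, 0), (1, 1)]), None],
    crs="EPSG:4326",
)

# Capture the CRS before slicing and rebuilding the frame.
original_crs = gdf.crs

has_null_geometry = gdf.geometry.isna() | gdf.geometry.is_empty
definitions = gdf[has_null_geometry]
# Only geometry-bearing rows are exploded into single-part geometries.
annotations = gdf[~has_null_geometry].explode(index_parts=True)

rebuilt = gpd.GeoDataFrame(
    pd.concat([annotations, definitions], ignore_index=True),
    geometry="geometry",
    crs=original_crs,  # pass the CRS explicitly instead of relying on concat
)
```

Passing `crs=original_crs` makes the invariant explicit, so the rebuilt frame keeps its spatial reference regardless of how the intermediate concatenation behaves.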

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between d091b79 and d70a1a8.

📒 Files selected for processing (1)
  • ratiopath/parsers/geojson_parser.py


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
ratiopath/parsers/geojson_parser.py (1)

163-170: Consider simplifying the redundant type hint.

The type hint str | Any is redundant since Any already encompasses all types including str. This could be simplified to just x: Any, though the current form does communicate the expected primary input type.

✏️ Suggested simplification
-def safe_to_dict(x: str | Any) -> Any:
+def safe_to_dict(x: Any) -> Any:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/geojson_parser.py` around lines 163 - 170, The function
signature for safe_to_dict currently uses the redundant type hint `x: str |
Any`; change it to a single `x: Any` parameter (and adjust the return type if
desired) so the hint is not redundant, leaving the body logic unchanged (the
function still checks `isinstance(x, str)` and uses `json.loads` with the
existing `except (json.JSONDecodeError, TypeError)` behavior).
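
For reference, a helper matching that description could look like the sketch below. The body is reconstructed only from the behavior named in this comment (`isinstance(x, str)`, `json.loads`, and the `except` clause); returning the input unchanged on failure is an assumption, and the actual function in `geojson_parser.py` may differ.

```python
import json
from typing import Any


def safe_to_dict(x: Any) -> Any:
    """Parse JSON strings; pass every other value through untouched."""
    if isinstance(x, str):
        try:
            return json.loads(x)
        except (json.JSONDecodeError, TypeError):
            # Assumption: an unparseable string is returned as-is.
            return x
    return x


parsed = safe_to_dict('{"classification": {"name": "tumor"}}')
already_dict = safe_to_dict({"classification": {"name": "tumor"}})
not_json = safe_to_dict("plain text")
```

With `x: Any` the signature states what the body already does: any value is accepted, and only strings are given special treatment.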

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between d70a1a8 and 6c8bd29.

📒 Files selected for processing (2)
  • ratiopath/parsers/geojson_parser.py
  • tests/test_parsers.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_parsers.py


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_parsers.py`:
- Around line 216-218: Replace the fragile assertion that checks for
"category_def" with merge-specific assertions on parser.gdf: verify there are no
columns ending with the merge suffixes (e.g., "_orig" and "_def") after the
missing-join-key path; specifically, in the test using parser.gdf, assert that
no column name matches the patterns r".*_orig$" or r".*_def$" (and keep the
existing length and geometry checks). This ensures the test looks for absence of
suffix-based merged columns rather than a single column name.
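
One way to express that check is sketched below using only the standard library. The helper name `merge_suffixed_columns` is hypothetical, and the suffixes `_orig` and `_def` are taken from the comment above; in the real test the input would be `parser.gdf.columns`.

```python
import re

# Matches any column name that ends with one of the merge suffixes.
MERGE_SUFFIX = re.compile(r".*_(orig|def)$")


def merge_suffixed_columns(columns) -> list[str]:
    """Return the column names that look like products of a suffixed merge."""
    return [str(c) for c in columns if MERGE_SUFFIX.match(str(c))]


# Hypothetical column sets for the two paths.
no_merge = merge_suffixed_columns(["geometry", "category", "name"])
after_merge = merge_suffixed_columns(["category_orig", "category_def", "geometry"])
```

In the missing-join-key test the assertion would then be that the returned list is empty, which holds for any column name rather than for `category_def` alone.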

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 6c8bd29 and 8739574.

📒 Files selected for processing (1)
  • tests/test_parsers.py

@Adames4
Collaborator

Adames4 commented Feb 28, 2026

The implementation seems unnecessarily complicated to me. Why not store the definitions and annotations as separate dfs in __init__ and then simply join the corresponding relations in get_filtered_geodataframe?

I thought that this way the parser would be able to efficiently parse various types of GeoJSON structures; however, for the sake of my project alone, it can be simplified.

Also, I believe the current implementation, which uses masking, will perform better than the repeated memory allocation involved in merging DataFrames.

I agree with @matejpekar, and I think solve_relations can be made private and called inside __init__. I don’t see any point in calling it separately. Do you?

@vojtech-cifka vojtech-cifka requested a review from Adames4 March 2, 2026 19:48
@matejpekar matejpekar self-requested a review March 3, 2026 09:46
@vojtech-cifka vojtech-cifka requested a review from matejpekar March 3, 2026 19:15
@Adames4 Adames4 merged commit e82572c into RationAI:main Mar 4, 2026
5 checks passed

Labels

enhancement New feature or request

5 participants