feat: reduce memory usage #518

Draft
RaczeQ wants to merge 131 commits into main from raczeq/memory-offloading

Collaborator

@RaczeQ RaczeQ commented Mar 1, 2025

  • Replace GeoDataFrame and DataFrame arguments with dedicated ParquetDataTable and GeoDataTable objects
  • Add dataloaders based on streaming from Parquet files
  • Rewrite logic in all components (for now they just cast to GeoDataFrame / DataFrame)
    • Embedders
      • Count - sinks via polars, maybe rewrite to DuckDB
      • ContextualCount - rewrite to DuckDB?
      • GeoVeX
      • Hex2Vec
      • Gtfs2Vec
      • Highway2Vec
      • S2Vec
    • Joiners
    • Loaders
      • OSM Loaders
        • PBF
        • Online
        • Tiles
        • Highway
      • GTFS Loader
      • Overture Maps Loader
    • Regionalizers
      • Voronoi
      • H3 (rewrite to DuckDB H3; specify whether the index is returned as int or str)
      • S2 (rewrite to DuckDB geography)
      • Geocode (maybe?)
      • SlippyMap
      • AdministrativeBoundary (maybe rewrite to OvertureMaps boundaries dataset!)
    • Wrap new objects in plotting module
  • Add a new IntersectionJoiner based on the duckdb-geography extension (with S2 cell covering) or the h3 extension (with H3 cell covering)
  • Add an option to omit geometry from the output of H3Regionalizer and S2Regionalizer
  • Change the H3 index from str to int to reduce memory usage
  • Add an option to contextualize any embedding after the fact (export the ContextualCountEmbedder logic and create a new function / class)
  • Add an option to embed any pre-computed count embedding with other classes (an additional abstract subclass of CountEmbedder with a default transform and a transform_count_embeddings method that can reuse computed embeddings)
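
The memory argument behind switching the H3 index from str to int can be illustrated with a small sketch. An H3 cell index is a 64-bit integer that is conventionally written as a hexadecimal string, so the conversion below needs only built-in integer parsing. The helper names `h3_str_to_int` and `h3_int_to_str` and the sample cell values are illustrative assumptions; they are not taken from this PR or from the srai API.

```python
import sys
from array import array


def h3_str_to_int(cell: str) -> int:
    """Parse the hexadecimal string form of an H3 index into an integer."""
    return int(cell, 16)


def h3_int_to_str(cell: int) -> str:
    """Format an integer H3 index back into its lowercase hexadecimal form."""
    return format(cell, "x")


# Sample indexes in H3 string form (illustrative values only).
cells_str = ["8928308280fffff", "8928308280bffff", "89283082873ffff"]

# Pack the integer form into a typed array of unsigned 64-bit slots.
cells_int = array("Q", (h3_str_to_int(c) for c in cells_str))

# The conversion is lossless: formatting back yields the original strings.
round_trip = [h3_int_to_str(c) for c in cells_int]

# Per-cell cost: a full Python str object vs. one packed integer slot.
bytes_per_str = sys.getsizeof(cells_str[0])
bytes_per_int = cells_int.itemsize

print(round_trip == cells_str, bytes_per_str, bytes_per_int)
```

The same reasoning should carry over to columnar storage such as Parquet, where a fixed-width 64-bit integer column is likely to be more compact than a column of 15-character strings, although the exact savings depend on encoding and compression.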

codecov Bot commented Mar 1, 2025

Codecov Report

❌ Patch coverage is 80.32995% with 155 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.12%. Comparing base (7efde62) to head (ece3296).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| srai/geodatatable.py | 86.47% | 38 Missing ⚠️ |
| srai/embedders/contextual_count_embedder.py | 62.65% | 31 Missing ⚠️ |
| srai/loaders/osm_loaders/osm_online_loader.py | 34.78% | 30 Missing ⚠️ |
| srai/neighbourhoods/h3_neighbourhood.py | 51.35% | 18 Missing ⚠️ |
| srai/neighbourhoods/_base.py | 26.08% | 17 Missing ⚠️ |
| srai/loaders/download.py | 20.00% | 8 Missing ⚠️ |
| srai/h3.py | 86.36% | 3 Missing ⚠️ |
| srai/embedders/_base.py | 75.00% | 2 Missing ⚠️ |
| srai/loaders/_base.py | 84.61% | 2 Missing ⚠️ |
| srai/duckdb.py | 95.00% | 1 Missing ⚠️ |

... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #518      +/-   ##
==========================================
- Coverage   91.59%   89.12%   -2.47%     
==========================================
  Files          63       64       +1     
  Lines        2616     3044     +428     
==========================================
+ Hits         2396     2713     +317     
- Misses        220      331     +111     
| Flag | Coverage Δ |
|---|---|
| ubuntu-latest-python3.10 | 89.12% <80.32%> (?) |
| ubuntu-latest-python3.11 | 89.12% <80.32%> (?) |
| ubuntu-latest-python3.12 | 89.06% <80.32%> (-2.53%) ⬇️ |
| ubuntu-latest-python3.9 | 89.11% <80.32%> (?) |
| windows-latest-python3.12 | 89.12% <80.32%> (-2.47%) ⬇️ |

@RaczeQ RaczeQ linked an issue Mar 2, 2025 that may be closed by this pull request
@RaczeQ RaczeQ linked an issue Apr 5, 2025 that may be closed by this pull request
RaczeQ and others added 26 commits September 2, 2025 22:00

Development

Successfully merging this pull request may close these issues.

  • Add support for h3 int cells in H3Neighbourhood
  • H3Regionalizer stuck for Alaska, United States
  • Feat: add big data workflow capabilities
