
Feat: add big data workflow capabilities #396

@RaczeQ


Currently, the library works entirely with GeoPandas GeoDataFrames and requires the whole dataset to fit in memory from start to finish. This is limiting: processing larger areas quickly drives up RAM usage. In this issue, we should decide which framework or library to use in the final pipeline.

Any insights or tips from people who have used these tools would be very helpful 😄

Currently available options:

  • dask-geopandas - GeoPandas extension for Dask (partitioned, parallel GeoDataFrames)
  • Apache Sedona - cluster computing system that extends Apache Spark and Apache Flink with spatial operations
  • duckdb-spatial - spatial extension for DuckDB, a fast in-process analytical database
  • geoarrow-python - Python implementation of GeoArrow, an in-development specification for storing geospatial data in Apache Arrow
  • GeoPolars - geospatial extension for Polars, written in Rust

We should also decide whether the library will depend on a single framework or stay open for extensions by implementing multiple backends, similar to the ibis project. Since our code is written against an abstract API, supporting multiple backends should be feasible, but we would have to make sure results are consistent across all of them (high-quality tests). There is also the question of outputs: each backend returns a different type (Dask DataFrame, DuckDB relation, Sedona object, GeoDataFrame, GeoParquet/GeoFeather file path), so we would either have to expose those differences to users or write an abstraction around each object to keep the API consistent and backend-agnostic.
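A minimal sketch of the "abstraction around each object" option, assuming a hypothetical `SpatialBackend` interface (not an existing API of this library or of ibis). Points are plain `(id, x, y)` tuples so the example stays stdlib-only; real backends would wrap a GeoDataFrame, a Dask GeoDataFrame, a DuckDB relation, and so on. The `to_records()` method is the hypothetical normalization hook that makes results comparable in consistency tests.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Iterable

# Hypothetical stand-ins for real geometries and bounding boxes.
Point = tuple[int, float, float]           # (id, x, y)
BBox = tuple[float, float, float, float]   # (minx, miny, maxx, maxy)


def _inside(p: Point, bbox: BBox) -> bool:
    _, x, y = p
    minx, miny, maxx, maxy = bbox
    return minx <= x <= maxx and miny <= y <= maxy


class SpatialBackend(ABC):
    """Hypothetical backend interface; each backend wraps its own native object."""

    @abstractmethod
    def filter_bbox(self, bbox: BBox) -> SpatialBackend:
        """Keep only points inside the bounding box (edges inclusive)."""

    @abstractmethod
    def to_records(self) -> list[Point]:
        """Materialize into a backend-agnostic, sorted list."""


class InMemoryBackend(SpatialBackend):
    """Eager backend: all data lives in one list (GeoDataFrame-like)."""

    def __init__(self, points: Iterable[Point]):
        self._points = list(points)

    def filter_bbox(self, bbox: BBox) -> InMemoryBackend:
        return InMemoryBackend(p for p in self._points if _inside(p, bbox))

    def to_records(self) -> list[Point]:
        return sorted(self._points)


class ChunkedBackend(SpatialBackend):
    """Lazy backend: partitions plus a queue of pending filters (Dask-like)."""

    def __init__(self, partitions: list[list[Point]], filters: tuple[BBox, ...] = ()):
        self._partitions = partitions
        self._filters = filters

    def filter_bbox(self, bbox: BBox) -> ChunkedBackend:
        # Nothing is computed here; the filter is only recorded.
        return ChunkedBackend(self._partitions, self._filters + (bbox,))

    def to_records(self) -> list[Point]:
        # Filters are applied one partition at a time.
        out: list[Point] = []
        for part in self._partitions:
            out.extend(p for p in part if all(_inside(p, b) for b in self._filters))
        return sorted(out)


if __name__ == "__main__":
    pts = [(1, 0.5, 0.5), (2, 2.0, 2.0), (3, 0.9, 0.1), (4, -1.0, 0.0)]
    box = (0.0, 0.0, 1.0, 1.0)
    eager = InMemoryBackend(pts).filter_bbox(box).to_records()
    lazy = ChunkedBackend([pts[:2], pts[2:]]).filter_bbox(box).to_records()
    assert eager == lazy  # a consistency test would run this across all backends
    print(eager)
```

The design choice here is that users call the same methods regardless of backend, and tests only ever compare the normalized `to_records()` output; the cost is that every new backend must implement that normalization.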

Alternatively, each operation could end by writing its result to a GeoParquet/Arrow/Feather file, so that subsequent steps work on files instead of loading everything into memory.
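A minimal sketch of the file-based idea, assuming JSON Lines as a stdlib-only stand-in for GeoParquet/Feather (a real version would stream Arrow record batches, e.g. with pyarrow). All function names (`file_step`, `keep_in_bbox`, `add_cell_id`, `run_pipeline`) are hypothetical. Each step streams its input file record by record, writes a new file, and passes only the file path on to the next step.

```python
import functools
import json
import os
import tempfile
from collections.abc import Callable, Iterable, Iterator

Record = dict  # stand-in for one feature; a real pipeline would use Arrow batches


def read_records(path: str) -> Iterator[Record]:
    """Stream records line by line instead of loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def write_records(records: Iterable[Record], path: str) -> str:
    """Write records as they arrive and return the output path."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path


def file_step(func: Callable[[Iterator[Record]], Iterable[Record]]):
    """Turn a record transform into a step that maps an input file to a new file."""

    @functools.wraps(func)
    def run(in_path: str, workdir: str) -> str:
        out_path = os.path.join(workdir, f"{func.__name__}.jsonl")
        return write_records(func(read_records(in_path)), out_path)

    return run


@file_step
def keep_in_bbox(records: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical filter: keep points inside the unit square.
    for r in records:
        if 0.0 <= r["x"] <= 1.0 and 0.0 <= r["y"] <= 1.0:
            yield r


@file_step
def add_cell_id(records: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical grid index: label each point with its 0.5 x 0.5 cell.
    for r in records:
        yield {**r, "cell": f"{int(r['x'] // 0.5)}_{int(r['y'] // 0.5)}"}


def run_pipeline(in_path: str, steps: list[Callable[[str, str], str]], workdir: str) -> str:
    """Chain steps; only file paths are passed between them."""
    path = in_path
    for step in steps:
        path = step(path, workdir)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = write_records(
            [{"id": 1, "x": 0.2, "y": 0.7}, {"id": 2, "x": 3.0, "y": 0.1}],
            os.path.join(tmp, "input.jsonl"),
        )
        result = run_pipeline(src, [keep_in_bbox, add_cell_id], tmp)
        print(result, list(read_records(result)))
```

Because output files are named after the step, this sketch assumes each step appears at most once per pipeline; reusing a step would overwrite the file it is reading from.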

Additional tools worth mentioning:

Labels: enhancement (New feature or request), question (Further information is requested)
