feat(vortex-geo): native Point extension type and GeoDistance scalar function#8342
Open
HarukiMoriarty wants to merge 5 commits into
Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready) and the planar `GeoDistance` scalar function between two point columns. Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point GeoDistance computes the planar distance from each point in a column to a single constant query point (e.g. `ST_Distance(column, point)`). The second operand must be a constant: it is decoded once and broadcast over the column rather than materialized to one identical row per output element. Column-to-column distance is unsupported and errors. `try_new_array` now infers the output length from the point column instead of taking it as an explicit parameter. Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
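To illustrate the broadcast described in this commit message, here is a minimal sketch in plain Rust. It does not use the Vortex API: `PointColumn` and `distance_to_constant` are hypothetical names, and the column is modeled as two plain `f64` buffers. The query point is unpacked once, outside the loop, and reused for every row instead of being expanded into a column of identical rows.

```rust
/// Hypothetical stand-in for a point column: x and y are stored as
/// separate contiguous buffers, as in a struct-of-arrays layout.
struct PointColumn {
    xs: Vec<f64>,
    ys: Vec<f64>,
}

/// Planar (Euclidean) distance from every point in `col` to one constant
/// query point. The output length is taken from the column itself.
fn distance_to_constant(col: &PointColumn, query: (f64, f64)) -> Vec<f64> {
    // Decode the constant operand once; it is never materialized per row.
    let (qx, qy) = query;
    col.xs
        .iter()
        .zip(&col.ys)
        // hypot(dx, dy) computes sqrt(dx^2 + dy^2).
        .map(|(x, y)| (x - qx).hypot(y - qy))
        .collect()
}
```

For a column holding (0, 0) and (3, 4) and a query point at the origin, this returns distances 0 and 5 in the same units as the input coordinates.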
026ecbd to ec95875
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 20.2 µs | 35.2 µs | -42.51% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.2 µs | 213.3 µs | -16.93% |
| ❌ | Simulation | varbinview_large | 113.1 µs | 131.7 µs | -14.13% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.0)] | 845.2 µs | 980 µs | -13.76% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 274.4 µs | 309.3 µs | -11.27% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.0)] | 138 µs | 110.9 µs | +24.41% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.1)] | 137.8 µs | 110.9 µs | +24.24% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.01)] | 137.4 µs | 110.6 µs | +24.23% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.1)] | 89.3 µs | 80.2 µs | +11.4% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.0)] | 89.6 µs | 80.8 µs | +10.93% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.01)] | 89.3 µs | 80.7 µs | +10.66% |
Comparing HarukiMoriarty:nemo/geo-point (ec95875) with develop (3d7bbfb)
Summary
This PR adds a native point type to `vortex-geo`. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB.

It also adds the `GeoDistance` scalar function: point-to-point distance with PostGIS `ST_Distance` semantics (planar/Euclidean, results in CRS units).

API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:
- `Point` (`vortex.geo.point`): a location stored as `Struct<x, y, z?, m?>` of non-nullable `f64`, where `z?` is an optional elevation and `m?` an optional measure.
- `Coordinate`: the internal value a point scalar unpacks to.
- `GeoDistance` (`vortex.geo.distance`): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.

Testing
Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
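As a rough illustration of the four dimensions and their WKT display mentioned above, the following sketch models a coordinate with optional `z` and `m` values. The `Coordinate` struct here is a hypothetical stand-in, not the type from this PR, and the exact output format of the real implementation is an assumption. The dimension tags (`Z`, `M`, `ZM`) follow standard WKT.

```rust
use std::fmt;

/// Hypothetical stand-in for the PR's `Coordinate`; the field layout is assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Coordinate {
    x: f64,
    y: f64,
    z: Option<f64>, // optional elevation
    m: Option<f64>, // optional measure
}

impl fmt::Display for Coordinate {
    /// Formats the coordinate as WKT, choosing the dimension tag from
    /// which optional values are present: XY, XYZ, XYM, or XYZM.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match (self.z, self.m) {
            (None, None) => write!(f, "POINT ({} {})", self.x, self.y),
            (Some(z), None) => write!(f, "POINT Z ({} {} {})", self.x, self.y, z),
            (None, Some(m)) => write!(f, "POINT M ({} {} {})", self.x, self.y, m),
            (Some(z), Some(m)) => {
                write!(f, "POINT ZM ({} {} {} {})", self.x, self.y, z, m)
            }
        }
    }
}
```

Matching on the `(z, m)` pair keeps the four cases exhaustive, so adding or removing a dimension would be caught by the compiler rather than at runtime.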