Skip to content

Make position generic#275

Draft
weiznich wants to merge 4 commits into
georust:mainfrom
GiGainfosystems:configurable_inline_size
Draft

Make position generic#275
weiznich wants to merge 4 commits into
georust:mainfrom
GiGainfosystems:configurable_inline_size

Conversation

@weiznich
Copy link
Copy Markdown

This commit introduces a const generic argument basically everywhere which allows the user to configure the size of the inline storage used in geojson::Position. This enables users that deal with non-2d data to avoid a lot of small heap allocations in this place, which impacts performance in a non-optimal way.

I did some benchmarks to verify that this actually improves performance:

parse 3d data as 2d (horizons.geojson)
                        time:   [431.93 ms 433.06 ms 434.35 ms]
parse 3d data as 3d (horizons.geojson)
                        time:   [346.94 ms 348.61 ms 350.34 ms]

serialize geojson::FeatureCollection struct 2d (horizons.geojson)
                        time:   [219.19 ms 220.34 ms 221.56 ms]
serialize geojson::FeatureCollection struct 3d (horizons.geojson)
                        time:   [208.05 ms 209.02 ms 210.14 ms]

The corresponding geojson file can be downloaded here: https://ogc-api.hlnug.giga-infosystems.com/collections/2/items?t=geojson (it's 206mb and contains only 3d points). The benchmarks mirror the existing benchmarks with similar name, but with this dataset instead. The 2d variant uses 2 as INLINE_SIZE size, while the 3d variant uses 3 as INLINE_SIZE variant.

Now I'm aware that geojson reached 1.0 a few weeks ago, so that such API changes are hard to implement. Therefore I'm opening this as draft and as a request for discussion.

I see the following solutions to not break the API in most of the cases:

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

@weiznich weiznich force-pushed the configurable_inline_size branch 2 times, most recently from 6a8a0d7 to 404912d Compare April 16, 2026 12:33
Copy link
Copy Markdown
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your concern about the 1.0 release, but that release represents moreso a change in thinking than it does any real milestone in the library. Previously every "point" (Y in the 0.Y.Z) release was a "breaking" release from a semver perspective, and we'd had plenty of those. I anticipate having more breaking releases at some point, so a 2.0, 3.0, etc. are anticipated.

Granted, post 1.0 communicates we're not going to try to break things willy-nilly, but I think we were already at that point for years. Mainly I wanted to be able to take advantage of patch vs. additive changes. (1.Y.0) vs. (1.0.Z) which we couldn't do in the (0.Y.Z) world.

So, a breaking release could be acceptable for significant gain, but we should still try to avoid it if we can, or reasonably minimize the "damage" done. I'm not sure yet. It'll be easier to talk about that once the end game API is arrived at, and I don't think the current proposal is quite there yet since it's too verbose for the common 2-D case.

Can the 3d generics stuff be achieved in a purely additive way? If not, maybe they could be staged in a parallel module for a while without disturbing the existing API's and we could eventually unify them in a subsequent release farther down the road given how recent the 1.0 release was.

Comment thread benches/parse.rs Outdated
geojson::FeatureReader::from_reader(BufReader::new(geojson_str.as_bytes()));
let mut count = 0;
for feature in feature_reader.features() {
for feature in feature_reader.features::<2>() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wager the vast majority of people using geojson are using 2d. We can't make the common path annoying like this in order to facilitate a minority use case.

A couple of options that spring to mind:

Can const generics have default arguments like other generics? Does that make things better?

Maybe a second set of methods for the non-2d case?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can const generics have default arguments like other generics? Does that make things better?

They can have default values, but only in certain scopes, quite similar to "normal" generics. For this usecase it is unfortunately not helpful as default values are disallowed for functions and impls. They are only valid for the struct definitions itself. So if someone writes serde_json::from_str::<Feature>("your string") it will just work, but the example above will unfortunately not work that way.

Maybe a second set of methods for the non-2d case?

That would be a reasonable solution in my opinion, especially as the 2d methods can forward to the generic ones internally. We likely won't even need to duplicate "all" of the methods, but just those that are producing values.
The only problem there are trait implementations, so things like impl FromStr for Feature and impl From<[f64; N> for Position is problematic and we would need decide what we would like to happen there. Either the generic impl or specific impls for the 2d use-case + a standalone method for the generic case?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem there are trait implementations, so things like impl FromStr for Feature and impl From<[f64; N> for Position is problematic and we would need decide what we would like to happen there.

You're right, that does seem a little tricky. I know we have some existing impls, but I'm not sure how often they're used.

As a first pass non-breaking-lowest-effort thing, maybe we could start by having all those continue to return a 2D position?

So essentially, starting with this:

 /// let position_3d = Position::from(vec![1.0, 2.0, 3.0]);
 /// let z = position_3d[2];
 /// ```
+pub type Position = GenericPosition<2>;
+
 #[derive(Debug, Clone, PartialEq, PartialOrd, Serialize, Deserialize)]
-pub struct Position(TinyVec<[f64; 2]>);
+pub struct GenericPosition<const INLINE_STORAGE: usize>(TinyVec<[f64; INLINE_STORAGE]>);

I think that would be a non-breaking change, right?

And then, if it's at all reasonable, I'd prefer to have the higher-dimensionality stuff accessible from another module for now, sharing generic implementations where possible, and feature gated if it would keep codesize down.

I'll add, although having N dimensions is satisfying from a mathematical sense, if we can simplify our implementation by only adding support for 3d (and mayyyyyyybe 4d), I'd probably prefer it.

(Even the geojson spec says so...)

[...]Altitude or elevation MAY be included as an optional third
element.
Implementations SHOULD NOT extend positions beyond three elements
because the semantics of extra elements are unspecified and
ambiguous. Historically, some implementations have used a fourth
element to carry a linear referencing measure (sometimes denoted as
"M") or a numerical timestamp, but in most situations a parser will
not be able to properly interpret these values. The interpretation
and meaning of additional elements is beyond the scope of this
specification, and additional elements MAY be ignored by parsers.

Though maybe it's no more complicated once you've solved the general case. 🤷

Is this something you'd be interested in investing more time in prototyping? I'd be interested in merging it if you can show performance improvements for the 3d case provided you maintain:

  1. 2d doesn't slow down
  2. 2d doesn't become less ergonomic to use
  3. code complexity / LOC doesn't suffer too much

Another thing that occurs to me, if you know you'll always have exactly 3 dimensions, you can squeak out a little more performance by having something like PositionStrictly3D<[f64; 3]> rather than TinyVec, and similarly PositionStrictly2D<[f64; 2]>.

I'm not sure if that'd be useful for you, but I mention it in case there's some generic Position foundation that could be used across these 4 cases:

  1. usually 2 but can be more or less (TinyVec<[f64; 2]>)
  2. usually 3 but can be more or less (TinyVec<[f64; 3]>)
  3. always exactly 2 ([f64; 2])
  4. always exactly 3 ([f64; 3])

I'm not requesting you to solve all these problems, but requesting that you keep it in mind if you continue working on this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I spend a bit time to work on this. As written before we can keep most of the existing API by just making all existing "constructor" like functions like FromStr::from_str" non-generic. The exception there are generic functions like serde_json::from_str` that required having explicit types there beforehand already. I've done that and that allowed removing all the changes to docs/test/benches. Hopefully that demonstrates that existing code won't need to change much. I did add a few methods for the multidimensional case where required, but that is more the exception. I didn't do that for all methods that might be problematic as there are often other ways to do this with the existing API.
Overall that works reasonable well as you can optionally specify the storage size and rely on the defaults otherwise. This should help with maintainability and also doesn't regress to much in terms of usability.

I did not spend any time on restricting the implementation to certain sizes, although it shouldn't be too hard to introduce more bounds to make sure that INLINE_SIZE is always either 1, 2, or 3. In my mind that makes the code more complex than not restricting it so I skipped that for now.

Another thing that occurs to me, if you know you'll always have exactly 3 dimensions, you can squeak out a little more performance by having something like PositionStrictly3D<[f64; 3]> rather than TinyVec, and similarly PositionStrictly2D<[f64; 2]>.

I did implement that by introducing a second generic argument to Position (and everything that stores a Position) using a similar strategy as for INLINE_SIZE. This allows using a different "storage" like an array. For now that trait is implemented for arrays and the TinyVec type of arbitrary sizes. Restricting the size there would also restrict the dimension thing as you asked above. I turned that for now into a sealed construct so others cannot add their own storage types yet.

Benchmark results:

     Running benches/parse.rs (target/release/deps/parse-5addd866800aae7c)
parse (countries.geojson)
                        time:   [788.74 µs 800.07 µs 814.71 µs]
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

parse 2d array (countries.geojson)
                        time:   [726.44 µs 734.97 µs 744.77 µs]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

FeatureReader::features (countries.geojson)
                        time:   [3.8133 ms 3.8243 ms 3.8372 ms]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

FeatureReader::deserialize (countries.geojson)
                        time:   [5.0909 ms 5.1284 ms 5.1749 ms]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

parse 3d data as 2d (3d_testdata.geojson)
                        time:   [6.6530 ms 6.7120 ms 6.7789 ms]
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

parse 3d data as 3d (3d_testdata.geojson)
                        time:   [4.9506 ms 4.9800 ms 5.0127 ms]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

parse 3d data as 3d array (3d_testdata.geojson)
                        time:   [4.3896 ms 4.4204 ms 4.4560 ms]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

FeatureReader::deserialize to geo-types (countries.geojson)
                        time:   [5.0958 ms 5.1131 ms 5.1342 ms]
Found 22 outliers among 100 measurements (22.00%)
  11 (11.00%) low mild
  9 (9.00%) high mild
  2 (2.00%) high severe

parse (geometry_collection.geojson)
                        time:   [6.2600 µs 6.2909 µs 6.3496 µs]
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

     Running benches/serialize.rs (target/release/deps/serialize-4f596d903fa830d7)
serialize geojson::FeatureCollection struct (countries.geojson)
                        time:   [324.92 µs 325.49 µs 326.27 µs]
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

Benchmarking serialize geojson::FeatureCollection struct 2d array (countries.geojson): Collecting 100 samples in estimated 6.4249 s (20k iteratserialize geojson::FeatureCollection struct 2d array (countries.geojson)
                        time:   [316.31 µs 317.30 µs 318.85 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Benchmarking serialize geojson::FeatureCollection struct 2d (3d_testdata.geojson): Collecting 100 samples in estimated 5.1660 s (2300 iterationserialize geojson::FeatureCollection struct 2d (3d_testdata.geojson)
                        time:   [2.2816 ms 2.2984 ms 2.3181 ms]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

Benchmarking serialize geojson::FeatureCollection struct 3d (3d_testdata.geojson): Collecting 100 samples in estimated 5.1273 s (2300 iterationserialize geojson::FeatureCollection struct 3d (3d_testdata.geojson)
                        time:   [2.2015 ms 2.2060 ms 2.2110 ms]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Benchmarking serialize geojson::FeatureCollection struct 3d array storage (3d_testdata.geojson): Collecting 100 samples in estimated 5.0085 s (serialize geojson::FeatureCollection struct 3d array storage (3d_testdata.geojson)
                        time:   [2.1737 ms 2.1824 ms 2.1933 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

serialize custom struct (countries.geojson)
                        time:   [887.88 µs 890.05 µs 892.80 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking serialize custom struct to geo-types (countries.geojson): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.0s, enable flat sampling, or reduce sample count to 70.
serialize custom struct to geo-types (countries.geojson)
                        time:   [982.05 µs 984.07 µs 986.76 µs]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

     Running benches/to_geo_types.rs (target/release/deps/to_geo_types-05bfa47287ef0770)
Convert to geo-types    time:   [33.678 µs 33.926 µs 34.226 µs]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

I did not had the time yet to run an comparison with main. I would expect no changes there as it's literally the same code for the 2d use-case.

@michaelkirk
Copy link
Copy Markdown
Member

Can you also include a 3-d benchmark+data in the PR - one for serialize and deserialize?

The input size should be small enough to reasonably include in the repository. I would be surprised if you needed to process files as large as 200MB to see performance differences from this change.

@weiznich weiznich force-pushed the configurable_inline_size branch 2 times, most recently from ff3ff00 to 6a6f93e Compare April 20, 2026 14:42
@michaelkirk
Copy link
Copy Markdown
Member

michaelkirk commented Apr 20, 2026

This is coming along nicely and those performance wins seem about right.

Since you're now adding a second generic parameter for PositionBuffer, the first one seems redundant - rather than
<const INLINE_SIZE: usize = 2, PB: PositionBuffer = TinyVec<[f64; INLINE_SIZE]> how about just: <PB: PositionBuffer = TinyVec<[f64; 2]>

This would also prevent any nonsense like: Feature<2, [f64; 3]>.

I've applied this change here: https://github.com/georust/geojson/tree/mkirk/configurable_inline_size

What do you think? Feel free to cherry-pick it into your PR

Comment thread src/position.rs
out[counter] = next;
counter += 1;
}
Ok(out)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're missing a check here — if counter < out.len() at this point, we're returning an array with garbage data (initialized to 0).

weiznich and others added 4 commits April 29, 2026 11:22
This commit introduces a const generic argument basically everywhere
which allows the user to configure the size of the inline storage used
in `geojson::Position`. This enables users that deal with non-2d data to
avoid a lot of small heap allocations in this place, which impacts
performance in a non-optimal way.

I did some benchmarks to verify that this actually improves performance:

```
parse 3d data as 2d (horizons.geojson)
                        time:   [431.93 ms 433.06 ms 434.35 ms]
parse 3d data as 3d (horizons.geojson)
                        time:   [346.94 ms 348.61 ms 350.34 ms]

serialize geojson::FeatureCollection struct 2d (horizons.geojson)
                        time:   [219.19 ms 220.34 ms 221.56 ms]
serialize geojson::FeatureCollection struct 3d (horizons.geojson)
                        time:   [208.05 ms 209.02 ms 210.14 ms]

```

The corresponding geojson file can be downloaded here:
https://ogc-api.hlnug.giga-infosystems.com/collections/2/items?t=geojson
(it's 206mb and contains only 3d points). The benchmarks mirror the
existing benchmarks with similar name, but with this dataset instead.
The `2d` variant uses `2` as `INLINE_SIZE` size, while the `3d` variant
uses `3` as `INLINE_SIZE` variant.

Now I'm aware that geojson reached 1.0 a few weeks ago, so that such API
changes are hard to implement. Therefore I'm opening this as draft and
as a request for discussion.

I see the following solutions to not break the API in most of the cases:

* Adding the const generic parameter mostly affects the construction
  side. We could enforce that all existing construction/parsing
functions just return a type with `INLINE_SIZE = 2` and add 
* Issue a 2.0 release
* Declare that this is purely a change in type inference, which is technically allowed (See https://github.com/rust-lang/rfcs/blob/master/text/1105-api-evolution.md#principles-of-the-policy, the first point in the more detail list)
@weiznich weiznich force-pushed the configurable_inline_size branch from 6a6f93e to f08afa0 Compare April 29, 2026 09:24
@weiznich
Copy link
Copy Markdown
Author

Sorry for taking that long to response.

Yes it's reasonable to remove the size generic in favor of just allowing different storage types there. I just went with both as it would have been easier to go back to only the size. For now I pulled in your change, rebased the changes and also fixed the remark about the array deserialization from above.

If you like I also can squash all the commits and add a changelog entry some documentation. I just would like to postpone that until you/the other maintainers are happy with the general design.

@michaelkirk
Copy link
Copy Markdown
Member

michaelkirk commented Apr 29, 2026

Sorry for taking that long to response.

If you like I also can squash all the commits and add a changelog entry some documentation. I just would like to postpone that until you/the other maintainers are happy with the general design.

No worries about the slow turn around from my perspective. This is a big change, and so I'd like to be confident in it.

I can appreciate your apprehension about doing a bunch of work if the feature isn't going to get merged.

I'm still not 100% sure we should do it, but I think it's basically the right approach (like 80% sure). My concern is that we haven't yet covered all the functionality. For example, here's a method still hardcoded to deal with Feature rather than Feature<PB>.

/// Deserialize a GeoJSON FeatureCollection into [`Feature`] structs.                                                                                                                                                              
///                                                                                                                                                                                                                                
/// If instead you'd like to deserialize your own structs from GeoJSON, see [`deserialize_feature_collection`].                                                                                                                    
pub fn deserialize_features_from_feature_collection(                                                                                                                                                                               
    feature_collection_reader: impl Read,                                                                                                                                                                                          
) -> impl Iterator<Item = Result<Feature>> {                                                                                                                                                                                       
    FeatureReader::from_reader(feature_collection_reader).features()                                                                                                                                                               
}    

Are there others? I'm not sure. If we only fix them slowly we'll have to make multiple breaking releases, which is not desirable.

I'd be more comfortable moving forward if we can find a way to iterate on what we have in the real world, while minimizing the impact on people not interested in these features.

We've already made some progress on that in this PR (thanks!).

I made some further changes in my branch to avoid breaking semver: https://github.com/georust/geojson/tree/mkirk/configurable_inline_size

The above branch maintains semver (mostly anyway? [^1]) but makes it slightly more annoying for people who want to leverage the new generic position functionality, by putting breaking things behind a namespace (multdim:: but name is negotiable).

My thinking was that the namespaced functionality would be temporary, and once we're confident that we have all the pieces in place, we can deprecate that namespace and make one breaking release that switches everything over. But hopefully just once, rather than several breaking releases. In the meanwhile, we can continue to make minor non-breaking releases.

What do you think of that approach? Do you think the namespacing is too arduous? Do you have an application(s) that can apply and test these new APIs in the real world?


Footnote 1: With those changes, cargo semver-checks is still failing for the type alias, but I'm not sure why it's considered a breaking change:

#[deprecated(note = "Renamed to GeometryValue")]                                                                                                                                                                                   
pub type Value = GeometryValue;   

Note, due to the default generics, that's equivalent to

#[deprecated(note = "Renamed to GeometryValue")]                                                                                                                                                                                   
pub type Value = GeometryValue<TinyVec<[f64; 2]>;   

It seems like anyone using Value in their legacy code could continue to use it. Is this a false positive from cargo semver-checks or am I missing something?

@michaelkirk
Copy link
Copy Markdown
Member

As for other maintainers, maybe @urschrei or @frewsxcv want to weigh in?

@michaelkirk
Copy link
Copy Markdown
Member

michaelkirk commented Apr 30, 2026

I also think the generics are a bit gnarly. Feature<TinyVec<[f64; 3]>> is quite a mouthful. This was admittedly better with the two-level generics you had before:

Feature<3> | Feature<3, TinyVec<[f64; 3]>>

But recall that let you do nonsense things like: Feature<666, TinyVec<[f64; 3]>>

Maybe there's a way we can keep the current sound generics and make some custom types or type aliases to make it read nicer | more intuitive:

mod dims {
  /// Spills over to heap after N dimensions
  type Flex<N> = TinyVec<[f64; N]>;
  /// Position must be exactly N dimensions
  type Fixed<N> = [f64; N];
}

# Like Feature<TinyVec<[f64; 3]>>
let feature: Feature<dims::Flex<3>> = todo!();

# Like Feature<[f64; 4]>
let feature: Feature<dims::Fixed<4>> = todo!();

Or go real wild like:

mod dims {
  /// Spills over to heap after N dimensions
  type Flex<N> = TinyVec<[f64; N]>;
  /// Position must be exactly N dimensions
  type Fixed<N> = [f64; N]
  type Flex2 = Flex<2>;
  type Fixed2 = Fixed<2>;
  type Flex3 = Flex<3>;
  type Fixed3 = Fixed<3>;
  type Flex4 = Flex<4>;
  type Fixed4 = Fixed<4>;
}

# Like Feature<TinyVec<[f64; 3]>>
let feature: Feature<dims::Flex3> = todo!();

# Like Feature<[f64; 4]>
let feature: Feature<dims::Fixed4> = todo!();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants