
Performance and memory issues with "Precise Forestry" preset (large sliding window) #339

@bloom256

Description


While investigating a performance regression in the "Precise Forestry" preset, I identified two related issues that appear when using an extremely large sliding window (10 km).


1) Performance regression due to bucket linking

A recent optimization introduced bucket linking: indoor buckets store a direct pointer to the corresponding outdoor bucket, avoiding repeated outdoor hash lookups.
This requires a linking stage that iterates linearly over the entire indoor bucket map.
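To make the cost structure concrete, here is a minimal Python sketch of what a linking pass like this looks like. It is not the project's code: the `IndoorBucket`/`OutdoorBucket` types, the `outdoor_key_of` helper and the coarse-grid mapping between indoor and outdoor keys are hypothetical. What it illustrates is that linking touches every indoor bucket once, regardless of how many of them are queried afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, int]


@dataclass
class OutdoorBucket:
    points: list = field(default_factory=list)


@dataclass
class IndoorBucket:
    points: list = field(default_factory=list)
    # Cached reference to the enclosing outdoor bucket, set by the linking pass.
    outdoor: Optional[OutdoorBucket] = None


def outdoor_key_of(indoor_key: Key, ratio: int = 4) -> Key:
    # Hypothetical layout: each outdoor cell covers ratio x ratio x ratio indoor cells.
    x, y, z = indoor_key
    return (x // ratio, y // ratio, z // ratio)


def link_buckets(indoor: Dict[Key, IndoorBucket],
                 outdoor: Dict[Key, OutdoorBucket]) -> int:
    """Linking stage: one linear pass over the whole indoor map.

    Returns the number of indoor buckets visited, which is always len(indoor).
    """
    visited = 0
    for key, bucket in indoor.items():
        # One outdoor hash lookup per indoor bucket, done up front so that
        # later per-point accesses can follow bucket.outdoor directly.
        bucket.outdoor = outdoor.get(outdoor_key_of(key))
        visited += 1
    return visited
```

After `link_buckets(indoor, outdoor)`, per-point code reads `indoor[key].outdoor` instead of hashing into `outdoor` again; indoor buckets without a matching outdoor bucket keep `None`.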

This works well for most presets because:

  • the sliding window is small,
  • bucket maps stay relatively small,
  • the linear linking cost is negligible compared to the saved lookup cost.

In "Precise Forestry":

  • the sliding window is large (10 km),
  • bucket maps grow very large,
  • the linear linking pass becomes too expensive.

In this case, the linking cost grows linearly with the bucket map size, while the benefit of the saved lookups is roughly constant per processed point and does not grow with the map. Empirically, once the bucket map grows beyond ~0.5M elements, the linking cost outweighs the benefit, and performance degrades further as the map keeps growing.
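The break-even argument can be written as a small cost model. This is a sketch under stated assumptions: `link_cost_per_bucket` and `saving_per_lookup` are hypothetical per-operation costs, not measured values, and the ~0.5M threshold above is an empirical observation that this model does not reproduce on its own.

```python
def linking_pays_off(num_indoor_buckets: int, lookups_saved: int,
                     link_cost_per_bucket: float,
                     saving_per_lookup: float) -> bool:
    """True if one linking pass is cheaper than the lookups it saves.

    The linking cost scales with the size of the indoor map, while the saving
    scales with the number of outdoor lookups avoided (roughly proportional to
    the number of points processed) and does not depend on map size.
    """
    link_cost = num_indoor_buckets * link_cost_per_bucket
    saving = lookups_saved * saving_per_lookup
    return link_cost < saving


def break_even_buckets(lookups_saved: int, link_cost_per_bucket: float,
                       saving_per_lookup: float) -> float:
    """Indoor map size above which linking costs more than it saves.

    Assumes link_cost_per_bucket > 0.
    """
    return lookups_saved * saving_per_lookup / link_cost_per_bucket
```

With a fixed number of points per pass, the break-even size is fixed as well, so a preset whose window keeps the map far above it (as with a 10 km window) pays the linking cost without recovering it.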

Quick fix:

Remove the bucket linking entirely, accepting a ~5% slowdown for other presets with small sliding windows.

This eliminates the regression in "Precise Forestry" and removes a complex and fragile optimization path.
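For comparison, a minimal sketch of the access path once linking is removed: the outdoor bucket is resolved with a hash lookup at the point of use. The key layout and the `outdoor_key_of`/`find_outdoor` helpers are hypothetical. Each access costs one expected-O(1) lookup and there is no pass whose cost grows with the map, which is why this holds up for large windows at the price of the ~5% slowdown on small ones.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, int]


def outdoor_key_of(indoor_key: Key, ratio: int = 4) -> Key:
    # Hypothetical layout: each outdoor cell covers ratio x ratio x ratio indoor cells.
    x, y, z = indoor_key
    return (x // ratio, y // ratio, z // ratio)


def find_outdoor(outdoor: Dict[Key, list], indoor_key: Key) -> Optional[list]:
    # No linking stage and no cached reference: each access pays one hash
    # lookup, and nothing has to be refreshed when either map changes.
    return outdoor.get(outdoor_key_of(indoor_key))
```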


2) Excessive memory usage

When running the "Precise Forestry" preset on a ~6.5 km dataset using a machine with 32 GB of RAM:

  • all RAM was consumed,
  • swap usage grew to ~80 GB,
  • the system became unresponsive and required a hard reboot.

This suggests that the preset allows effectively unbounded in-memory growth of bucket data, which is unsafe for typical developer machines.

Proposal:

The "Precise Forestry" preset likely needs a more fundamental redesign to avoid unbounded memory growth. Possible directions:

  • Use persistent or key-value storage (e.g. RocksDB or similar) for bucket data, so that not all buckets need to be kept in RAM at once.
  • Any other approach that reliably eliminates excessive swap usage and prevents the OS from becoming unresponsive on large datasets.
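The first direction can be sketched as a bounded in-memory cache in front of a larger store. This is a minimal sketch, not a design: `SpillingBucketStore` and `max_in_memory` are hypothetical names, and `backing` is any `MutableMapping` standing in for RocksDB or a similar on-disk key-value store. The default plain dict only demonstrates the eviction logic and does not free any memory by itself.

```python
from collections import OrderedDict
from collections.abc import MutableMapping
from typing import Any, Hashable, Optional


class SpillingBucketStore:
    """Keeps at most `max_in_memory` buckets in RAM.

    Least recently used buckets are moved to `backing`, a stand-in for a
    persistent key-value store such as RocksDB. Any MutableMapping works
    here; a real on-disk backend would also need to serialize buckets.
    """

    def __init__(self, max_in_memory: int,
                 backing: Optional[MutableMapping] = None) -> None:
        if max_in_memory < 1:
            raise ValueError("max_in_memory must be >= 1")
        self.max_in_memory = max_in_memory
        self.hot: OrderedDict = OrderedDict()  # in-RAM buckets, oldest first
        self.backing = backing if backing is not None else {}

    def put(self, key: Hashable, bucket: Any) -> None:
        self.hot[key] = bucket
        self.hot.move_to_end(key)    # mark as most recently used
        self.backing.pop(key, None)  # the RAM copy is now the only copy
        self._evict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.backing:
            bucket = self.backing.pop(key)  # promote back into RAM
            self.hot[key] = bucket
            self._evict()
            return bucket
        return None

    def _evict(self) -> None:
        while len(self.hot) > self.max_in_memory:
            key, bucket = self.hot.popitem(last=False)  # least recently used
            self.backing[key] = bucket

    def __len__(self) -> int:
        return len(self.hot) + len(self.backing)
```

The property that matters is that the number of resident buckets is capped by `max_in_memory` no matter how large the window is; the cost moves to disk I/O when evicted buckets are touched again.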

The goal is to ensure that running this preset cannot exhaust system memory or freeze the machine, even on large datasets.
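Independently of a storage redesign, a hard per-process memory cap should turn a runaway run into an allocation failure rather than letting it push the machine into swap. A minimal Linux-only sketch using Python's standard `resource` module; the 0.75 fraction and the helper names are hypothetical choices. Note that `RLIMIT_AS` limits virtual address space, not resident memory, so code that reserves large virtual ranges may hit it earlier than expected.

```python
import os
import resource


def physical_ram_bytes() -> int:
    # Total physical memory on Linux: page size times number of physical pages.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def address_space_limit(fraction: float, total_bytes: int) -> int:
    """Byte limit corresponding to `fraction` of `total_bytes`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


def cap_address_space(fraction: float = 0.75) -> int:
    """Lower this process's soft RLIMIT_AS to a fraction of physical RAM.

    Allocations beyond the limit then fail (MemoryError in Python,
    std::bad_alloc from operator new in C++) instead of growing into swap.
    Returns the soft limit that was applied.
    """
    limit = address_space_limit(fraction, physical_ram_bytes())
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft != resource.RLIM_INFINITY:
        limit = min(limit, soft)  # never loosen an existing soft limit
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return limit
```

Calling `cap_address_space()` once at startup (or running the preset under `ulimit -v` in the shell) is a cheap way to try this. It does not reduce memory use; it only bounds the damage.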


Notes

I initially overlooked the "Precise Forestry" preset when designing the bucket-linking optimization.
The issue appears to be specific to very large sliding windows and does not affect other configurations.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
