feat(planmemory): first-fit-decreasing buffer ordering (opt-in) by tonibohnlein · Pull Request #1 · tonibohnlein/PTOAS

tonibohnlein · 2026-06-30T09:55:09Z

Summary

Adds an opt-in order-by-size option to PlanMemory's local allocator that
processes buffers largest-first (first-fit-decreasing) instead of the current
DMA-first / generation order. Default behavior is unchanged.

CLI: --plan-memory-order-by-size
Pass option: order-by-size (default false)

Why — theoretical argument

On-chip buffer allocation is Dynamic Storage Allocation (DSA): pack buffers,
each with a fixed live interval and a size, into a fixed-capacity strip while
minimizing the peak (lower bound = LOAD, the max simultaneously-live size).
With uniform sizes this is interval-graph colouring (greedy is optimal); with
the heterogeneous sizes real kernels produce (mixed dtypes, reductions,
asymmetric tiles) DSA is NP-hard, and the quality of a first-fit allocator
depends strongly on the order buffers are placed:

Arbitrary / generation order has no constant-factor guarantee.
Decreasing-size order is the basis of the classic first-fit-decreasing
bound (bin packing, asymptotic ratios: FFD ≤ 11/9·OPT vs first-fit's 17/10·OPT)
and is the ordering used by XLA (best-fit-decreasing heap simulation), TVM USMP
(greedy-by-size), and MindSpore SOMAS. PlanMemory was the outlier: it
ordered by DMA-touch (VEC only) and otherwise by generation order.
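
To make the order dependence concrete, here is a minimal sketch of first-fit placement on a toy instance. It is not the PlanMemory implementation: the `Buf` type, the closed-interval lifetimes, and the three buffers are assumptions chosen for illustration. It computes LOAD and then the peak reached by first-fit in generation order and in largest-first order.

```python
from typing import NamedTuple

class Buf(NamedTuple):
    name: str
    start: int  # first step at which the buffer is live (inclusive, assumed)
    end: int    # last step at which the buffer is live (inclusive, assumed)
    size: int

def overlaps(a: Buf, b: Buf) -> bool:
    return a.start <= b.end and b.start <= a.end

def load(bufs):
    """LOAD: the largest total size live at any single step."""
    steps = range(min(b.start for b in bufs), max(b.end for b in bufs) + 1)
    return max(sum(b.size for b in bufs if b.start <= t <= b.end) for t in steps)

def first_fit_peak(bufs):
    """Place buffers in the given order, each at the lowest free offset."""
    placed = []  # (buffer, offset) pairs
    for b in bufs:
        # Address ranges held by already-placed buffers that are live with b.
        busy = sorted((off, off + p.size) for p, off in placed if overlaps(p, b))
        offset = 0
        for lo, hi in busy:
            if offset + b.size <= lo:
                break  # the gap before this range is big enough
            offset = max(offset, hi)
        placed.append((b, offset))
    return max(off + p.size for p, off in placed)

generation_order = [Buf("a", 0, 0, 1), Buf("b", 0, 1, 1), Buf("c", 1, 2, 2)]
largest_first = sorted(generation_order, key=lambda b: -b.size)  # stable sort

print(load(generation_order))            # 3
print(first_fit_peak(generation_order))  # 4
print(first_fit_peak(largest_first))     # 3
```

In generation order the two small buffers occupy offsets 0 and 1, so the size-2 buffer cannot use the hole left at offset 0 and lands at offset 2. Placing it first lets the small buffers fill in around it, and the peak matches LOAD on this instance.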

This change brings PlanMemory's ordering in line with the established baselines,
at zero risk to default behaviour.

What changed

GetSizeOrderedRootStorageEntry: when enabled, stable-sort buffers by
decreasing size across all memory spaces (the existing DMA-first reorder is
VEC-only), keeping ping-pong (double-buffer) pairs contiguous.
Wired as a default-off pass option (Passes.td) + a ptoas CLI flag.
Ordering only affects the reuse path (entered when Σ buffer sizes ≥ capacity); a clique that fits still takes the sequential fast path, where
there is no peak to save — so the option is a no-op exactly where it cannot
help.
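
The ordering and the reuse-path gate described above can be sketched as follows. This is an illustration under stated assumptions, not the code of `GetSizeOrderedRootStorageEntry`: the `Entry` type, the `pair_id` tag, and the choice to rank a ping-pong pair by its larger member are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    name: str
    size: int
    pair_id: Optional[int] = None  # hypothetical tag shared by both halves of a ping-pong pair

def order_by_size(entries):
    """Stable largest-first ordering that moves each ping-pong pair as one unit."""
    units = []
    for e in entries:
        # Adjacent entries with the same pair_id are grouped into one unit.
        if e.pair_id is not None and units and units[-1][0].pair_id == e.pair_id:
            units[-1].append(e)
        else:
            units.append([e])
    # list.sort is stable: units of equal size keep their generation order.
    units.sort(key=lambda u: -max(e.size for e in u))
    return [e for u in units for e in u]

def needs_reuse(entries, capacity):
    """Gate from the description: reuse path only when the sizes sum to at least capacity."""
    return sum(e.size for e in entries) >= capacity

entries = [
    Entry("a", 64),
    Entry("ping", 128, pair_id=0),
    Entry("pong", 128, pair_id=0),
    Entry("b", 64),
    Entry("c", 256),
]
print([e.name for e in order_by_size(entries)])  # ['c', 'ping', 'pong', 'a', 'b']
print(needs_reuse(entries, capacity=1024))       # False: everything fits, ordering is irrelevant
print(needs_reuse(entries, capacity=512))        # True: reuse path, ordering matters
```

Because the sort is stable, an input whose units all have the same size comes back in its original order, which is the property the uniform-size tests rely on.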

Results

Measured over the TileLang ST suite + JIT kernels (213 files):

| metric | default | order-by-size |
|---|---|---|
| space-peak regressions | — | 0 |
| space-peak improvements | — | 1 (−32 KB, −16.7% on a heterogeneous `tsort` kernel) |
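
As a reading aid, the two figures reported for the `tsort` kernel imply its approximate peak before and after. The values below are inferred from the rounded numbers in the table and are not reported in the PR.

```python
saving_kb = 32            # reported absolute reduction
reported_fraction = 0.167  # reported relative reduction (16.7%)

# Inferred, not measured: baseline peak and peak with order-by-size.
implied_baseline_kb = saving_kb / reported_fraction
implied_peak_kb = implied_baseline_kb - saving_kb

print(round(implied_baseline_kb), round(implied_peak_kb))  # 192 160
```

A baseline of 192 KB is consistent with the table, since 32 / 192 rounds to 16.7%.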

No degradation. Uniform-size instances are byte-identical (stable sort),
so all 16 existing plan_memory_* lit tests pass unchanged.
New lit test plan_memory_order_by_size_reuse.pto exercises the reuse path:
the largest tile is placed at offset 0 only with the flag.

The on-corpus win is small because the available test corpus is dominated by
fitting cliques and tiny per-op kernels; the benefit appears only under forced
reuse + heterogeneity, which is where larger real kernels live.

Scope / follow-ups

Default stays off. Before flipping it, measure the downstream sync
impact: tighter packing increases buffer reuse, which can add synchronization
(the memory↔sync coupling).
Complementary next steps from the same baselines: best-fit placement, in-place
aliasing/donation, and InEx (half-open) lifetime semantics.
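
The half-open lifetime follow-up can be illustrated with a small sketch. How PlanMemory represents lifetimes is not shown in this PR, so the tuple encoding and the function names here are assumptions. The point is only that under half-open semantics a buffer whose lifetime ends at step t no longer conflicts with one that starts at t.

```python
def overlaps_closed(a, b):
    """Lifetimes as closed intervals [start, end]."""
    return a[0] <= b[1] and b[0] <= a[1]

def overlaps_half_open(a, b):
    """Lifetimes as half-open intervals [start, end)."""
    return a[0] < b[1] and b[0] < a[1]

first = (0, 4)   # hypothetical buffer last used at step 4
second = (4, 8)  # hypothetical buffer first defined at step 4

print(overlaps_closed(first, second))     # True: the two cannot share addresses
print(overlaps_half_open(first, second))  # False: the two may share addresses
```

Fewer conflicts give the allocator more reuse opportunities, which is also why this interacts with the sync question raised above.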

The local-memory allocator processes buffers in a DMA-first order (VEC only) and otherwise in generation order. For the heterogeneous buffer sizes real kernels produce, first-fit-decreasing (largest-first) order packs tighter; it is the ordering XLA, TVM and SOMAS all use.

Add a default-off `order-by-size` pass option (CLI flag `--plan-memory-order-by-size`). When enabled, the reuse path sorts buffers largest-first across every memory space, keeping ping-pong pairs contiguous. Default behavior is unchanged and uniform-size instances are untouched (stable sort), so existing tests are unaffected.

Measured over the TileLang ST suite + JIT kernels (213 files): one space-peak reduction (-32 KB), zero regressions. New lit test exercises the reuse path: the largest tile is placed at offset 0 only with the flag.

tonibohnlein · 2026-06-30T10:10:04Z

Superseded by the upstream PR hw-native-sys#885 (same branch, rebased onto latest main).

tonibohnlein force-pushed the planmem-order-by-size branch from 0ec4b6a to f8d878c · June 30, 2026 10:09

tonibohnlein closed this Jun 30, 2026

tonibohnlein wants to merge 1 commit into main from planmem-order-by-size
