Use loop form in orderedMin4 to enable inlining #6

Merged
aalpar merged 1 commit into master from optimize-hot-path
Mar 8, 2026

@aalpar commented Mar 8, 2026

Summary

  • Rewrites orderedMin4 from unrolled-with-early-exits to a loop, reducing the Go compiler's inlining cost from 130 to 63 (budget: 80)
  • This allows orderedMin4 to be inlined into orderedBubbledown's hot loop, eliminating function call overhead on every iteration
  • No algorithm or semantic changes — same min-max heap behavior
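
To make the rewrite concrete, here is a minimal sketch of the two shapes. It is not the repository's code: `min4Unrolled` and `min4Loop` are hypothetical stand-ins, and the simplified signature (a slice plus a start index, smallest element only) is an assumption based on the description above. Both functions return the index of the smallest of up to four consecutive elements and keep the earliest index on ties.

```go
package main

import (
	"cmp"
	"fmt"
)

// min4Unrolled is a hypothetical stand-in for the unrolled form with
// early exits. It returns the index of the smallest of up to four
// consecutive elements starting at i, stopping at the end of the slice.
func min4Unrolled[T cmp.Ordered](h []T, i int) int {
	q := i
	i++
	if i >= len(h) {
		return q
	}
	if h[i] < h[q] {
		q = i
	}
	i++
	if i >= len(h) {
		return q
	}
	if h[i] < h[q] {
		q = i
	}
	i++
	if i >= len(h) {
		return q
	}
	if h[i] < h[q] {
		q = i
	}
	return q
}

// min4Loop is a hypothetical stand-in for the loop form. It computes
// the same index with a single bounded loop.
func min4Loop[T cmp.Ordered](h []T, i int) int {
	q := i
	end := min(i+4, len(h)) // never read past the slice
	for j := i + 1; j < end; j++ {
		if h[j] < h[q] {
			q = j
		}
	}
	return q
}

func main() {
	h := []int{9, 7, 3, 8, 1, 5}
	fmt.Println(min4Unrolled(h, 0), min4Loop(h, 0)) // 2 2
	fmt.Println(min4Unrolled(h, 3), min4Loop(h, 3)) // 4 4
}
```

The inlining cost the compiler assigns to a function can be inspected with `go build -gcflags=-m=2`, which reports either that a function can be inlined along with its cost, or that the cost exceeds the budget of 80. The costs for this sketch may differ from the 130 and 63 measured for the real function.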

Benchmarks

Isolated change, Apple M4 Max, go test -bench BenchmarkOrdered -benchmem -count=5 -benchtime=2s:

                  │   master     │     orderedMin4 loop                │
                  │    sec/op    │    sec/op     vs base               │
OrderedPop-16       286.0n         236.5n       -17.31% (p=0.008)
OrderedPopMax-16    283.9n         234.9n       -17.26% (p=0.008)
OrderedPushPop-16   290.4n         246.5n       -15.12% (p=0.008)
OrderedPush-16      11.83n         11.76n            ~ (p=0.381)

Two other candidate optimizations (isMinHeap simplification, bounds-check-elimination hint in orderedBubbledown) were benchmarked in isolation and showed slight regressions, so they were dropped.

Test plan

  • go test -race -count=1 ./... passes
  • Existing unit, randomized (1000 iterations), and fuzz tests all exercise orderedMin4 through Pop/PopMax/Remove/Fix paths
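
As an illustration of the randomized style of check, the sketch below tests a loop-form helper directly against its defining property over 1000 random inputs. This is an assumption-laden example, not the repository's test suite: the PR's tests reach orderedMin4 indirectly through Pop/PopMax/Remove/Fix, while `min4Loop`, `checkMin4`, and `randomizedCheck` here are hypothetical names operating on plain `[]int`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// min4Loop is a hypothetical loop-form helper: index of the smallest of
// up to four consecutive elements starting at i.
func min4Loop(h []int, i int) int {
	q := i
	end := min(i+4, len(h))
	for j := i + 1; j < end; j++ {
		if h[j] < h[q] {
			q = j
		}
	}
	return q
}

// checkMin4 verifies that the returned index lies inside the window and
// that no element in the window is smaller than the chosen one.
func checkMin4(h []int, i int) bool {
	q := min4Loop(h, i)
	end := min(i+4, len(h))
	if q < i || q >= end {
		return false
	}
	for j := i; j < end; j++ {
		if h[j] < h[q] {
			return false
		}
	}
	return true
}

// randomizedCheck runs checkMin4 on random slices and start indices.
func randomizedCheck(iterations int, seed int64) bool {
	r := rand.New(rand.NewSource(seed))
	for n := 0; n < iterations; n++ {
		h := make([]int, 1+r.Intn(16))
		for k := range h {
			h[k] = r.Intn(100)
		}
		if !checkMin4(h, r.Intn(len(h))) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(randomizedCheck(1000, 1)) // true
}
```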

🤖 Generated with Claude Code

The unrolled-with-early-exits pattern costs 130 in the compiler's
inlining model (budget: 80), preventing orderedMin4 from being
inlined into orderedBubbledown's hot loop. A loop form costs 63,
bringing it under budget.

Benchmarked in isolation on Apple M4 Max (n≈1000, 5 runs):

  OrderedPop    286.0ns → 236.5ns  -17.31% (p=0.008)
  OrderedPopMax 283.9ns → 234.9ns  -17.26% (p=0.008)
  OrderedPushPop 290.4ns → 246.5ns -15.12% (p=0.008)
  OrderedPush   11.83ns → 11.76ns       ~ (p=0.381)
@aalpar requested a review from Copilot March 8, 2026 21:53
@aalpar added the performance label (Changes to improve performance: memory or CPU.) Mar 8, 2026
Copilot AI left a comment

Pull request overview

This PR optimizes the generic min/max heap’s hot path by rewriting orderedMin4 into a loop form that stays within the Go compiler’s inlining budget, enabling inlining into orderedBubbledown and reducing per-iteration overhead.

Changes:

  • Replaces the unrolled/early-exit implementation of orderedMin4 with a bounded loop over up to 4 consecutive elements.
  • Adds documentation explaining the inlining-cost motivation for the loop form.

@aalpar merged commit 433f4ac into master Mar 8, 2026
8 checks passed
@aalpar deleted the optimize-hot-path branch March 8, 2026 22:57