Use loop form in orderedMin4 to enable inlining by aalpar · Pull Request #6 · aalpar/deheap

aalpar · 2026-03-08T21:52:35Z

Summary

Rewrites orderedMin4 from unrolled-with-early-exits to a loop, reducing the Go compiler's inlining cost from 130 to 63 (budget: 80)
This allows orderedMin4 to be inlined into orderedBubbledown's hot loop, eliminating function call overhead on every iteration
No algorithm or semantic changes — same min-max heap behavior
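To make the two shapes concrete, here is a minimal sketch, not the code from this PR. The names min4Unrolled and min4Loop are hypothetical, the min/max level switch of a real min-max heap is left out, and the caller is assumed to pass i < len(h). The inlining costs of these stand-ins have not been measured and will likely differ from the 130 and 63 quoted above.

```go
package main

import (
	"cmp"
	"fmt"
)

// min4Unrolled returns the index of the smallest of up to four consecutive
// elements starting at h[i]. Hypothetical stand-in for the unrolled shape:
// one bounds test and one comparison per element, with early exits.
func min4Unrolled[E cmp.Ordered](h []E, i int) int {
	n := len(h)
	best := i
	if i+1 >= n {
		return best
	}
	if h[i+1] < h[best] {
		best = i + 1
	}
	if i+2 >= n {
		return best
	}
	if h[i+2] < h[best] {
		best = i + 2
	}
	if i+3 >= n {
		return best
	}
	if h[i+3] < h[best] {
		best = i + 3
	}
	return best
}

// min4Loop computes the same index with a bounded loop over at most
// four elements (requires Go 1.21+ for the built-in min).
func min4Loop[E cmp.Ordered](h []E, i int) int {
	end := min(i+4, len(h))
	best := i
	for j := i + 1; j < end; j++ {
		if h[j] < h[best] {
			best = j
		}
	}
	return best
}

func main() {
	h := []int{5, 3, 9, 1, 7, 2}
	fmt.Println(min4Unrolled(h, 0), min4Loop(h, 0)) // 3 3
	fmt.Println(min4Unrolled(h, 4), min4Loop(h, 4)) // 5 5
}
```

Both functions keep the first index on ties, so they agree on every input that satisfies the precondition.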

Benchmarks

Isolated change, Apple M4 Max, go test -bench BenchmarkOrdered -benchmem -count=5 -benchtime=2s:

                    │   master   │       orderedMin4 loop        │
                    │   sec/op   │   sec/op     vs base          │
OrderedPop-16         286.0n       236.5n       -17.31% (p=0.008)
OrderedPopMax-16      283.9n       234.9n       -17.26% (p=0.008)
OrderedPushPop-16     290.4n       246.5n       -15.12% (p=0.008)
OrderedPush-16        11.83n       11.76n            ~ (p=0.381)

Two other candidate optimizations (isMinHeap simplification, bounds-check-elimination hint in orderedBubbledown) were benchmarked in isolation and showed slight regressions, so they were dropped.

Test plan

go test -race -count=1 ./... passes
Existing unit, randomized (1000 iterations), and fuzz tests all exercise orderedMin4 through Pop/PopMax/Remove/Fix paths
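As an illustration of the randomized style of check mentioned in the test plan, the sketch below exercises a loop-form helper directly rather than through Pop/PopMax/Remove/Fix as the repository's tests do. The helper min4Loop, the function checkRandom, and the seed and slice sizes are all hypothetical choices, not taken from the repository.

```go
package main

import (
	"fmt"
	"math/rand"
	"slices"
)

// min4Loop is a hypothetical loop-form helper: index of the smallest of
// up to four consecutive elements starting at h[i], assuming i < len(h).
func min4Loop(h []int, i int) int {
	end := min(i+4, len(h))
	best := i
	for j := i + 1; j < end; j++ {
		if h[j] < h[best] {
			best = j
		}
	}
	return best
}

// checkRandom compares min4Loop against slices.Min on random inputs and
// returns the first disagreement, if any.
func checkRandom(iterations int, seed int64) error {
	rng := rand.New(rand.NewSource(seed))
	for it := 0; it < iterations; it++ {
		h := make([]int, 1+rng.Intn(16))
		for k := range h {
			h[k] = rng.Intn(100)
		}
		i := rng.Intn(len(h))
		end := min(i+4, len(h))

		got := min4Loop(h, i)
		if got < i || got >= end {
			return fmt.Errorf("index %d outside [%d, %d) for %v", got, i, end, h)
		}
		if want := slices.Min(h[i:end]); h[got] != want {
			return fmt.Errorf("h[%d]=%d, want min %d for %v", got, h[got], want, h)
		}
	}
	return nil
}

func main() {
	if err := checkRandom(1000, 1); err != nil {
		panic(err)
	}
	fmt.Println("ok")
}
```

The check is property-based: it asserts only that the returned index is in range and points at a minimal value, so it does not depend on how ties are broken.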

🤖 Generated with Claude Code

The unrolled-with-early-exits pattern costs 130 in the compiler's inlining model (budget: 80), preventing orderedMin4 from being inlined into orderedBubbledown's hot loop. A loop form costs 63, bringing it under budget.

Benchmarked in isolation on Apple M4 Max (n≈1000, 5 runs):

OrderedPop      286.0ns → 236.5ns  -17.31% (p=0.008)
OrderedPopMax   283.9ns → 234.9ns  -17.26% (p=0.008)
OrderedPushPop  290.4ns → 246.5ns  -15.12% (p=0.008)
OrderedPush     11.83ns → 11.76ns  ~ (p=0.381)
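Cost figures like these can be read from the compiler's own diagnostics. The sketch below is a small stand-alone program, with hypothetical names minIndex4 and descend, that can be built with -gcflags=-m=2 to see the reported cost of a helper and whether its call site is inlined. The exact wording of the diagnostics and the numbers reported may vary between Go versions.

```go
// Save as inline_check.go and build with:
//
//	go build -gcflags=-m=2 inline_check.go
//
// For each function the compiler reports either
// "can inline <func> with cost N as: ..." or
// "cannot inline <func>: function too complex: cost N exceeds budget 80",
// and at inlined call sites "inlining call to <func>".
package main

import "fmt"

// minIndex4 is a hypothetical loop-form helper (not the PR's orderedMin4):
// index of the smallest of up to four consecutive elements from h[i],
// assuming i < len(h).
func minIndex4(h []int, i int) int {
	end := min(i+4, len(h))
	best := i
	for j := i + 1; j < end; j++ {
		if h[j] < h[best] {
			best = j
		}
	}
	return best
}

// descend is a hypothetical hot loop that calls the helper on every
// iteration, standing in for a bubble-down style caller. It returns the
// number of iterations taken.
func descend(h []int) int {
	steps := 0
	for i := 0; i < len(h); steps++ {
		i = minIndex4(h, i) + 1
	}
	return steps
}

func main() {
	fmt.Println(descend([]int{5, 3, 9, 1, 7, 2}))
}
```

If the helper is reported as inlinable and the call inside descend is listed as inlined, the per-iteration call overhead described above is gone for that call site.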

Copilot

Pull request overview

This PR optimizes the generic min/max heap’s hot path by rewriting orderedMin4 into a loop form that stays within the Go compiler’s inlining budget, enabling inlining into orderedBubbledown and reducing per-iteration overhead.

Changes:

Replaces the unrolled/early-exit implementation of orderedMin4 with a bounded loop over up to 4 consecutive elements.
Adds documentation explaining the inlining-cost motivation for the loop form.


aalpar requested a review from Copilot March 8, 2026 21:53

Copilot started reviewing on behalf of aalpar March 8, 2026 21:53

aalpar added the performance label (Changes to improve performance: memory or CPU.) Mar 8, 2026

Copilot AI reviewed Mar 8, 2026


aalpar merged commit 433f4ac into master Mar 8, 2026
8 checks passed

aalpar deleted the optimize-hot-path branch March 8, 2026 22:57
