
feat: enforce per-node move limit rather than per-node requeue limit #51

Merged: adsharma merged 1 commit into Ladybug-Memory:main from taisnguyen:taisnguyen/per-node-move-limit on Jul 2, 2026

@taisnguyen

Contributor

An extension to #47, which implemented a per-node requeue limit. A per-node move limit is different: a node can be requeued many times without ever successfully moving, so the two limits are not equivalent. Further benchmarking shows that a per-node requeue limit has high variance in its runtime, which may be due to the randomness of node scheduling. The variance is concerning because it suggests that some scheduling paths may skip a lot of useful work, so I propose we switch to a per-node move limit (originally suggested by @adsharma). The graph below shows that with this change variance is lower and achieved CPM is higher, but runtime is also higher. Although runtime is higher, it is still well bounded (see the "no limit" data point):

image

In some cases with a per-node requeue limit, we saw that runtime can differ by a factor of 2-3x. A preliminary look at the quality of the clusters produced suggests that cluster quality is degraded in the low-runtime case (e.g., min-cut is lower).

Summary: A per-node move limit is more reliable than a per-node requeue limit, since it prevents cases where a lot of work is skipped (through the randomness of node scheduling). In cases where work is skipped, we believe cluster quality is worse.
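To make the distinction concrete, here is a minimal toy sketch of a queue-driven local-moving loop that can charge a node's budget either on requeue or on a successful move. It is not the code from this PR: `local_move`, `make_scripted_try_move`, `run_demo`, the three-node graph, and the scripted move rule are all hypothetical and exist only for illustration. The script assumes `a` can always move, `b` can move only after `a` has moved, and `c` only after `b` has moved.

```python
from collections import deque

def local_move(order, try_move, neighbors, limit, count_on):
    """Toy queue-driven local moving with a per-node budget (illustration only).

    count_on="requeue": the budget is charged each time a node is put back on the queue.
    count_on="move":    the budget is charged only when a node actually changes cluster.
    """
    if count_on not in ("requeue", "move"):
        raise ValueError("count_on must be 'requeue' or 'move'")
    charged = {v: 0 for v in order}
    queue = deque(order)
    queued = set(order)
    visits = 0
    while queue:
        v = queue.popleft()
        queued.discard(v)
        if count_on == "move" and charged[v] >= limit:
            continue  # this node has used up its moves
        visits += 1
        if not try_move(v):
            continue  # a wasted visit: nothing changed, so nothing to requeue
        if count_on == "move":
            charged[v] += 1
        for u in neighbors[v]:
            if u in queued or charged[u] >= limit:
                continue
            if count_on == "requeue":
                charged[u] += 1
            queue.append(u)
            queued.add(u)
    return visits, charged

def make_scripted_try_move(prereq):
    """Hypothetical move rule: a node moves once, after its prerequisite has moved."""
    moved = set()
    def try_move(v):
        if v in moved:
            return False
        p = prereq[v]
        if p is not None and p not in moved:
            return False
        moved.add(v)
        return True
    return try_move, moved

# Assumed toy input: a triangle, visited in an order that wastes early visits.
NEIGHBORS = {"a": ["c", "b"], "b": ["a", "c"], "c": ["a", "b"]}
PREREQ = {"a": None, "b": "a", "c": "b"}
ORDER = ["b", "c", "a"]

def run_demo(count_on):
    try_move, moved = make_scripted_try_move(PREREQ)
    visits, _ = local_move(ORDER, try_move, NEIGHBORS, limit=1, count_on=count_on)
    return visits, sorted(moved)

print(run_demo("requeue"))  # (6, ['a', 'b'])
print(run_demo("move"))     # (6, ['a', 'b', 'c'])
```

With a limit of 1 under the requeue policy, `c` spends its only requeue on a visit where it cannot move yet, so it is never revisited once `b` has moved. Under the move policy that wasted visit costs nothing and `c` still gets its move. In this toy the visit counts happen to be equal, so it illustrates only the skipped work, not the runtime difference seen in the benchmarks.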

@adsharma

adsharma commented Jul 2, 2026

Contributor

Agreed!

@adsharma adsharma merged commit b72c546 into Ladybug-Memory:main Jul 2, 2026
5 checks passed