fix(hami-scheduler): compatible with app's stop fast behavior by dkeven · Pull Request #2712 · beclab/Olares

dkeven · 2026-03-18T13:47:45Z

Background
After "appservice: stop app fast if pod was hami schudule failed when resume" (#2699), an Application that hami-scheduler reports as unschedulable is stopped immediately by app-service. However, hami-scheduler also reports unschedulable, and makes kube-scheduler retry scheduling, in many retryable cases, such as a node being locked by another pod. In addition, because HAMi's informer is asynchronous, device occupation stats may not be updated immediately, so a pod may only be scheduled on the next retry. Two changes have been made to make HAMi compatible with this new logic:
1. Add a new event type with reason InsufficientGPU, dedicated to the case where no available GPU resources can be found for the pod to be scheduled, separating it from the other, normally retryable cases.
2. When a pod is deleted by HAMi-scheduler itself, update the in-memory device usage immediately rather than relying on the pod informer to update the state, to avoid potential race conditions with the deployment controller.
Target Version for Merge
1.12.5, 1.12.6
Related Issues
none
PRs Involving Sub-Systems
fix(scheduler): compatible with app's stop fast behavior HAMi#17
Other information:
none

fix(hami-scheduler): compatible with app's stop fast behavior

8bfbdcc

eball approved these changes Mar 18, 2026

eball merged commit 5f84fcb into main Mar 18, 2026
12 checks passed

eball added a commit that referenced this pull request Mar 19, 2026

cherry-pick: fix(hami-scheduler): compatible with app's stop fast behavior (#2712)

91376c8

eball merged 1 commit into main from gpu/fix/app_stop_compat
