The SMP build-and-test job hung on selftest 32 (priority_order) past the 120s watcher budget and was killed.

- The make watcher's CHECK_*_TIMEOUT variables were assigned with :=, so the CI workflow's CHECK_SELFTEST_TIMEOUT=240 environment override never applied and the effective budget stayed at the 120s default. Switch to ?= so the override propagates.
- test_priority_order waited for the three pinned worker tasks with a single sleep_ms(50). That wake path arms one callout and relies on a same-priority, same-hart re-enqueue to pull the idle thread out of wfi. This is fragile under QEMU TCG SMP timing and has been observed to stall the test indefinitely. Replace the blind sleep with an atomic done counter, polled via sleep_ms(0) yields under a 2s wall-clock deadline, so the test makes progress on its own and fails cleanly if the workers never run.
- dl_replenish_cb re-enqueued the task only when it found it in TD_STATE_DL_THROTTLED. It missed the case where the task is already sitting in pcpu_dl_runq[cpu] in TD_STATE_READY, which sched_dl_pick_next had been skipping because dl_throttled was set. Without a poke, the owning hart can stay parked in wfi after the throttle flag clears. When replenishment finds the task READY, set need_resched on the task's CPU (and send a cross-hart IPI if that CPU is another hart).
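The done-counter wait described above can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the selftest's actual code: POSIX threads stand in for the three pinned worker tasks, sched_yield() stands in for the kernel's sleep_ms(0) yield, and the names done_count, wait_for_workers, and run_workers_and_wait are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define NUM_WORKERS 3    /* the selftest waits on three pinned workers */
#define DEADLINE_MS 2000 /* 2s wall-clock budget, as described in the fix */

static atomic_int done_count; /* hypothetical name for the done counter */

/* Monotonic wall-clock time in milliseconds. */
static uint64_t now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Stand-in for a pinned worker task: do the work, then report completion. */
static void *worker(void *arg) {
    (void)arg;
    atomic_fetch_add_explicit(&done_count, 1, memory_order_release);
    return NULL;
}

/* Poll the counter, yielding between checks, until `expected` workers have
 * reported or the deadline passes. Returns 0 on success, -1 on timeout. */
static int wait_for_workers(int expected, uint64_t budget_ms) {
    const uint64_t deadline = now_ms() + budget_ms;
    while (atomic_load_explicit(&done_count, memory_order_acquire) < expected) {
        if (now_ms() >= deadline)
            return -1;  /* fail cleanly instead of hanging */
        sched_yield();  /* userspace stand-in for the kernel's sleep_ms(0) */
    }
    return 0;
}

/* Spawn the workers, wait on the counter, then join them. In this model the
 * workers always terminate, so joining is safe even after a timeout. */
static int run_workers_and_wait(uint64_t budget_ms) {
    pthread_t tids[NUM_WORKERS];
    atomic_store(&done_count, 0);
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    const int result = wait_for_workers(NUM_WORKERS, budget_ms);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return result;
}
```

Unlike a single fixed sleep, the loop does not depend on any one wakeup arriving: each iteration re-checks both the counter and the clock, so a missed wakeup costs at most one yield rather than the whole test.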
Summary by cubic
Stabilizes the SMP priority_order selftest and fixes a deadline scheduler wakeup gap to prevent CI hangs and ensure replenished tasks run promptly. Also makes CI selftest timeout overrides work.
- Switch CHECK_TIMEOUT and CHECK_SELFTEST_TIMEOUT to ?= so CI env overrides apply.
- Replace sleep_ms(50) with an atomic done counter polled via sleep_ms(0) yields and a 2s deadline to avoid SMP stalls and fail cleanly if workers never run.
- In dl_replenish_cb, when a task is READY, set need_resched on its CPU (and send an IPI if needed) so it is picked after replenishment; previously only DL_THROTTLED was handled.

Written for commit b87ac2b.
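The READY-state fix in dl_replenish_cb can be illustrated with a small single-threaded model. Only dl_replenish_cb, TD_STATE_READY, TD_STATE_DL_THROTTLED, dl_throttled, and need_resched come from the PR; the struct layout, TD_STATE_BLOCKED, dl_enqueue, kick_cpu, this_cpu, and the IPI and run-queue counters are hypothetical stand-ins. The sketch also assumes the pre-existing THROTTLED path kicks the owning CPU after re-enqueueing, which the PR does not state.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

/* Hypothetical subset of task states; TD_STATE_BLOCKED is illustrative. */
enum td_state { TD_STATE_READY, TD_STATE_DL_THROTTLED, TD_STATE_BLOCKED };

struct task {
    enum td_state state;
    bool dl_throttled; /* sched_dl_pick_next skips the task while set */
    int cpu;           /* hart whose pcpu_dl_runq holds the task */
};

/* Hypothetical per-CPU bookkeeping standing in for real kernel state. */
static bool need_resched[NCPU];
static int ipis_sent[NCPU];
static int dl_runq_len[NCPU];

static int this_cpu(void) { return 0; } /* model: the callout fires on hart 0 */

static void dl_enqueue(struct task *t) {
    dl_runq_len[t->cpu]++;
    t->state = TD_STATE_READY;
}

/* Ask `cpu` to reschedule; a remote hart needs an IPI to leave wfi. */
static void kick_cpu(int cpu) {
    need_resched[cpu] = true;
    if (cpu != this_cpu())
        ipis_sent[cpu]++; /* stand-in for sending a reschedule IPI */
}

static void dl_replenish_cb(struct task *t) {
    t->dl_throttled = false; /* budget refill itself omitted */
    switch (t->state) {
    case TD_STATE_DL_THROTTLED:
        /* Pre-existing path: re-enqueue (kicking here is an assumption). */
        dl_enqueue(t);
        kick_cpu(t->cpu);
        break;
    case TD_STATE_READY:
        /* Case added by the fix: the task is already queued but was being
         * skipped, so it needs only a reschedule poke, not a re-enqueue. */
        kick_cpu(t->cpu);
        break;
    default:
        break;
    }
}
```

The READY branch deliberately does not enqueue again: in this model the task is already on its CPU's run queue, so a second insertion would duplicate it. Clearing dl_throttled makes it eligible, and the poke makes sure the owning hart actually re-runs the picker.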