fix(scheduler): realign per-row samplers and logits processors by uid by efortin · Pull Request #1824 · jundot/omlx

efortin · 2026-06-11T15:34:51Z

Summary

This PR fixes a row-alignment bug in continuous batching. samplers and logits_processors are positional lists that must stay aligned with uids; under mixed concurrent load, stale or shifted slots could make a request run another request's constraints, or none at all.

In practice, this could silently disable grammar constraints or thinking budgets.

Changes

Record the expected sampler and logits processors for each (model, uid) when a row is inserted.
Realign batch rows at the generation-step chokepoint before decoding.
Key by model as well as uid to avoid cross-engine uid collisions.
Release registry rows on completion, abort, recovery, reset, and shutdown.
Keep the existing fix: normalise logits_processors row slots dropped to None by batch merge #1799 None-slot normalization as fallback behavior.
Log when a drift is corrected, with rate limiting.

Tests

Covers the #1823 stale-slot reproduction, cross-model uid collisions, cleanup paths, registry bounds, warning rate limiting, and the existing #1799 normalization case.

Validation: reproduction is red on the base commit and green on this PR; targeted scheduler tests pass.

jundot#1799 made the generation step crash-safe by normalising None row slots, but the positional drift behind it was still there: a stale or offset slot left in samplers/logits_processors by batch extend/filter/split shifts every row after it, so a request can decode with another request's - or no - sampler and processors. Under concurrent mixed load this silently disables grammar constraints (json_schema responses come back as prose) and thinking budgets (unbounded reasoning), with no error anywhere. Record at insert time what each uid must run, and realign the positional lists from that registry at the step chokepoint - the same place as the jundot#1799 normalisation. A rate-limited warning fires whenever a drift is actually corrected, so the silent corruption becomes observable. The registry is a bounded OrderedDict so a missed cleanup can never grow it unbounded. Repro (concurrent plain + constrained request, 0.3s apart): 10/10 thinking_budget violations and 5/5 grammar violations on main; 0/15 with this fix. Solo and sequential behavior unchanged, byte-identical at temperature 0.

…pletion mlx-lm's BatchGenerator numbers uids per instance starting at zero, so two engines serving concurrently (or an engine reload) produce colliding uid sequences; a registry keyed by uid alone could install one model's sampler and processors on another model's row. Key entries by (id(model), uid) instead - the model object is the one identity both the insert sites and the step chokepoint can see. Also guard the registry with a lock (engines run on separate executor threads and share the module-level dict) and release a request's row at the existing uid-map cleanup site, so heavy grammar processors are not pinned until LRU eviction; the bounded size stays as the backstop. New tests: same uid on two models must not cross-contaminate, and completion unregister drops the row (double-unregister is a no-op).

…istry signature

Each (model, uid) key is inserted exactly once per request, so the OrderedDict already appends it last; plain insertion order is all the oldest-first backstop eviction needs. One dict operation per row instead of two, and the lps copy reads as list(lps or ()).

…shutdown The row registry was only released in the deferred async-remove path, so a request finishing through the synchronous fallback, an aborted request, or a generator reset (failure recovery, reset(), shutdown()) kept its sampler and logits processors pinned until the FIFO backstop — and entries surviving an engine unload were exactly the residue a recycled id(model) could later match. Release the row in every path that retires a uid, and release by model wherever the uid maps are cleared wholesale. Rate-limit the drift warning to one WARNING per 60s window (DEBUG otherwise) so a pathological merge pattern cannot flood the logs. Tests: structural AST checks that every retirement path releases its rows, a model-scoped clear unit test, the pre-fix offset behavior pinned through the empty-registry fallback, and the warning rate limit.

Pull the chokepoint realignment into a pure _realigned_rows function, name the drift comparison rules (_row_drifted: identity fast path stays inline at the caller, two fresh empty lists are equivalent, content comparison last), move the rate-limited log into _log_drift_correction, and type the registry entries as a NamedTuple. No behavior change; the rebuilt lists are still installed unconditionally and drift only drives logging. Two direct unit tests on the pure rebuild.

jundot · 2026-06-14T07:50:54Z

Thanks for chasing this down and for rebasing it on top of #1845. The uid registry approach is the right fix for the remaining BatchGenerator row drift: it covers samplers as well as logits processors, and it corrects the positional lists at the decode chokepoint instead of only patching one reshape path.

One follow-up I'll fold in after merge: native MTP can bypass GenerationBatch._step from its patched next(), so it should call the same row realignment before MTP eligibility and row extraction. That does not block this standard decode fix, and I want to get this row-drift fix in first.

This looks good to me, and I'm going to merge it.

efortin · 2026-06-14T09:35:34Z

This bug was critical for me (lot of batch with structured output on my side).
Thanks a lot, I can move away from my fork :-)

efortin force-pushed the fix/batch-row-realignment branch from 79d4534 to 7d2048a Compare June 11, 2026 15:47

efortin mentioned this pull request Jun 11, 2026

fix(api): harden /v1/completions thinking_budget (validation, stream parity, tokenizer detection) on top of #1844 #1821

Merged

efortin marked this pull request as ready for review June 11, 2026 16:03

efortin mentioned this pull request Jun 11, 2026

Grammar and thinking_budget silently stop applying under concurrent mixed load (row misalignment after batch merge — sequel to #1798) #1823

Closed

efortin added 7 commits June 13, 2026 12:20

test(scheduler): pin registry fallback and short-slot padding

e6f53e4

test(scheduler): update the insert-site guard for the model-keyed reg…

bba0c12

…istry signature

efortin force-pushed the fix/batch-row-realignment branch from b77ea27 to 5f59d6a Compare June 13, 2026 10:24

efortin mentioned this pull request Jun 13, 2026

thinking_budget: reasoning spills past the close at low budgets (intermittent, correlated with non-English / Cyrillic prompts) #1864

Open

fix(scheduler): log sampler-only row realignment

7a35d4f

jundot merged commit 205a23b into jundot:main Jun 14, 2026
4 checks passed

efortin deleted the fix/batch-row-realignment branch June 14, 2026 09:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(scheduler): realign per-row samplers and logits processors by uid#1824

fix(scheduler): realign per-row samplers and logits processors by uid#1824
jundot merged 8 commits into
jundot:mainfrom
efortin:fix/batch-row-realignment

efortin commented Jun 11, 2026 •

edited

Loading

Uh oh!

jundot commented Jun 14, 2026

Uh oh!

Uh oh!

efortin commented Jun 14, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

efortin commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Tests

Uh oh!

jundot commented Jun 14, 2026

Uh oh!

Uh oh!

efortin commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

efortin commented Jun 11, 2026 •

edited

Loading

efortin commented Jun 14, 2026 •

edited

Loading