fix(scheduler): keep logits processors aligned after batch row removal by richgoodson · Pull Request #1845 · jundot/omlx

richgoodson · 2026-06-12T14:06:52Z

Symptom

Per-request logits processors (thinking budget, grammar constraints) are silently dropped depending on request order. A request with thinking_budget: 300 reasons to its natural length as if the parameter were absent, but only when the previous request in the decode batch carried no processors. The same request on a fresh server enforces the budget exactly. Grammar-constrained requests can lose their constraint the same way.

Cause

mlx-lm's GenerationBatch.filter only reindexes logits_processors when at least one row slot is non-empty:

if any(self.logits_processors):
    self.logits_processors = [self.logits_processors[idx] for idx in keep]

There is no else branch (the prompt-batch class has one: [[]] * len(keep)). After a request with no per-request processors finishes, every slot is [] (the shape omlx inserts), any() is False, and removal shrinks uids while the stale processor list survives. The next request's row then extend()s in behind its own index. I instrumented the patched _step and caught the broken state directly: uids=(1,) with processor slots ([], [ThinkingBudgetProcessor]). Row 0 reads the leftover empty slot; the real processor sits orphaned at index 1 and is never applied.

The misalignment self-heals when the affected request finishes (the orphan makes any() True again, so the next filter reindexes correctly). That makes the symptom intermittent and order-dependent, which is why it has been easy to miss.

samplers sit behind the same any() guard, but omlx always passes a real sampler object per row, so that list cannot go inert. I left it alone.

Fix

Wrap GenerationBatch.filter next to the existing _step chokepoint patch (#934/#1747): snapshot whether the list was inert before the original filter runs, and if the original guard skipped the reindex, reset the list to one empty slot per surviving row. Also normalizes a None list, which would crash the original filter's any().

Testing

New tests in tests/test_scheduler_logits_processors.py exercise the real mlx-lm filter/extend bookkeeping on bare GenerationBatch instances (no model load): the stale-list reset, the remove-then-extend alignment that reproduces the live failure, the active-processor reindex path (must not be clobbered), the None normalization, and a source-level pin on the patch installation. The behavior tests fail without the fix exactly as described (logits_processors == [[]] against uids == []).
Full suite: 5633 passed, 22 skipped, exit 0.
Live test on an M4 Pro Mini with Qwen3.6-35B-A3B-4bit, temperature 0: a no-budget chat request followed by a thinking_budget: 300 chat request produced 580 reasoning tokens before the fix (budget ignored) and 299 after. On a fresh server both builds produce 299, which is the order-dependence in one line.

Context

Found while live-testing #1844 (thinking_budget on /v1/completions, #1825). The drop reproduces on /v1/chat/completions on current main, so it is independent of that PR. The underlying bug is in mlx-lm; if you would rather fix it upstream I can write that up, but the monkey-patch follows the established #934/#1747 pattern and protects omlx on the current pin either way.

mlx-lm's GenerationBatch.filter only reindexes logits_processors when at least one row slot is non-empty. After a request with no per-request processors finishes (every slot is [], the shape omlx inserts), removal shrinks uids but leaves the stale processor list behind. The next request then extends in behind its own row index: the row reads a leftover empty slot and its real processor (thinking budget, grammar constraint) is silently never applied. Alignment self-heals when the affected request finishes, so the symptom is an intermittently ignored thinking_budget or grammar constraint that depends on request order. Wrap GenerationBatch.filter alongside the existing _step chokepoint patch (jundot#934/jundot#1747): when the original guard would have skipped the reindex, reset the list to one empty slot per surviving row. Found while live-testing jundot#1844. thinking_budget was enforced on a fresh server but ignored when the previous request in the decode batch had no processors. Reproduced on /v1/chat/completions as well, so the bug is independent of the completions forwarding change.

richgoodson · 2026-06-12T15:01:53Z

Upstream status: this is a known mlx-lm bug with an open fix, ml-explore/mlx-lm#1225 (open since April, no review yet; the maintainer closed the sibling #1230 as a duplicate and its extend fix was folded into #1225 two days ago). I commented there with our repro and measurements. When the mlx-lm pin moves past that fix, _patched_generation_batch_filter becomes a no-op and can be dropped together with the source-level pin test; the behavior tests in tests/test_scheduler_logits_processors.py pin the aligned outcome rather than the patch mechanics, so they keep passing on a fixed mlx-lm.

jundot · 2026-06-13T09:06:39Z

Thanks for the detailed repro and upstream context. I confirmed the current mlx-lm pin still has the GenerationBatch.filter stale-list behavior.

I also checked the VLM path. mlx-vlm's own GenerationBatch.filter already reindexes/clears logits_processors, and this oMLX patch wraps mlx-lm's GenerationBatch used by the normal oMLX decode path, so I don't see a VLM conflict.

I ran the logits-processor, scheduler, and MTP-adjacent tests locally, and this looks good to me. I'm going to merge it.

richgoodson mentioned this pull request Jun 12, 2026

generate.py: fix per-slot samplers/logits_processors bookkeeping in filter and extend ml-explore/mlx-lm#1225

Open

richgoodson marked this pull request as ready for review June 12, 2026 15:04

jundot merged commit d6383e5 into jundot:main Jun 13, 2026
4 checks passed

This was referenced Jun 13, 2026

fix(scheduler): realign per-row samplers and logits processors by uid #1824

Merged

Grammar and thinking_budget silently stop applying under concurrent mixed load (row misalignment after batch merge — sequel to #1798) #1823

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(scheduler): keep logits processors aligned after batch row removal#1845

fix(scheduler): keep logits processors aligned after batch row removal#1845
jundot merged 1 commit into
jundot:mainfrom
richgoodson:fix/batch-filter-logits-processor-drop

richgoodson commented Jun 12, 2026

Uh oh!

richgoodson commented Jun 12, 2026 •

edited

Loading

Uh oh!

jundot commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

richgoodson commented Jun 12, 2026

Symptom

Cause

Fix

Testing

Context

Uh oh!

richgoodson commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jundot commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

richgoodson commented Jun 12, 2026 •

edited

Loading