thinking_budget is silently ignored on /v1/completions #1825

## Environment

- omlx main @ `6ae5142`
- macOS, Apple Silicon (M5 Pro, 64 GB)
- Model: `mlx-community/Qwen3.6-35B-A3B-nvfp4`, thinking enabled

## Symptom

`thinking_budget` works on `/v1/chat/completions` (and `budget_tokens` on `/v1/messages`), but on `/v1/completions` it is silently ignored: send it, get no error, and the model reasons to its natural length anyway.

```bash
curl -s http://localhost:1234/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.6-35B-A3B-nvfp4",
    "prompt": "<|im_start|>user\nExplain why the sky is blue.<|im_end|>\n<|im_start|>assistant\n<think>\n",
    "max_tokens": 2500,
    "thinking_budget": 300
  }'
```

Expected: thinking capped at roughly 300 tokens. Actual: ~1450 thinking tokens, the same as an identical request without the parameter.

## Cause

`CompletionRequest` has no `thinking_budget` field, so Pydantic discards the unknown key during parsing without raising an error, and neither completion path resolves a budget for the engine. Raw completions are otherwise a natural fit: a prompt ending with an open `<think>` already passes the scheduler's `needs_think_prefix` gate, so enforcement works as soon as the value reaches the engine.
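
The silent drop can be reproduced outside the server. The snippet below is a minimal sketch, not the real omlx model: `CompletionRequestBefore` and `CompletionRequestAfter` are hypothetical, trimmed-down stand-ins, and it assumes the request model uses Pydantic's default handling of extra keys (ignore). It only shows that an undeclared key disappears without a validation error, and that declaring the field is enough to keep the value.

```python
from typing import Optional

from pydantic import BaseModel


class CompletionRequestBefore(BaseModel):
    # Hypothetical stand-in for the request model without the field.
    model: str
    prompt: str
    max_tokens: int = 16


class CompletionRequestAfter(CompletionRequestBefore):
    # Same model with the field declared; None means "no budget requested".
    thinking_budget: Optional[int] = None


payload = {
    "model": "Qwen3.6-35B-A3B-nvfp4",
    "prompt": "<|im_start|>assistant\n<think>\n",
    "max_tokens": 2500,
    "thinking_budget": 300,
}

# Pydantic ignores unknown keys by default, so this parses cleanly
# and the budget is simply gone from the parsed request.
before = CompletionRequestBefore(**payload)
print(hasattr(before, "thinking_budget"))  # False

# With the field declared, the value survives parsing and can be
# passed on to whatever resolves the budget for the engine.
after = CompletionRequestAfter(**payload)
print(after.thinking_budget)  # 300
```

Declaring the field only fixes the parsing half; the value still has to be passed to the engine on each completion path.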

## Fix

The fix is up in #1821: it adds the field and threads it through both completion paths via the existing `_resolve_thinking_budget` helper. With the fix applied, a budget of 300 yields 299 thinking tokens, deterministically at temperature 0 (100 → 99, 50 → 49).

