[AsyncGRPO] `AsyncGRPO` `add_response_schema` not gated on `self.tools` (diverges from `GRPOTrainer`) #5742

### Problem

`AsyncRolloutWorker.__init__` calls `add_response_schema` unconditionally; `GRPOTrainer.__init__` gates the same call on `if self.tools and ...`. Result: an `AsyncGRPOTrainer` user with a tokenizer whose chat template isn't bundled in TRL's dispatch arms (Mistral, Gemma, Phi, Cohere, custom finetune, anything with `chat_template=None`, …) hits `ValueError` at init even when they pass `tools=None`. The same setup on `GRPOTrainer` succeeds.

AsyncGRPO is in `experimental/`, so the user count is small — but anyone trying it with a non-bundled tokenizer hits this at the very first init step, with no workaround short of patching TRL or pre-attaching `response_schema` manually.

[`trl/experimental/async_grpo/async_rollout_worker.py:169-176`](https://github.com/huggingface/trl/blob/main/trl/experimental/async_grpo/async_rollout_worker.py#L169-L176):

```python
self.tokenizer = processing_class
self.tokenizer = add_response_schema(self.tokenizer)            # ungated
if self.tools and not is_chat_template_prefix_preserving(self.tokenizer):
    self.chat_template = get_training_chat_template(self.tokenizer)
else:
    self.chat_template = None
```

[`trl/trainer/grpo_trainer.py:540-547`](https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L540-L547):

```python
if self.tools and getattr(self._tokenizer, "response_schema", None) is None:
    processing_class = add_response_schema(processing_class)
if self.tools and not is_chat_template_prefix_preserving(processing_class):
    self.chat_template = get_training_chat_template(processing_class)
else:
    self.chat_template = None
```

Note the asymmetry within the worker's own block: the chat-template check at line 173 *is* gated on `self.tools`, but the schema call at line 170 is not.
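To make the divergence concrete, here is a minimal stdlib-only model of the two init paths. `FakeTokenizer`, `KNOWN_TEMPLATES`, `worker_init`, `trainer_init`, and the stub `add_response_schema` are hypothetical stand-ins, not TRL's API. The stub only mimics the observed behavior of raising `ValueError` for a template outside the dispatch arms.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer; only the two attributes that matter here."""

    def __init__(self, chat_template=None, response_schema=None):
        self.chat_template = chat_template
        self.response_schema = response_schema


# Hypothetical stand-in for TRL's dispatch arms (the real logic lives in trl/chat_template_utils.py).
KNOWN_TEMPLATES = {"qwen3-template"}


def add_response_schema(tokenizer):
    """Stub mimicking the observed behavior: raise for any template outside the dispatch arms."""
    if tokenizer.chat_template not in KNOWN_TEMPLATES:
        raise ValueError("Unrecognized chat template, failed to add response schema.")
    tokenizer.response_schema = {"type": "object"}  # placeholder schema
    return tokenizer


def worker_init(tokenizer, tools):
    # Models AsyncRolloutWorker.__init__: the schema call ignores `tools`.
    return add_response_schema(tokenizer)


def trainer_init(tokenizer, tools):
    # Models GRPOTrainer.__init__: the schema call only runs when tools are passed.
    if tools and getattr(tokenizer, "response_schema", None) is None:
        tokenizer = add_response_schema(tokenizer)
    return tokenizer


def init_succeeds(init_fn, tools):
    """Return True if init completes for a tokenizer with chat_template=None."""
    try:
        init_fn(FakeTokenizer(chat_template=None), tools)
        return True
    except ValueError:
        return False


print("GRPOTrainer path, tools=None:", init_succeeds(trainer_init, None))  # -> True
print("AsyncRolloutWorker path, tools=None:", init_succeeds(worker_init, None))  # -> False
```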

### Reproduction

vLLM is not required: `is_vllm_available` is patched so the check at line 112 passes, and the error fires at line 170, before any HTTP call. The other two mocks only stub out setup that runs *after* line 170.

```python
from unittest.mock import patch
from datasets import Dataset
from transformers import AutoTokenizer
from trl.experimental.async_grpo.async_rollout_worker import AsyncRolloutWorker

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer.chat_template = None  # any template not in the dispatch arms

dataset = Dataset.from_list([{"prompt": [{"role": "user", "content": "Hi"}]}])

with (
    patch("trl.experimental.async_grpo.async_rollout_worker.is_vllm_available", return_value=True),
    patch.object(AsyncRolloutWorker, "_wait_for_server_ready_sync"),
    patch.object(AsyncRolloutWorker, "_init_weight_transfer"),
):
    AsyncRolloutWorker(
        model_name="Qwen/Qwen3-0.6B",
        dataset=dataset,
        reward_funcs=[lambda **kw: [0.0]],
        processing_class=tokenizer,
        tools=None,  # explicitly no tools — schema call should be skipped
    )
```

```
File ".../trl/experimental/async_grpo/async_rollout_worker.py", line 170, in __init__
    self.tokenizer = add_response_schema(self.tokenizer)
File ".../trl/chat_template_utils.py", line 410, in add_response_schema
    raise ValueError(
ValueError: Unrecognized chat template, failed to add response schema. Please manually set the response schema on the tokenizer or processor. See the Transformers [docs](https://huggingface.co/docs/transformers/main/en/chat_response_parsing#response-parsing) for more details on response parsing.
```

Verified against `main` @ `f3f04b99`.

### Suggested fix

Mirror `grpo_trainer.py:540` so async and sync GRPO stay consistent:

```python
if self.tools and getattr(self.tokenizer, "response_schema", None) is None:
    self.tokenizer = add_response_schema(self.tokenizer)
```
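One way to sanity-check the suggested gate is to run it against the cases that matter. This is a stdlib-only sketch with hypothetical stand-ins (`FakeTokenizer`, a stub `add_response_schema`, and `setup_tokenizer`, which models only the patched lines of the worker's `__init__`); it is not TRL code.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer."""

    def __init__(self, chat_template=None, response_schema=None):
        self.chat_template = chat_template
        self.response_schema = response_schema


def add_response_schema(tokenizer):
    # Hypothetical stub: only "known-template" is in the dispatch arms.
    if tokenizer.chat_template != "known-template":
        raise ValueError("Unrecognized chat template, failed to add response schema.")
    tokenizer.response_schema = {"type": "object"}
    return tokenizer


def setup_tokenizer(tokenizer, tools):
    # Patched worker logic: same gate as grpo_trainer.py:540.
    if tools and getattr(tokenizer, "response_schema", None) is None:
        tokenizer = add_response_schema(tokenizer)
    return tokenizer


# 1. No tools, unbundled template: init no longer raises and the schema is left untouched.
tok = setup_tokenizer(FakeTokenizer(chat_template=None), tools=None)
assert tok.response_schema is None

# 2. Tools plus a user-attached schema: the user's schema is preserved, not overwritten.
custom = {"type": "object", "x-custom": True}
tok = setup_tokenizer(FakeTokenizer(chat_template=None, response_schema=custom), tools=[print])
assert tok.response_schema is custom

# 3. Tools, bundled template, no schema: the schema is added exactly as before.
tok = setup_tokenizer(FakeTokenizer(chat_template="known-template"), tools=[print])
assert tok.response_schema == {"type": "object"}
```

With the gate in place, the only remaining failure mode is tools plus an unbundled template plus no schema, which is the same behavior `GRPOTrainer` has today.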

Also: the worker has no equivalent of `grpo_trainer.py:488`'s `supports_tool_calling(processing_class)` pre-check, so users passing `tools=[...]` with an unsupported template fail late, during generation, rather than at init. Both gaps have the same source: the port from `grpo_trainer.py` dropped two checks, not one. Worth folding into the same fix or filing separately; your call.
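For the second gap, an init-time guard in the spirit of `grpo_trainer.py:488` could look like the sketch below. `supports_tool_calling` here is a hypothetical stub (the real TRL helper's detection logic is not reproduced), and `check_tool_support`, `FakeTokenizer`, and the error message are illustrative names only, not TRL's.

```python
class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer."""

    def __init__(self, chat_template=None):
        self.chat_template = chat_template


def supports_tool_calling(tokenizer):
    # Hypothetical stub: treat only templates that mention "tools" as tool-capable.
    return tokenizer.chat_template is not None and "tools" in tokenizer.chat_template


def check_tool_support(tokenizer, tools):
    """Fail at init, not mid-generation, when tools are passed but the template can't use them."""
    if tools and not supports_tool_calling(tokenizer):
        raise ValueError(
            "Tools were passed, but the tokenizer's chat template does not support tool calling."
        )


# No tools: nothing to check, even for an unbundled template.
check_tool_support(FakeTokenizer(chat_template=None), tools=None)
# Tool-capable template: passes.
check_tool_support(FakeTokenizer(chat_template="{% if tools %}...{% endif %}"), tools=[print])
# Tools with an unsupported template: raises immediately at init.
try:
    check_tool_support(FakeTokenizer(chat_template=None), tools=[print])
except ValueError as err:
    print("raised at init:", err)
```

Placing this check next to the gated `add_response_schema` call would keep both tool-related validations in one spot, as in `GRPOTrainer`.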

### Related

Issue #5498 reported a related symptom (`AsyncRolloutWorker` reloading the tokenizer from `model_name`, discarding user-applied `processing_class`). PR #5538 fixed that — the worker now respects the provided `processing_class`. But the schema call itself remained ungated, so users who don't pre-apply `add_response_schema` still hit `ValueError` at init when their template isn't in TRL's dispatch arms.

### System Info

- Platform: macOS-26.3.1-arm64-arm-64bit
- Python version: 3.10.19
- TRL version: 1.4.0.dev0+f3f04b9
- PyTorch version: 2.11.0
- accelerator(s): MPS
- Transformers version: 5.8.0
- Accelerate version: 1.13.0
- Accelerate config: not found
- Datasets version: 4.8.5
- HF Hub version: 1.14.0
- bitsandbytes version: 0.49.2
- DeepSpeed version: 0.18.9
- Liger-Kernel version: not installed
- PEFT version: 0.19.1
- vLLM version: not installed

### Checklist

- [x] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [x] I have included my system information
- [x] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any traceback provided is complete
