Reproduction
Problem
AsyncRolloutWorker.__init__ calls add_response_schema unconditionally, whereas GRPOTrainer.__init__ gates the same call on if self.tools and ... As a result, an AsyncGRPOTrainer user with a tokenizer whose chat template isn't bundled in TRL's dispatch arms (Mistral, Gemma, Phi, Cohere, custom finetune, anything with chat_template=None, …) hits ValueError at init even when they pass tools=None. The same setup on GRPOTrainer succeeds.
AsyncGRPO is in experimental/, so the user count is small — but anyone trying it with a non-bundled tokenizer hits this at the very first init step, with no workaround short of patching TRL or pre-attaching response_schema manually.
trl/experimental/async_grpo/async_rollout_worker.py:169-176:
self.tokenizer = processing_class
self.tokenizer = add_response_schema(self.tokenizer)  # ungated
if self.tools and not is_chat_template_prefix_preserving(self.tokenizer):
    self.chat_template = get_training_chat_template(self.tokenizer)
else:
    self.chat_template = None
trl/trainer/grpo_trainer.py:540-547:
if self.tools and getattr(self._tokenizer, "response_schema", None) is None:
    processing_class = add_response_schema(processing_class)
if self.tools and not is_chat_template_prefix_preserving(processing_class):
    self.chat_template = get_training_chat_template(processing_class)
else:
    self.chat_template = None
Note the asymmetry within the worker's own block: line 173 is gated on self.tools, line 170 is not.
Reproduction
vLLM is not required: the script patches the worker's vLLM availability check (line 112), and the error fires before any HTTP call. The remaining mocks bypass setup that runs after line 170.
from unittest.mock import patch

from datasets import Dataset
from transformers import AutoTokenizer
from trl.experimental.async_grpo.async_rollout_worker import AsyncRolloutWorker

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer.chat_template = None  # any template not in the dispatch arms
dataset = Dataset.from_list([{"prompt": [{"role": "user", "content": "Hi"}]}])

with (
    patch("trl.experimental.async_grpo.async_rollout_worker.is_vllm_available", return_value=True),
    patch.object(AsyncRolloutWorker, "_wait_for_server_ready_sync"),
    patch.object(AsyncRolloutWorker, "_init_weight_transfer"),
):
    AsyncRolloutWorker(
        model_name="Qwen/Qwen3-0.6B",
        dataset=dataset,
        reward_funcs=[lambda **kw: [0.0]],
        processing_class=tokenizer,
        tools=None,  # explicitly no tools; the schema call should be skipped
    )
  File ".../trl/experimental/async_grpo/async_rollout_worker.py", line 170, in __init__
    self.tokenizer = add_response_schema(self.tokenizer)
  File ".../trl/chat_template_utils.py", line 410, in add_response_schema
    raise ValueError(
ValueError: Unrecognized chat template, failed to add response schema. Please manually set the response schema on the tokenizer or processor. See the Transformers [docs](https://huggingface.co/docs/transformers/main/en/chat_response_parsing#response-parsing) for more details on response parsing.
Verified against main @ f3f04b99.
Suggested fix
Mirror grpo_trainer.py:540 so async and sync GRPO stay consistent:
if self.tools and getattr(self.tokenizer, "response_schema", None) is None:
    self.tokenizer = add_response_schema(self.tokenizer)
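To make the intended behavior concrete, here is a minimal, self-contained sketch of the gate. It does not import TRL: fake_add_response_schema and KNOWN_TEMPLATES are hypothetical stand-ins that only mimic the behavior observed above (a ValueError on an unrecognized template), and SimpleNamespace stands in for a tokenizer.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the templates TRL can dispatch on.
KNOWN_TEMPLATES = {"known-template"}


def fake_add_response_schema(tokenizer):
    """Stand-in for add_response_schema: raises on an unrecognized template."""
    if tokenizer.chat_template not in KNOWN_TEMPLATES:
        raise ValueError("Unrecognized chat template, failed to add response schema.")
    tokenizer.response_schema = {"type": "object"}  # placeholder schema
    return tokenizer


def attach_schema_if_needed(tokenizer, tools):
    """Gated call, using the same condition as the suggested fix."""
    if tools and getattr(tokenizer, "response_schema", None) is None:
        tokenizer = fake_add_response_schema(tokenizer)
    return tokenizer


# No tools: the schema call is skipped, so an unknown template is harmless.
tok = SimpleNamespace(chat_template=None, response_schema=None)
assert attach_schema_if_needed(tok, tools=None) is tok
assert tok.response_schema is None
```

With tools set and an unrecognized template, the sketch still raises, which is the case where a response schema is actually needed.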
Also: the worker has no equivalent of the supports_tool_calling(processing_class) pre-check at grpo_trainer.py:488, so users passing tools=[...] with an unsupported template fail late, during generation, rather than at init. Both gaps share a source: the port from grpo_trainer.py dropped two checks, not one. This could be folded into the same fix or filed separately, your call.
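A fail-fast pre-check of this kind could look roughly like the sketch below. It is illustrative only and does not import TRL: fake_supports_tool_calling is a hypothetical stand-in whose heuristic (looking for "tools" in the template) is invented for the example, and raising ValueError on failure is an assumption about the desired behavior, not a description of what grpo_trainer.py:488 does.

```python
from types import SimpleNamespace


def fake_supports_tool_calling(processing_class):
    """Hypothetical stand-in: treat a template that mentions 'tools' as tool-capable."""
    template = getattr(processing_class, "chat_template", None) or ""
    return "tools" in template


def check_tools_supported(processing_class, tools):
    """Raise at init time instead of failing later during generation."""
    if tools and not fake_supports_tool_calling(processing_class):
        raise ValueError("tools were provided, but the chat template does not support tool calling.")


# No tools: the check is a no-op even for a tokenizer without a template.
check_tools_supported(SimpleNamespace(chat_template=None), tools=None)

# Tools plus a tool-aware template: the check passes.
check_tools_supported(SimpleNamespace(chat_template="{% if tools %}...{% endif %}"), tools=[print])
```

Like the schema gate, the check is conditioned on tools, so users who pass tools=None are unaffected.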
Related
Issue #5498 reported a related symptom (AsyncRolloutWorker reloading the tokenizer from model_name, discarding user-applied processing_class). PR #5538 fixed that — the worker now respects the provided processing_class. But the schema call itself remained ungated, so users who don't pre-apply add_response_schema still hit ValueError at init when their template isn't in TRL's dispatch arms.
System Info
- Platform: macOS-26.3.1-arm64-arm-64bit
- Python version: 3.10.19
- TRL version: 1.4.0.dev0+f3f04b9
- PyTorch version: 2.11.0
- accelerator(s): MPS
- Transformers version: 5.8.0
- Accelerate version: 1.13.0
- Accelerate config: not found
- Datasets version: 4.8.5
- HF Hub version: 1.14.0
- bitsandbytes version: 0.49.2
- DeepSpeed version: 0.18.9
- Liger-Kernel version: not installed
- PEFT version: 0.19.1
- vLLM version: not installed
Checklist