[AsyncGRPO] AsyncGRPO add_response_schema not gated on self.tools (diverges from GRPOTrainer) #5742

@aazizyan

Description

Problem

AsyncRolloutWorker.__init__ calls add_response_schema unconditionally; GRPOTrainer.__init__ gates the same call on if self.tools and .... Result: an AsyncGRPOTrainer user with a tokenizer whose chat template isn't bundled in TRL's dispatch arms (Mistral, Gemma, Phi, Cohere, custom finetune, anything with chat_template=None, …) hits ValueError at init even when they pass tools=None. The same setup on GRPOTrainer succeeds.

AsyncGRPO is in experimental/, so the user count is small — but anyone trying it with a non-bundled tokenizer hits this at the very first init step, with no workaround short of patching TRL or pre-attaching response_schema manually.

trl/experimental/async_grpo/async_rollout_worker.py:169-176:

self.tokenizer = processing_class
self.tokenizer = add_response_schema(self.tokenizer)            # ungated
if self.tools and not is_chat_template_prefix_preserving(self.tokenizer):
    self.chat_template = get_training_chat_template(self.tokenizer)
else:
    self.chat_template = None

trl/trainer/grpo_trainer.py:540-547:

if self.tools and getattr(self._tokenizer, "response_schema", None) is None:
    processing_class = add_response_schema(processing_class)
if self.tools and not is_chat_template_prefix_preserving(processing_class):
    self.chat_template = get_training_chat_template(processing_class)
else:
    self.chat_template = None

Note the asymmetry within the worker's own block: line 173 is gated on self.tools, line 170 is not.
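
To make the divergence concrete without installing TRL, here is a minimal sketch of the two code paths. Everything in it is a stand-in and an assumption: FakeTokenizer, KNOWN_TEMPLATES, init_ungated and init_gated are hypothetical names, and this add_response_schema only imitates the real function's "raise on an unrecognized template" behaviour.

```python
class FakeTokenizer:
    """Hypothetical stand-in exposing only the attributes used below."""

    def __init__(self, chat_template=None):
        self.chat_template = chat_template
        self.response_schema = None


# Stand-in for the set of templates TRL's dispatch arms recognise (assumption).
KNOWN_TEMPLATES = {"known-template"}


def add_response_schema(tokenizer):
    """Stand-in: raises on templates outside the known set, as in the traceback."""
    if tokenizer.chat_template not in KNOWN_TEMPLATES:
        raise ValueError("Unrecognized chat template, failed to add response schema.")
    tokenizer.response_schema = {"type": "object"}
    return tokenizer


def init_ungated(tokenizer, tools):
    # Worker-style path: the schema call runs regardless of `tools`.
    return add_response_schema(tokenizer)


def init_gated(tokenizer, tools):
    # GRPOTrainer-style path: the schema call runs only when tools are set
    # and no schema is attached yet.
    if tools and getattr(tokenizer, "response_schema", None) is None:
        tokenizer = add_response_schema(tokenizer)
    return tokenizer


if __name__ == "__main__":
    init_gated(FakeTokenizer(chat_template=None), tools=None)  # passes
    try:
        init_ungated(FakeTokenizer(chat_template=None), tools=None)
    except ValueError as err:
        print(f"ungated path failed: {err}")
```

With tools=None and an unrecognized template, only the ungated path raises, which matches the behaviour reported for AsyncRolloutWorker versus GRPOTrainer.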

Reproduction

vLLM is not required: the repro patches is_vllm_available so the worker's vLLM check at line 112 passes, and the error is raised before any HTTP call. The remaining mocks bypass setup that runs after line 170.

from unittest.mock import patch
from datasets import Dataset
from transformers import AutoTokenizer
from trl.experimental.async_grpo.async_rollout_worker import AsyncRolloutWorker

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer.chat_template = None  # any template not in the dispatch arms

dataset = Dataset.from_list([{"prompt": [{"role": "user", "content": "Hi"}]}])

with (
    patch("trl.experimental.async_grpo.async_rollout_worker.is_vllm_available", return_value=True),
    patch.object(AsyncRolloutWorker, "_wait_for_server_ready_sync"),
    patch.object(AsyncRolloutWorker, "_init_weight_transfer"),
):
    AsyncRolloutWorker(
        model_name="Qwen/Qwen3-0.6B",
        dataset=dataset,
        reward_funcs=[lambda **kw: [0.0]],
        processing_class=tokenizer,
        tools=None,  # explicitly no tools — schema call should be skipped
    )

File ".../trl/experimental/async_grpo/async_rollout_worker.py", line 170, in __init__
    self.tokenizer = add_response_schema(self.tokenizer)
File ".../trl/chat_template_utils.py", line 410, in add_response_schema
    raise ValueError(
ValueError: Unrecognized chat template, failed to add response schema. Please manually set the response schema on the tokenizer or processor. See the Transformers [docs](https://huggingface.co/docs/transformers/main/en/chat_response_parsing#response-parsing) for more details on response parsing.

Verified against main @ f3f04b99.

Suggested fix

Mirror grpo_trainer.py:540 so async and sync GRPO stay consistent:

if self.tools and getattr(self.tokenizer, "response_schema", None) is None:
    self.tokenizer = add_response_schema(self.tokenizer)
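
A possible shape for a regression check on this gate, sketched with the schema function injected so it can be mocked. The helper names maybe_add_schema and count_schema_calls are hypothetical, and the tokenizer is a bare SimpleNamespace rather than a real one; a real test would go through AsyncRolloutWorker itself.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def maybe_add_schema(tokenizer, tools, add_schema):
    """Proposed gate, with the schema function passed in (hypothetical helper)."""
    if tools and getattr(tokenizer, "response_schema", None) is None:
        tokenizer = add_schema(tokenizer)
    return tokenizer


def count_schema_calls(tools, response_schema):
    """Return how many times the schema function is invoked for the given inputs."""
    tokenizer = SimpleNamespace(response_schema=response_schema)
    add_schema = Mock(side_effect=lambda tok: tok)  # records calls, returns input
    maybe_add_schema(tokenizer, tools, add_schema)
    return add_schema.call_count


def dummy_tool():
    """Placeholder tool; only its presence in the list matters here."""


if __name__ == "__main__":
    for tools, schema in [(None, None), ([dummy_tool], None), ([dummy_tool], {"type": "object"})]:
        print(bool(tools), schema is not None, count_schema_calls(tools, schema))
```

The schema function should be called only in the middle case: tools provided and no response_schema attached yet.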

Also: the worker has no equivalent of grpo_trainer.py:488's supports_tool_calling(processing_class) pre-check, so users passing tools=[...] with an unsupported template fail late in generation rather than at init. Same source — the port from grpo_trainer.py dropped two checks, not one. Worth folding into the same fix or filing separately, your call.
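
For the second dropped check, a fail-fast guard at init could look roughly like the sketch below. This is an assumption about shape only: the supports_tool_calling here is a hypothetical stand-in that just looks for the word "tools" in the template string, not TRL's actual implementation, and validate_tools_at_init is an invented name.

```python
from types import SimpleNamespace


def supports_tool_calling(tokenizer):
    """Hypothetical stand-in; the real check in TRL may work differently."""
    template = getattr(tokenizer, "chat_template", None) or ""
    return "tools" in template


def validate_tools_at_init(tokenizer, tools):
    """Raise at construction time instead of failing later during generation."""
    if tools and not supports_tool_calling(tokenizer):
        raise ValueError(
            "tools were provided, but the tokenizer's chat template does not "
            "appear to support tool calling"
        )


if __name__ == "__main__":
    plain = SimpleNamespace(chat_template="{{ messages }}")
    validate_tools_at_init(plain, tools=None)  # no tools, nothing to validate
    try:
        validate_tools_at_init(plain, tools=[print])
    except ValueError as err:
        print(f"rejected at init: {err}")
```

Because the guard is also conditioned on tools, it leaves the tools=None path untouched.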

Related

Issue #5498 reported a related symptom (AsyncRolloutWorker reloading the tokenizer from model_name, discarding user-applied processing_class). PR #5538 fixed that — the worker now respects the provided processing_class. But the schema call itself remained ungated, so users who don't pre-apply add_response_schema still hit ValueError at init when their template isn't in TRL's dispatch arms.

System Info

  • Platform: macOS-26.3.1-arm64-arm-64bit
  • Python version: 3.10.19
  • TRL version: 1.4.0.dev0+f3f04b9
  • PyTorch version: 2.11.0
  • accelerator(s): MPS
  • Transformers version: 5.8.0
  • Accelerate version: 1.13.0
  • Accelerate config: not found
  • Datasets version: 4.8.5
  • HF Hub version: 1.14.0
  • bitsandbytes version: 0.49.2
  • DeepSpeed version: 0.18.9
  • Liger-Kernel version: not installed
  • PEFT version: 0.19.1
  • vLLM version: not installed

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete
