Difference in Behaviour between SkyRL's inference engine and VLLM #796

Hi there,

I observed some discrepancies between SkyRL's inference engine, which uses VLLM, and VLLM used directly through its `serve` functionality. My main concern is that the `reasoning_parser` argument does not seem to be utilized, so the thinking content from Qwen3 is not parsed into `reasoning_content` when using SkyRL. To reproduce this, I wrote [inference_engine.py](https://github.com/user-attachments/files/24263611/inference_engine.py), which is adapted from `skyrl-train/skyrl_train/entrypoints/main_base.py` and emulates the generator during rollout.



### Using VLLM

```
uv run vllm serve Qwen/Qwen3-32B \
    --tensor-parallel-size 2 \
    --port 8080 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

### Using SkyRL's Inference Engine

```
uv run inference_engine.py \
    --model Qwen/Qwen3-32B \
    --num-inference-engines 1 \
    --tensor-parallel-size 2
```

### Test case

```
import litellm

# Send the same multi-turn conversation to whichever server is listening on port 8080.
response = litellm.completion(
  api_key="sk-xxx",
  base_url="http://0.0.0.0:8080/v1/",
  model="litellm_proxy/Qwen/Qwen3-32B",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"},
      {"role": "assistant", "content": "<think>\nThe user is asking about the winner of the 2020 World Series. I need to recall that the Los Angeles Dodgers won the World Series in 2020.\n</think> The Los Angeles Dodgers won the World Series in 2020."},
      {"role": "user", "content": "Where was it played?"},
      {"role": "assistant", "content": "<think>\nThe user is asking about the location of the 2020 World Series. I need to recall that it was played at Globe Life Field in Arlington, Texas.</think> The 2020 World Series was played at Globe Life Field in Arlington, Texas."},
      {"role": "user", "content": "What was the final score of game 7?"},
  ],
)

# Print the full response so `content` and `reasoning_content` can be compared.
print(response)
```

### Results from VLLM

`ModelResponse(id='chatcmpl-7d42ee2161b54a1e9402659c23747092', created=1766168370, model='litellm_proxy/Qwen/Qwen3-32B', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='\n\nThe 2020 World Series did **not** go to a Game 7. The **Los Angeles Dodgers defeated the Tampa Bay Rays in six games** to win the championship. The final game (Game 6) ended with a **3-1** victory for the Dodgers at Globe Life Field in Arlington, Texas. This was the first time since 1960 that a World Series concluded in six games. \n\nThe 2020 season was shortened to 60 games due to the pandemic, and the World Series was also limited to a maximum of six games (instead of the usual seven) to reduce the risk of further delays.', role='assistant', tool_calls=None, function_call=None, reasoning_content="\nOkay, the user is asking for the final score of Game 7 of the 2020 World Series. Let me recall the details.\n\nI know the Los Angeles Dodgers won the World Series in 2020. The series was between the Dodgers and the Tampa Bay Rays. They played six games, not seven. Wait, did they go to a Game 7? Let me check that. The 2020 World Series was a best-of-seven, and the Dodgers won in six games. So there was no Game 7. The final game was Game 6. The user might be confused because some series go to seven games, but in 2020, it was six.\n\nWait, let me verify this. The 2020 season was shortened due to the pandemic, so the World Series was also shortened to a maximum of six games. The Dodgers and Rays played six games, and the Dodgers won Game 6. The score was 3-1 in favor of the Dodgers. So the user is asking about Game 7, but that didn't happen. I need to clarify that there was no Game 7 and provide the correct score from Game 6.\n\nI should also mention the shortened season as context. The user might not be aware that the 2020 season was different. 
So the answer should correct the assumption about a Game 7 and provide the actual details of Game 6.\n", provider_specific_fields={'refusal': None, 'reasoning_content': "\nOkay, the user is asking for the final score of Game 7 of the 2020 World Series. Let me recall the details.\n\nI know the Los Angeles Dodgers won the World Series in 2020. The series was between the Dodgers and the Tampa Bay Rays. They played six games, not seven. Wait, did they go to a Game 7? Let me check that. The 2020 World Series was a best-of-seven, and the Dodgers won in six games. So there was no Game 7. The final game was Game 6. The user might be confused because some series go to seven games, but in 2020, it was six.\n\nWait, let me verify this. The 2020 season was shortened due to the pandemic, so the World Series was also shortened to a maximum of six games. The Dodgers and Rays played six games, and the Dodgers won Game 6. The score was 3-1 in favor of the Dodgers. So the user is asking about Game 7, but that didn't happen. I need to clarify that there was no Game 7 and provide the correct score from Game 6.\n\nI should also mention the shortened season as context. The user might not be aware that the 2020 season was different. So the answer should correct the assumption about a Game 7 and provide the actual details of Game 6.\n"}), provider_specific_fields={'stop_reason': None, 'token_ids': None})], usage=Usage(completion_tokens=436, prompt_tokens=100, total_tokens=536, completion_tokens_details=None, prompt_tokens_details=None), service_tier=None, prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)`

#### Observation
Text between `<think>` and `</think>` is placed in `reasoning_content` and removed from `content`.

### Results from SkyRL's Inference Engine

`ModelResponse(id='chatcmpl-0d29b8dd0b4742abb868d5ef0dbbcdb1', created=1766171414, model='litellm_proxy/Qwen/Qwen3-32B', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="<think>\nOkay, the user is asking for the final score of Game 7 in the 2020 World Series. Let me recall the details. The 2020 World Series was between the Los Angeles Dodgers and the Tampa Bay Rays. I remember it was a six-game series, not seven games. Wait, the user mentioned Game 7, but the actual series ended in six games. That might be a point of confusion.\n\nSo, the user might have thought the series went to seven games, but in reality, the Dodgers won in six games. Let me confirm the exact scores. The final game was Game 6, which the Dodgers won 2-0. The key points in that game were the starting pitchers: Tony Gonsolin for the Dodgers and Charlie Morton for the Rays. The Rays had a chance to tie it in the 10th inning but couldn't, and the Dodgers' pitching held on.\n\nI should make sure to clarify that there was no Game 7 because the series ended in six games. The user might have been under the impression that it went to seven games, so explaining the actual outcome is important. Also, mentioning the score of Game 6 and the key players involved would be helpful. Let me double-check the dates and the scores to ensure accuracy. Yes, Game 6 was on October 27, 2020, with the Dodgers winning 2-0. The Rays scored one run in the 10th, but it wasn't enough. The Dodgers' victory secured their first World Series title since 1988. \n\nI should present this information clearly, noting that there was no Game 7, the score of Game 6, and the key moments. Making sure to correct the user's possible misunderstanding about the number of games while providing the correct details will be the best approach here.\n</think>\n\nThe 2020 World Series did not have a Game 7. 
The **Los Angeles Dodgers** defeated the **Tampa Bay Rays in six games**, winning the series **2-0** in **Game 6** on **October 27, 2020**. The decisive game was played at **Globe Life Field** in Arlington, Texas. \n\n### Key details from Game 6:\n- **Dodgers' starter**: Tony Gonsolin (6 innings, 1 run allowed).\n- **Rays' starter**: Charlie Morton (5 innings, 1 run allowed).\n- The Rays scored a run in the **10th inning** (on a bases-loaded walk by Dodgers closer Kenley Jansen), but the Dodgers' defense preserved the **2-1** lead.\n- The Rays' rally fell short, and the Dodgers secured their **7th World Series title** and first since 1988.", role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}), provider_specific_fields={'stop_reason': None, 'token_ids': None})], usage=Usage(completion_tokens=583, prompt_tokens=100, total_tokens=683, completion_tokens_details=None, prompt_tokens_details=None), service_tier=None, prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)`

#### Observation
Text between `<think>` and `</think>` remains in `content` and is not parsed; the message has no `reasoning_content` field at all.
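
To compare the two backends programmatically rather than by eye, a small check along these lines may help. It is only a sketch: `reasoning_was_parsed` is a hypothetical helper (not part of SkyRL, VLLM, or litellm), and it assumes the message has already been turned into a plain dict with a `content` key and, optionally, a `reasoning_content` key. The two sample dicts are shortened stand-ins for the responses above.

```python
def reasoning_was_parsed(message: dict) -> bool:
    """Return True if the reasoning was split out of the visible content.

    Hypothetical helper: `message` is assumed to be a plain dict built from
    the assistant message of a chat completion response.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    # Parsed means: reasoning is populated AND no think tags leak into content.
    tags_leaked = "<think>" in content or "</think>" in content
    return bool(reasoning) and not tags_leaked


# Shortened stand-ins for the two responses shown in this issue.
vllm_like = {
    "content": "\n\nThe 2020 World Series did **not** go to a Game 7.",
    "reasoning_content": "\nOkay, the user is asking for the final score of Game 7.\n",
}
skyrl_like = {
    "content": "<think>\nOkay, the user is asking.\n</think>\n\nThe 2020 World Series did not have a Game 7.",
}

print(reasoning_was_parsed(vllm_like))
print(reasoning_was_parsed(skyrl_like))
```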

### What is a Potential Fix?

I am not sure where to start looking. If you can point me to the relevant code, I can take a look and make a PR.
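
As a stopgap on the client side, the split could be done with something like the sketch below. This is not VLLM's `qwen3` reasoning parser and not a fix for SkyRL itself; `split_reasoning` is a hypothetical helper, and it assumes the raw output starts with a single `<think>...</think>` block, as in the SkyRL response above.

```python
import re
from typing import Optional, Tuple

# Matches one leading <think>...</think> block; DOTALL lets "." span newlines.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Split raw model output into (reasoning, content).

    Hypothetical helper: returns (None, text) unchanged when no leading
    <think> block is found, so already-parsed responses pass through.
    """
    match = _THINK_RE.match(text)
    if match is None:
        return None, text
    # group(1) is the text between the tags; the rest is the visible answer.
    return match.group(1), text[match.end():]


raw = "<think>\nThe series ended in six games.\n</think>\n\nThere was no Game 7."
reasoning, content = split_reasoning(raw)
print(repr(reasoning))
print(repr(content))
```

Applied to the SkyRL output, this yields a `content` that begins with `\n\n`, the same shape as the `content` returned by `vllm serve` above. It does not handle truncated generations where `</think>` is never emitted.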
