Incompatible with vllm==0.11.0 #178

@ArkNightmaster

Description

pip environment:

Package Version Editable project location


aiohappyeyeballs 2.6.1
aiohttp 3.13.1
aiosignal 1.4.0
annotated-types 0.7.0
anyio 4.11.0
astor 0.8.1
attrs 25.4.0
backoff 2.2.1
blake3 1.0.8
blessed 1.22.0
cachetools 6.2.1
cbor2 5.7.0
certifi 2025.10.5
cffi 2.0.0
cfgv 3.4.0
charset-normalizer 3.4.4
click 8.1.8
cloudpickle 3.1.1
compressed-tensors 0.11.0
cupy-cuda12x 13.6.0
datasets 4.2.0
depyf 0.19.0
dill 0.4.0
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
dnspython 2.8.0
einops 0.8.1
email-validator 2.3.0
et_xmlfile 2.0.0
eval_type_backport 0.2.2
evaluate 0.4.6
fastapi 0.119.0
fastapi-cli 0.0.13
fastapi-cloud-cli 0.3.1
fastrlock 0.8.3
filelock 3.20.0
flame 0.2.0 /content/FLaME
frozendict 2.4.6
frozenlist 1.8.0
fsspec 2025.9.0
gguf 0.17.1
gpustat 1.1.1
h11 0.16.0
hf-xet 1.1.10
httpcore 1.0.9
httptools 0.7.1
httpx 0.28.1
huggingface-hub 0.35.3
identify 2.6.15
idna 3.11
importlib_metadata 8.7.0
iniconfig 2.3.0
interegular 0.3.3
Jinja2 3.1.6
jiter 0.11.1
joblib 1.5.2
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lark 1.2.2
litellm 1.67.1
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.11.3
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mdurl 0.1.2
mistral_common 1.8.5
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.19.0
multidict 6.7.0
multiprocess 0.70.16
networkx 3.5
ninja 1.13.0
nodeenv 1.9.1
numba 0.61.2
numpy 2.2.6
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-ml-py 13.580.82
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
nvitop 1.5.3
openai 2.5.0
openai-harmony 0.0.4
opencv-python-headless 4.12.0.88
openpyxl 3.1.5
outlines_core 0.2.11
packaging 25.0
pandas 2.3.3
partial-json-parser 0.2.1.1.post6
pillow 11.3.0
pip 25.2
platformdirs 4.5.0
pluggy 1.6.0
pre_commit 4.3.0
prometheus_client 0.23.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.4.1
protobuf 6.33.0
psutil 7.1.1
py-cpuinfo 9.0.0
pyarrow 21.0.0
pybase64 1.4.2
pycountry 24.6.1
pycparser 2.23
pydantic 2.12.3
pydantic_core 2.41.4
pydantic-extra-types 2.10.6
Pygments 2.19.2
pytest 8.4.2
python-dateutil 2.9.0.post0
python-dotenv 1.1.1
python-json-logger 4.0.0
python-multipart 0.0.20
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.50.1
referencing 0.37.0
regex 2025.9.18
requests 2.32.5
rich 14.2.0
rich-toolkit 0.15.1
rignore 0.7.1
rpds-py 0.27.1
ruff 0.14.1
safetensors 0.6.2
scikit-learn 1.7.2
scipy 1.16.2
sentencepiece 0.2.1
sentry-sdk 2.42.0
setproctitle 1.3.7
setuptools 80.9.0
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
soundfile 0.13.1
soxr 1.0.0
starlette 0.48.0
sympy 1.14.0
tabulate 0.9.0
tenacity 9.1.2
threadpoolctl 3.6.0
tiktoken 0.12.0
together 1.5.26
tokenizers 0.22.1
torch 2.8.0
torchaudio 2.8.0
torchvision 0.23.0
tqdm 4.67.1
transformers 4.57.1
triton 3.4.0
typer 0.15.4
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.38.0
uvloop 0.22.1
virtualenv 20.35.3
vllm 0.11.0
watchfiles 1.1.1
wcwidth 0.2.14
websockets 15.0.1
wheel 0.45.1
xformers 0.0.32.post1
xgrammar 0.1.25
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0

I modified configs/numclaim.yaml as follows:

model: "vllm//WiroAI/WiroAI-Finance-Qwen-7B"
tasks:
  - numclaim
max_tokens: 128
temperature: 0.0
top_p: 0.9
top_k: null
repetition_penalty: 1.0
batch_size: 50
prompt_format: zero_shot

# Logging configuration
logging:
  level: "INFO"            # Global logging level (Options: DEBUG, INFO, WARNING, ERROR, CRITICAL)
  console:
    enabled: true
    level: "DEBUG"          # Console output level
  file:
    enabled: true
    level: "DEBUG"         # File output level
    max_size_mb: 10        # Maximum file size in MB
    backup_count: 5        # Number of backup files to keep
  components:
    litellm: "DEBUG"     # Control litellm verbosity (WARNING suppresses most output)
    batch_utils: "DEBUG"
    inference: "DEBUG"      # Control inference module verbosity
    evaluation: "INFO"     # Control evaluation module verbosity

Run inference:

python main.py --config configs/numclaim.yaml --mode inference

An error occurred:

2025-10-20 18:17:56,619 - inference.numclaim - ERROR - Batch 1 failed: Unexpected keyword argument 'stream'

I suspect that vllm==0.11.0 is incompatible with litellm==1.67.1. Specifically, vllm==0.11.0 does not accept the stream parameter that litellm forwards for streaming responses, so the batch call fails.
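
For context, here is a minimal reproduction sketch that bypasses FLaME entirely (my assumptions: FLaME's inference module ultimately dispatches batches through litellm.batch_completion, and litellm's plain "vllm/" model prefix corresponds to the "vllm//" form used in the config):

import litellm

# Two independent prompts, batched the way litellm.batch_completion expects:
# a list of message lists, one per request.
messages = [
    [{"role": "user", "content": "Does the following sentence contain a numeric claim? Answer yes or no."}],
    [{"role": "user", "content": "Revenue grew 12% year over year. Does this contain a numeric claim?"}],
]

# With vllm==0.11.0 installed, this call should fail the same way as the FLaME run:
# litellm's vLLM batch path forwards a 'stream' keyword that vLLM does not accept.
responses = litellm.batch_completion(
    model="vllm/WiroAI/WiroAI-Finance-Qwen-7B",
    messages=messages,
    max_tokens=128,
    temperature=0.0,
)
print(responses)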

One workaround is to patch the vllm branch of batch_completion in litellm's main.py as follows:

    if custom_llm_provider == "vllm":
        optional_params = get_optional_params(
            functions=functions,
            function_call=function_call,
            temperature=temperature,
            top_p=top_p,
            n=n,
            stream=stream or False,
            stop=stop,
            max_tokens=max_tokens,
            presence_penalty=presence_penalty,
            frequency_penalty=frequency_penalty,
            logit_bias=logit_bias,
            user=user,
            # params to identify the model
            model=model,
            custom_llm_provider=custom_llm_provider,
        )
        optional_params.pop("stream")  # add this line so the 'stream' param is not forwarded to vLLM
        results = vllm_handler.batch_completions(
            model=model,
            messages=batch_messages,
            custom_prompt_dict=litellm.custom_prompt_dict,
            optional_params=optional_params,
        )
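
A small refinement to the added line: popping with a default avoids a KeyError in case get_optional_params never added the key (an assumption about its behavior when stream is unset; check your installed litellm):

        optional_params.pop("stream", None)  # no-op if 'stream' was never set

Note that this edits the installed litellm package, so the change has to be reapplied after every litellm upgrade; downgrading vllm to a release known to work with litellm 1.67.1 would avoid patching site-packages altogether.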
