Description
pip environment:
Package Version Editable project location
aiohappyeyeballs 2.6.1
aiohttp 3.13.1
aiosignal 1.4.0
annotated-types 0.7.0
anyio 4.11.0
astor 0.8.1
attrs 25.4.0
backoff 2.2.1
blake3 1.0.8
blessed 1.22.0
cachetools 6.2.1
cbor2 5.7.0
certifi 2025.10.5
cffi 2.0.0
cfgv 3.4.0
charset-normalizer 3.4.4
click 8.1.8
cloudpickle 3.1.1
compressed-tensors 0.11.0
cupy-cuda12x 13.6.0
datasets 4.2.0
depyf 0.19.0
dill 0.4.0
diskcache 5.6.3
distlib 0.4.0
distro 1.9.0
dnspython 2.8.0
einops 0.8.1
email-validator 2.3.0
et_xmlfile 2.0.0
eval_type_backport 0.2.2
evaluate 0.4.6
fastapi 0.119.0
fastapi-cli 0.0.13
fastapi-cloud-cli 0.3.1
fastrlock 0.8.3
filelock 3.20.0
flame 0.2.0 /content/FLaME
frozendict 2.4.6
frozenlist 1.8.0
fsspec 2025.9.0
gguf 0.17.1
gpustat 1.1.1
h11 0.16.0
hf-xet 1.1.10
httpcore 1.0.9
httptools 0.7.1
httpx 0.28.1
huggingface-hub 0.35.3
identify 2.6.15
idna 3.11
importlib_metadata 8.7.0
iniconfig 2.3.0
interegular 0.3.3
Jinja2 3.1.6
jiter 0.11.1
joblib 1.5.2
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lark 1.2.2
litellm 1.67.1
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.11.3
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mdurl 0.1.2
mistral_common 1.8.5
mpmath 1.3.0
msgpack 1.1.2
msgspec 0.19.0
multidict 6.7.0
multiprocess 0.70.16
networkx 3.5
ninja 1.13.0
nodeenv 1.9.1
numba 0.61.2
numpy 2.2.6
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-ml-py 13.580.82
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
nvitop 1.5.3
openai 2.5.0
openai-harmony 0.0.4
opencv-python-headless 4.12.0.88
openpyxl 3.1.5
outlines_core 0.2.11
packaging 25.0
pandas 2.3.3
partial-json-parser 0.2.1.1.post6
pillow 11.3.0
pip 25.2
platformdirs 4.5.0
pluggy 1.6.0
pre_commit 4.3.0
prometheus_client 0.23.1
prometheus-fastapi-instrumentator 7.1.0
propcache 0.4.1
protobuf 6.33.0
psutil 7.1.1
py-cpuinfo 9.0.0
pyarrow 21.0.0
pybase64 1.4.2
pycountry 24.6.1
pycparser 2.23
pydantic 2.12.3
pydantic_core 2.41.4
pydantic-extra-types 2.10.6
Pygments 2.19.2
pytest 8.4.2
python-dateutil 2.9.0.post0
python-dotenv 1.1.1
python-json-logger 4.0.0
python-multipart 0.0.20
pytz 2025.2
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.50.1
referencing 0.37.0
regex 2025.9.18
requests 2.32.5
rich 14.2.0
rich-toolkit 0.15.1
rignore 0.7.1
rpds-py 0.27.1
ruff 0.14.1
safetensors 0.6.2
scikit-learn 1.7.2
scipy 1.16.2
sentencepiece 0.2.1
sentry-sdk 2.42.0
setproctitle 1.3.7
setuptools 80.9.0
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
soundfile 0.13.1
soxr 1.0.0
starlette 0.48.0
sympy 1.14.0
tabulate 0.9.0
tenacity 9.1.2
threadpoolctl 3.6.0
tiktoken 0.12.0
together 1.5.26
tokenizers 0.22.1
torch 2.8.0
torchaudio 2.8.0
torchvision 0.23.0
tqdm 4.67.1
transformers 4.57.1
triton 3.4.0
typer 0.15.4
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2025.2
urllib3 2.5.0
uvicorn 0.38.0
uvloop 0.22.1
virtualenv 20.35.3
vllm 0.11.0
watchfiles 1.1.1
wcwidth 0.2.14
websockets 15.0.1
wheel 0.45.1
xformers 0.0.32.post1
xgrammar 0.1.25
xxhash 3.6.0
yarl 1.22.0
zipp 3.23.0
I modified configs/numclaim.yaml as follows:
model: "vllm//WiroAI/WiroAI-Finance-Qwen-7B"
tasks:
  - numclaim
max_tokens: 128
temperature: 0.0
top_p: 0.9
top_k: null
repetition_penalty: 1.0
batch_size: 50
prompt_format: zero_shot

# Logging configuration
logging:
  level: "INFO"  # Global logging level (Options: DEBUG, INFO, WARNING, ERROR, CRITICAL)
  console:
    enabled: true
    level: "DEBUG"  # Console output level
  file:
    enabled: true
    level: "DEBUG"  # File output level
    max_size_mb: 10  # Maximum file size in MB
    backup_count: 5  # Number of backup files to keep
  components:
    litellm: "DEBUG"  # Control litellm verbosity (WARNING suppresses most output)
    batch_utils: "DEBUG"
    inference: "DEBUG"  # Control inference module verbosity
    evaluation: "INFO"  # Control evaluation module verbosity
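
As a quick sanity check that the edited file parses, a minimal sketch (PyYAML is already in the environment above; the path matches the command below):

import yaml

# Load the config and print a few of the keys shown above to confirm
# the structure survived the edit.
with open("configs/numclaim.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg["model"], cfg["tasks"], cfg["batch_size"], cfg["logging"]["level"])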
Then I run inference:

python main.py --config configs/numclaim.yaml --mode inference

An error occurred:
2025-10-20 18:17:56,619 - inference.numclaim - ERROR - Batch 1 failed: Unexpected keyword argument 'stream'
I suspect that vllm==0.11.0 is incompatible with litellm==1.67.1. Specifically, vllm==0.11.0 does not accept the stream parameter for streaming responses, but litellm passes it along anyway.
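For reference, the failure can be reproduced outside FLaME with a minimal sketch like the following (assumes the versions above; the vllm/ model prefix is litellm's convention for its vllm provider, which I assume FLaME's "vllm//" string maps to):

import litellm

# Minimal reproduction sketch; the message content is arbitrary.
responses = litellm.batch_completion(
    model="vllm/WiroAI/WiroAI-Finance-Qwen-7B",
    messages=[[{"role": "user", "content": "Revenue grew 20% year over year."}]],
    max_tokens=128,
    temperature=0.0,
)
# With vllm==0.11.0 and litellm==1.67.1 this fails in the vllm batch path
# with: unexpected keyword argument 'stream'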
One solution is to add the following line to the batch_completion function in litellm/main.py:
if custom_llm_provider == "vllm":
    optional_params = get_optional_params(
        functions=functions,
        function_call=function_call,
        temperature=temperature,
        top_p=top_p,
        n=n,
        stream=stream or False,
        stop=stop,
        max_tokens=max_tokens,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        logit_bias=logit_bias,
        user=user,
        # params to identify the model
        model=model,
        custom_llm_provider=custom_llm_provider,
    )
    optional_params.pop("stream", None)  # added line: drop 'stream' before calling the vLLM handler
    results = vllm_handler.batch_completions(
        model=model,
        messages=batch_messages,
        custom_prompt_dict=litellm.custom_prompt_dict,
        optional_params=optional_params,
    )
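
If editing the installed package is undesirable, the same fix could instead be applied at runtime from FLaME's side by wrapping get_optional_params before inference starts. This is only a sketch: it assumes litellm/main.py resolves get_optional_params through its own module namespace, which should be verified against the installed litellm version.

import litellm.main as litellm_main

_orig_get_optional_params = litellm_main.get_optional_params

def _get_optional_params_no_stream(*args, **kwargs):
    # Delegate to the original, then strip the param the vLLM batch
    # handler rejects (assumption: only the vllm provider needs this).
    params = _orig_get_optional_params(*args, **kwargs)
    if kwargs.get("custom_llm_provider") == "vllm":
        params.pop("stream", None)
    return params

litellm_main.get_optional_params = _get_optional_params_no_stream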