!python /content/multi_token/scripts/serve_model.py \
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
--model_lora_path sshh12/Mistral-7B-LoRA-ImageBind-LLAVA \
--port 9069
Downloading shards: 100% 2/2 [05:11<00:00, 155.64s/it]
Loading checkpoint shards: 100% 2/2 [01:03<00:00, 31.83s/it]
generation_config.json: 100% 116/116 [00:00<00:00, 668kB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
INFO:root:Loading projector weights for ['imagebind']
non_lora_trainables.bin: 100% 168M/168M [00:14<00:00, 11.5MB/s]
INFO:root:Loading pretrained weights: ['imagebind_lmm_projector.mlps.0.0.weight', 'imagebind_lmm_projector.mlps.0.0.bias', 'imagebind_lmm_projector.mlps.0.2.weight', 'imagebind_lmm_projector.mlps.0.2.bias', 'imagebind_lmm_projector.mlps.1.0.weight', 'imagebind_lmm_projector.mlps.1.0.bias', 'imagebind_lmm_projector.mlps.1.2.weight', 'imagebind_lmm_projector.mlps.1.2.bias', 'imagebind_lmm_projector.mlps.2.0.weight', 'imagebind_lmm_projector.mlps.2.0.bias', 'imagebind_lmm_projector.mlps.2.2.weight', 'imagebind_lmm_projector.mlps.2.2.bias', 'imagebind_lmm_projector.mlps.3.0.weight', 'imagebind_lmm_projector.mlps.3.0.bias', 'imagebind_lmm_projector.mlps.3.2.weight', 'imagebind_lmm_projector.mlps.3.2.bias']
INFO:root:Loading and merging LoRA weights from sshh12/Mistral-7B-LoRA-ImageBind-LLAVA
adapter_config.json: 100% 534/534 [00:00<00:00, 2.71MB/s]
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
adapter_model.bin: 3% 10.5M/336M [00:00<00:10, 31.9MB/s]^C
2024-06-17 15:33:13.443686: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-17 15:33:13.443752: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-17 15:33:13.599187: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-17 15:33:13.898052: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-17 15:33:17.194957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
tokenizer_config.json: 100% 1.47k/1.47k [00:00<00:00, 8.37MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 5.81MB/s]
special_tokens_map.json: 100% 72.0/72.0 [00:00<00:00, 424kB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 15.9MB/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
config.json: 100% 741/741 [00:00<00:00, 3.85MB/s]
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
warnings.warn(
Downloading imagebind weights to .checkpoints/imagebind_huge.pth ...
100% 4.47G/4.47G [01:15<00:00, 63.3MB/s]
INFO:root:Loading base model from mistralai/Mistral-7B-Instruct-v0.1 as 16 bits
model.safetensors.index.json: 100% 25.1k/25.1k [00:00<00:00, 47.2MB/s]
Downloading shards: 0% 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors: 0% 0.00/9.94G [00:00<?, ?B/s]
...........
model-00001-of-00002.safetensors: 100% 9.94G/9.94G [03:40<00:00, 45.1MB/s]
Downloading shards: 50% 1/2 [03:40<03:40, 220.64s/it]
model-00002-of-00002.safetensors: 0% 0.00/4.54G [00:00<?, ?B/s]
...........
model-00002-of-00002.safetensors: 100% 4.54G/4.54G [01:30<00:00, 50.2MB/s]
Downloading shards: 100% 2/2 [05:11<00:00, 155.64s/it]
Loading checkpoint shards: 100% 2/2 [01:03<00:00, 31.83s/it]
generation_config.json: 100% 116/116 [00:00<00:00, 668kB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
INFO:root:Loading projector weights for ['imagebind']
non_lora_trainables.bin: 100% 168M/168M [00:14<00:00, 11.5MB/s]
INFO:root:Loading pretrained weights: ['imagebind_lmm_projector.mlps.0.0.weight', 'imagebind_lmm_projector.mlps.0.0.bias', 'imagebind_lmm_projector.mlps.0.2.weight', 'imagebind_lmm_projector.mlps.0.2.bias', 'imagebind_lmm_projector.mlps.1.0.weight', 'imagebind_lmm_projector.mlps.1.0.bias', 'imagebind_lmm_projector.mlps.1.2.weight', 'imagebind_lmm_projector.mlps.1.2.bias', 'imagebind_lmm_projector.mlps.2.0.weight', 'imagebind_lmm_projector.mlps.2.0.bias', 'imagebind_lmm_projector.mlps.2.2.weight', 'imagebind_lmm_projector.mlps.2.2.bias', 'imagebind_lmm_projector.mlps.3.0.weight', 'imagebind_lmm_projector.mlps.3.0.bias', 'imagebind_lmm_projector.mlps.3.2.weight', 'imagebind_lmm_projector.mlps.3.2.bias']
INFO:root:Loading and merging LoRA weights from sshh12/Mistral-7B-LoRA-ImageBind-LLAVA
adapter_config.json: 100% 534/534 [00:00<00:00, 2.71MB/s]
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
adapter_model.bin: 3% 10.5M/336M [00:00<00:10, 31.9MB/s]^C
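Side note: the bitsandbytes warning and the "No CUDA runtime is found" line above made me wonder whether the runtime had a GPU attached at all. A minimal stdlib check (my own helper, not part of multi_token):

```python
import shutil
import subprocess

# nvidia-smi is only present when an NVIDIA driver (and GPU) is attached
# to the VM, so its absence lines up with the CUDA warnings in the log.
def has_nvidia_gpu() -> bool:
    return shutil.which("nvidia-smi") is not None

if has_nvidia_gpu():
    # List the attached GPU(s)
    print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found -- in Colab, set Runtime > Change runtime type > GPU")
```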
So, I was trying to run this in Google Colab: the command is at the top, the lines right after it are the excerpt where the run ended (the adapter_model.bin download cut off at 3%), and the rest is the full log from the start of the run.

@sshh12
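For context, this is roughly how I planned to call the server once it was up on port 9069. The `/generate` path and the JSON schema here are my guesses, not taken from the multi_token docs, so adjust both to whatever serve_model.py actually exposes:

```python
import json
import urllib.request

# Hypothetical client for the server started above; endpoint path and
# payload shape are assumptions, not the documented multi_token API.
def build_request(prompt: str, port: int = 9069) -> urllib.request.Request:
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    return urllib.request.Request(
        f"http://localhost:{port}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Once the server is actually listening:
# with urllib.request.urlopen(build_request("Describe this sound.")) as resp:
#     print(resp.read().decode())
```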