A ready-made environment is provided with Docker. On Linux or WSL 2, make sure you have installed Docker with Docker Compose.
Commands for installation and environment management have been set up in the Makefile.
Install environment (GPU support included):

    make install

To launch (and start a notebook server in the background) you can run (CPU only):

    make up

Or, with GPU:

    make up-gpu

Once it's launched, retrieve the notebook link by running make logs (it can take a few seconds to appear in the container log).
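If the container log is noisy, the notebook link can be picked out programmatically. The following is a minimal sketch, assuming the notebook server prints Jupyter-style URLs that contain a `?token=` query string; the function name, the sample log text, and the token in it are made up for illustration and are not taken from this repository.

```python
import re

# Assumption: the server logs Jupyter-style links such as
# http://127.0.0.1:8888/lab?token=<hex string>
URL_PATTERN = re.compile(r"https?://\S+\?token=\w+")


def find_notebook_urls(log_text: str) -> list[str]:
    """Return the token URLs found in log_text, de-duplicated, in order."""
    urls: list[str] = []
    for match in URL_PATTERN.finditer(log_text):
        url = match.group(0)
        if url not in urls:
            urls.append(url)
    return urls


# Illustrative log excerpt (not real output from this project).
sample_log = """
notebook-1  | Jupyter Server is running at:
notebook-1  |     http://127.0.0.1:8888/lab?token=abc123
notebook-1  |     http://127.0.0.1:8888/lab?token=abc123
"""

print(find_notebook_urls(sample_log))
```

In practice the text passed to `find_notebook_urls` would be whatever `make logs` prints, captured to a file or a string.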
A .devcontainer is provided, which should allow you to develop with full IDE support from inside the container in VSCode. You can enable this by installing the VSCode Remote Containers extension and choosing "open project in container".
GGML (post-training quantization and inference, CPU-focused)
GPTQ (post-training quantization, GPU)
Fine Tuning Language Models with Just Forward Passes (code)
Understanding 4bit Quantization: QLoRA explained (w/ Colab)
PEFT LoRA Explained in Detail - Fine-Tune your LLM on your local GPU
Boost Fine-Tuning Performance of LLM: Optimal Architecture w/ PEFT LoRA Adapter-Tuning on Your GPU
Anyscale Blog: Fine-tuning Llama 2
Medium: Easily Finetune Llama 2 for Your Text-to-SQL Applications
GitHub: Modal Finetune SQL Tutorial
GitHub: Ray Project - Finetuning LLMS with Deepspeed
Codehammer: How to Load Llama 13B for Inference on a Single RTX 4090
Storm in the Castle: Alpaca 13B
SpQR - Sparse-Quantized Representation (to be integrated in BitsAndBytes)
Fine Tuning Language Models with Just Forward Passes
You will need to export an environment variable, HUGGINGFACE_KEY=<your_key>; you can then use the build, run and lint commands defined in the Makefile, e.g. make build. The Docker container is built in multiple stages to cache models from Hugging Face, so that they aren't downloaded again for every code change. To add more models to the cache, simply add a line to the ModelPaths class in the model_paths.py module.
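To make the last two points concrete, here is a rough sketch of what such a setup could look like. The actual contents of model_paths.py are not shown in this README, so the enum layout, the member names, and the `require_hf_key` helper below are all hypothetical; the model identifiers are only example Hugging Face repository names.

```python
import os
from enum import Enum


class ModelPaths(str, Enum):
    """Hypothetical stand-in for the ModelPaths class in model_paths.py."""

    LLAMA_2_7B = "meta-llama/Llama-2-7b-hf"
    # Adding a model to the cache would then be one more line like this:
    LLAMA_2_13B = "meta-llama/Llama-2-13b-hf"


def require_hf_key(env=os.environ) -> str:
    """Return HUGGINGFACE_KEY from env, or fail early with a clear message."""
    key = env.get("HUGGINGFACE_KEY", "")
    if not key:
        raise RuntimeError(
            "HUGGINGFACE_KEY is not set; export it before running make build"
        )
    return key


# List the models that would be cached under this hypothetical layout.
for model in ModelPaths:
    print(model.name, "->", model.value)
```

Checking for the key up front is a design choice in this sketch: a missing variable fails immediately with a readable error instead of partway through a model download.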