An implementation of the full LLM pipeline from scratch (Transformers, Tokenizer, Pretraining, etc.).


very Small Language Model (vSLM)

About

A short project implementing a small GPT-like model from scratch. Most of it is inspired by minBPE and nanoGPT.

How to run the project:

  1. Create and activate a virtual environment (optional)

     python -m venv project_venv
     source project_venv/bin/activate

  2. Set up the project and install the requirements

     pip install -e .
     pip install -r requirements.txt

  3. Run the code

     python main.py --training_iterations=5000 --text=shakespeare --train_model=False --task=generation
  4. Coming soon:

     • Fine-tuning (LoRA + RLHF)
     • Good training parameters for translation
  5. Done:

     • Tokenizer (byte and character level)
     • Full Transformer architecture (Encoder + Decoder)
     • Training and inference pipeline for generation (Lorem Ipsum and Shakespeare)
     • Training and inference pipeline for translation (en → fr)
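The character-level tokenizer listed above can be sketched as follows. This is a minimal illustration, not the repository's actual code; `CharTokenizer` is a hypothetical name. It builds a vocabulary from the training text and maps between strings and integer id lists:

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one id per distinct character."""

    def __init__(self, text):
        # Vocabulary = sorted set of characters seen in the training text.
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # string -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> string

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("to be or not to be")
ids = tok.encode("not")
assert tok.decode(ids) == "not"  # encode/decode round-trips
```

A byte-level variant works the same way, with the fixed vocabulary 0–255 (`list(s.encode("utf-8"))` for encoding); BPE-style tokenizers such as minBPE then merge frequent pairs on top of that base.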

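At the core of the Transformer decoder used for generation is masked scaled dot-product attention. A minimal NumPy sketch, assuming a single head and unbatched `(seq_len, d)` inputs (an illustration, not the repository's implementation):

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (seq_len, d) arrays; mask: (seq_len, seq_len) boolean,
    True where attention is allowed."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # pairwise similarity, scaled
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked positions
    # Row-wise softmax (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # weighted sum of values


T, d = 4, 8
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, T, d))
causal = np.tril(np.ones((T, T), dtype=bool))  # each token sees only the past
out = scaled_dot_product_attention(q, k, v, causal)
assert np.allclose(out[0], v[0])  # position 0 attends only to itself
```

The causal (lower-triangular) mask is what makes the decoder autoregressive: token `t` can attend to positions `0..t` only, so the model cannot peek at future tokens during training.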