Lednik is a library for distilling large transformer models into lightweight, efficient representations—either Static Embeddings or small Lednik Transformers. It provides tools for model initialization, knowledge distillation training, and downstream classifier training.
- Static Embeddings Distillation: Compress a transformer's vocabulary into a static embedding matrix using PCA and Smooth Inverse Frequency (SIF) weighting.
- Lednik Transformer Distillation: Initialize a small transformer model ("Lednik") from a teacher model and distill knowledge into it.
- Training Pipelines: Ready-to-use ClearML + PyTorch Lightning pipelines for distillation and classification.
- Efficient Inference: Optimized model classes for fast CPU/GPU inference.
```
lednik/
├── distill/              # Distillation logic, factory methods, and training modules
├── models/               # Model definitions (LednikModel, StaticEmbeddingsModel)
└── serving/              # FastAPI serving utilities
pipelines/
├── finetuning/           # Knowledge Distillation pipeline
└── classifier_training/  # Downstream classification pipeline
configs/                  # Configuration files (YAML) for training
```
To start, initialize a student model (Static Embeddings or Lednik) from a pretrained teacher transformer using the model_factory module.
The create_static_embeddings_model factory extracts hidden states from the teacher, applies PCA, and creates a StaticEmbeddingsModel.
```python
from transformers import AutoModel, AutoTokenizer

from lednik.distill.model_factory import create_static_embeddings_model

teacher = AutoModel.from_pretrained("deepvk/USER-bge-m3")
tokenizer = AutoTokenizer.from_pretrained("deepvk/USER-bge-m3")

static_model = create_static_embeddings_model(
    model=teacher,
    tokenizer=tokenizer,
    embedding_dim=300,                    # Target dimension
    pooling="mean",                       # Pooling strategy: "mean", "cls", "last"
    embedding_extraction_batch_size=256,
    device="cuda",
)
static_model.save_pretrained("saved_models/static_base")
```
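For intuition, the compression behind this step (teacher hidden states per vocabulary token, PCA reduction, SIF-style down-weighting of frequent tokens) can be sketched as follows. This is a conceptual illustration, not Lednik's actual implementation:

```python
# Conceptual sketch only: not the code inside create_static_embeddings_model.
import numpy as np
from sklearn.decomposition import PCA

def compress_vocabulary(token_vectors: np.ndarray, token_freqs: np.ndarray,
                        dim: int = 300, a: float = 1e-3) -> np.ndarray:
    """Reduce per-token teacher vectors with PCA and apply SIF weights a / (a + p(w))."""
    reduced = PCA(n_components=dim).fit_transform(token_vectors)  # (vocab_size, dim)
    probs = token_freqs / token_freqs.sum()                       # unigram probabilities
    return reduced * (a / (a + probs))[:, None]                   # SIF-weighted rows

# Stand-in data: 1,000 "tokens" with 768-dim teacher states and random corpus counts.
vocab_vectors = np.random.randn(1000, 768).astype(np.float32)
counts = np.random.randint(1, 1000, size=1000).astype(np.float64)
static_matrix = compress_vocabulary(vocab_vectors, counts, dim=128)
```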
The create_lednik_transformer factory initializes a small transformer (LednikModel) compatible with the teacher's tokenizer.

```python
from lednik.models import LednikConfig
from lednik.distill.model_factory import create_lednik_transformer

config = LednikConfig(hidden_size=384, num_hidden_layers=2)  # ... plus any other LednikConfig fields

lednik_model = create_lednik_transformer(
    model=teacher,
    tokenizer=tokenizer,
    model_config=config,
    pooling="cls",
    embedding_extraction_batch_size=256,
)
lednik_model.save_pretrained("saved_models/lednik_base")
```

The distillation process aligns the student model's embeddings with the teacher's on specific datasets. It uses the pipelines/finetuning/run.py script.
The training pipeline uses hardcoded paths to configuration files in the configs/finetuning/ directory. You should edit these files before running the script:
- Distillation Config (configs/finetuning/distill_config.yaml): defines hyperparameters such as the learning rate, optimizer, and loss weights (see the loss sketch after this list for how they interact).

  ```yaml
  optimizer:
    type: AdamW8bit
    betas: [0.9, 0.98]
  lr:
    scheduler_type: plateau-with-cosine-annealing
    base_value: 9e-5
  distillation_method:
    type: direct-distillation
    loss_type: cosine
    contrastive_loss_weight: 0.8  # Weight for contrastive loss
    temperature: 0.07             # Temp for contrastive loss
  batch_size: 96
  num_workers: 4
  ```
- Training Settings (configs/finetuning/training_settings.yaml): defines the environment, data paths, and trainer settings.

  ```yaml
  data:
    train_datasets:
      Dataset1: <CLEARML_ID>
  trainer:
    accelerator: "cuda"
    devices: [0]
    strategy: "fsdp1"
    max_epochs: 2
  # ClearML IDs for artifacts
  teacher_model_id: <TEACHER_ID>
  student_model_id: <STUDENT_ID>  # The model created in step 1
  tokenizer_id: <TOKENIZER_ID>
  ```
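The distillation hyperparameters above are easiest to read as a single objective. The following is an illustrative combination of a cosine alignment term and an in-batch contrastive (InfoNCE) term, not the actual loss code behind distill_config.yaml:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student: torch.Tensor, teacher: torch.Tensor,
                      contrastive_loss_weight: float = 0.8,
                      temperature: float = 0.07) -> torch.Tensor:
    """Illustrative mix of cosine alignment and in-batch contrastive loss."""
    # Direct term: pull each student embedding toward its teacher counterpart.
    cosine_term = (1 - F.cosine_similarity(student, teacher, dim=-1)).mean()

    # Contrastive term: the matching teacher vector should be the most similar
    # one within the batch, scaled by the temperature.
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher, dim=-1)
    logits = s @ t.T / temperature
    targets = torch.arange(s.size(0), device=s.device)
    contrastive_term = F.cross_entropy(logits, targets)

    return cosine_term + contrastive_loss_weight * contrastive_term

# Toy batch of 8 sentence embeddings (same dimensionality assumed for the toy).
loss = distillation_loss(torch.randn(8, 384), torch.randn(8, 384))
```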
Run the pipeline module directly. The script automatically picks up the configuration files from the configs/ directory.
```bash
python -m pipelines.finetuning.run
```

Optional arguments:

- --remote-execution-queue <QUEUE_NAME>: Run the task remotely on a ClearML queue.
- --tags tag1,tag2: Add tags to the ClearML task.
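For example, to queue the run remotely and tag it (the queue name and tags below are placeholders):

```bash
python -m pipelines.finetuning.run --remote-execution-queue gpu-queue --tags distill,static-v1
```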
Once you have a distilled model, you can fine-tune it for a specific downstream classification task using pipelines/classifier_training/run.py.
Edit the configuration files in configs/classification/:
- Training Config (configs/classification/train_config.yaml): defines the label mapping, learning rate, and optional class weights (see the sketch after this list for how label2id and class_weights are typically applied).

  ```yaml
  label2id:
    negative: 0
    neutral: 1
    positive: 2
  lr:
    base_value: 7e-4
  class_weights: [0.55, 0.24, 0.21]  # Optional
  batch_size: 128
  num_workers: 8
  ```
- Training Settings (configs/classification/training_settings.yaml):

  ```yaml
  data:
    datasets:
      MyDataset: <CLEARML_ID>
    label_column: label
    input_column: text
  model_id: <DISTILLED_MODEL_ID>  # Result from step 2
  trainer:
    accelerator: "cuda"
    max_epochs: 10
  ```
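The label2id mapping and class_weights typically translate into a weighted cross-entropy criterion. A minimal illustration (not the pipeline's actual training code):

```python
import torch
import torch.nn as nn

# Values mirrored from train_config.yaml above; the construction itself is
# illustrative, not Lednik's classifier-training code.
label2id = {"negative": 0, "neutral": 1, "positive": 2}
class_weights = torch.tensor([0.55, 0.24, 0.21])

criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, len(label2id))                   # (batch, num_classes)
targets = torch.tensor([label2id["positive"], 0, 1, 2])  # integer class ids
loss = criterion(logits, targets)
```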
```bash
python -m pipelines.classifier_training.run
```

Optional arguments:

- --remote-execution-queue <QUEUE_NAME>: Run the task remotely on a ClearML queue.
- --tags tag1,tag2: Add tags to the ClearML task.
LednikModel is a lightweight transformer designed to be initialized from a larger parent model. It supports Rotary Positional Embeddings (RoPE) and Flash Attention.
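For reference, the rotate-by-position idea behind RoPE in its generic rotate-half form (a textbook illustration, not Lednik's attention code):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Generic RoPE over a (seq_len, head_dim) tensor with even head_dim."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=x.dtype) / half))  # (half,)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * inv_freq      # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Each (x1, x2) pair is rotated by an angle proportional to its position,
    # so attention scores end up depending on relative positions.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

queries = apply_rope(torch.randn(16, 64))  # e.g. one attention head's queries
```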
StaticEmbeddingsModel is a simple model that maps tokens directly to static vectors, optionally handling subword pooling and normalization. It is ideal for extremely low-latency scenarios where full transformer context is not strictly required.
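A rough illustration of that lookup path, assuming mean pooling over subword vectors and L2 normalization (the actual StaticEmbeddingsModel API and behavior may differ):

```python
import numpy as np

# Toy stand-in for a distilled embedding matrix: vocab_size x embedding_dim.
embedding_matrix = np.random.randn(30000, 300).astype(np.float32)

def embed_subwords(subword_ids: list[int]) -> np.ndarray:
    """Mean-pool the subword vectors of one word, then L2-normalize."""
    vec = embedding_matrix[subword_ids].mean(axis=0)  # subword pooling
    return vec / (np.linalg.norm(vec) + 1e-12)        # normalization

word_vector = embed_subwords([1042, 2077])  # hypothetical subword ids for one word
```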