Multimodal Large Language Models (MLLM)
Rules & FAQ
Class recordings (in Russian)
| # | Date | Title | Materials |
|---|------|-------|-----------|
| 1 | Feb 11 | Word Embeddings and Classification & Language Modelling | slides; Embeddings & CNN/LSTM LMs with PyTorch (notebook) |
| 2 | Feb 18 | Seq2seq, Attention, and Transformers | slides; Transformer from Scratch (notebook) |
| 3 | Feb 25 | Pretraining, SFT, RLHF & PEFT, LoRA | slides; Parameter-efficient fine-tuning (notebook) |
| 4 | Mar 4 | Reasoning, RLVF & RAG | slides; Tokenization (notebook) |
| 5 | Mar 11 | Efficient Inference: FlashAttention, KV cache, Distillation, Quantization | |
| 6 | Mar 18 | Introduction to MLLMs and Image Modality | slides; Classification of VLMs: Deep Fusion vs Early Fusion (notebook) |
| 7 | Mar 25 | VLLM and Data Generation | slides; Visual Autoregressive Transformer (notebook) |
| 8 | Apr 1 | Video Understanding | slides; Video Modality and Any-to-any Models (notebook) |
| 9 | Apr 8 | Action Modality (Robotics) | Intro to Vision Language Action Models (notebook) |
| 10 | Apr 15 | Multimodal Agents | |
| 11 | Apr 22 | 3D Data Modality | |
| 12 | Apr 29 | Guest Lecture | |
About
Multimodal LLMs course