Notes and hands-on notebooks from the MIT 6.5940 (Fall 2023) course, TinyML and Efficient Deep Learning Computing.
This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. The main focus is on achieving maximal performance with minimal resource consumption.
Upon completion of this course, you will be able to:
- Shrink and Accelerate Models: Master techniques like Pruning, Quantization (INT8/INT4), and Knowledge Distillation to dramatically reduce model size and inference latency.
- Design Efficient Architectures: Utilize Neural Architecture Search (NAS), specifically Once-for-All (OFA), to automatically design hardware-aware networks.
- Master LLM Efficiency: Apply Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and QLoRA for efficient adaptation of multi-billion parameter models.
- Optimize Distributed Systems: Implement Data, Pipeline, and Tensor Parallelism for efficient training of models that exceed single-GPU memory.
- Deploy to the Edge (TinyML): Design models and system software (MCUNet, TinyEngine) capable of running complex AI on microcontrollers with only kilobytes of RAM.
- Explore Future Computing: Understand the fundamentals of Quantum Machine Learning (QML) and implement Noise Mitigation techniques for current NISQ hardware.
- Programming: Strong proficiency in Python 3.
- Frameworks: Experience with PyTorch (primary framework) or TensorFlow.
- Math: Comfort with Linear Algebra, Calculus, and Probability.
- Prerequisites: Familiarity with standard deep learning concepts (CNNs, RNNs, basic optimizers).
Course Outline (Updated from Fall 2024)
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L1 | Introduction | Slides | - | Video |
| L2 | Basics of Deep Learning | Slides | L02_NN_Basics.ipynb | Video |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L12 | Transformer and LLM | Slides | - | Video |
| L13 | Efficient LLM Deployment | Slides | Lab4_LLM_Quantization.ipynb | Video |
| L14 | LLM Post Training | Slides | - | Video |
| L15 | Long Context LLM | Slides | - | Video |
| L16 | Vision Transformer | Slides | L16_LLM_QLoRA_Finetuning.ipynb | Video |
| L17 | GAN, Video, and Point Cloud | Slides | - | Video |
| L18 | Diffusion Model | Slides | - | Video |
| LA1 * | Audio Transformers and Efficient Speech Recognition | - | - | wav2vec 2.0, Whisper, Conformer |
| LA2 * | Efficient Speech Synthesis and Audio Generation | - | - | WaveNet, FastSpeech, VALL-E, AudioLDM |
| LA3 * | Audio-Language Models and Sound Understanding | - | - | CLAP, SALMONN, Qwen-Audio |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L19 | Distributed Training (Part I) | Slides | Lab5_LLM_at_Edge.ipynb | Video |
| L20 | Distributed Training (Part II) | Slides | - | Video |
| L21 | On-Device Training and Transfer Learning | Slides | - | Video |
| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L22 | Course Summary + Quantum ML I | Slides | - | Video |
| L23 | Quantum Machine Learning II | Slides | L23_QML_Noise_Mitigation.ipynb | Video |
| L24 | Final Project Presentation | Slides | - | Video |
| L25 | Final Project Presentation | Slides | - | Video |
| L26 | Final Project Presentation | Slides | - | Video |
Note: The lectures prefixed LA (LA1-LA3) are not part of the official MIT 6.5940 curriculum. They are community-designed research notes that I created to extend the course's domain-specific coverage to the audio modality, an area largely absent from the original syllabus despite its critical importance for Edge AI.
The official course covers efficient techniques for Language (L12-L15) and Vision (L16-L18), but the audio/sound modality was left out. As an Edge AI Engineer, I found this gap significant for several reasons:
- Audio is the dominant edge modality. Billions of devices (earbuds, smart speakers, hearing aids, vehicles, MCUs) rely on real-time audio processing, often under tighter latency, memory, and power constraints than vision or text.
- The same efficiency playbook applies. Every core technique taught in Chapters I-III (pruning, quantization, NAS, knowledge distillation, distributed training) transfers directly to audio models, yet the domain has unique challenges (long sequences, streaming requirements, sample-rate bottlenecks) that deserve dedicated coverage.
- Rapid convergence with LLMs. Audio-Language Models (ALMs) are following the same trajectory as Vision-Language Models (VLMs): frozen encoder → bridge module → LLM backbone. Understanding this pattern across all three modalities gives a complete picture of efficient multimodal AI at the edge.
These notes follow Professor Song Han's pedagogical framework: start from the breakthrough model that defined the domain, scale it up, then systematically compress it back down for deployment using the course's core techniques.
| Audio Lecture | Parallel to | Breakthrough Paper |
|---|---|---|
| LA1: Audio Transformers & ASR | L12-L13 (Transformer → LLM Deployment) | wav2vec 2.0 (Baevski et al., 2020) |
| LA2: Speech Synthesis & Audio Generation | L17-L18 (GAN → Diffusion) | WaveNet (van den Oord et al., 2016) |
| LA3: Audio-Language Models & Sound Understanding | L12 Multimodal + L16 (ViT) | CLAP (Elizalde et al., 2023) |
All lab exercises are designed to provide hands-on experience with real-world frameworks:
- LLM Deployment: Hands-on experience deploying and running QLoRA-tuned LLMs (e.g., Llama-2) directly on a local GPU or CPU.
- TinyML: Utilizing the TinyEngine and TensorFlow Lite Micro frameworks for model deployment on simulated microcontroller environments.
- QML: Using Qiskit and PennyLane to build, train, and mitigate noise in variational quantum circuits.
For more advanced projects, the course provides a set of state-of-the-art research challenges in efficient ML to explore.
- Goal: Address the challenge of efficient video analysis by leveraging the Temporal Shift Module (TSM), which captures temporal relationships without adding computational cost.
- Description: TSM works by shifting part of the channels along the temporal dimension, facilitating information exchange among neighboring frames. Projects could involve changing the backbone (e.g., from MobileNetV2) or applying TSM to a new video task like fall detection.
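The channel shift described above can be sketched in a few lines. This is a minimal illustration, not the course's or the TSM authors' implementation: it uses plain Python lists instead of tensors, the function name `temporal_shift` and the `fold_div=8` default (one eighth of the channels shifted in each direction) are assumptions made for the example, and frames at the clip boundary are zero-padded.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of T frames, each a list of C channel values.
    fold_div: assumed hyperparameter; C // fold_div channels are shifted
              in each temporal direction, the rest stay in place.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    # Start from zeros so positions with no source frame stay zero-padded.
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # first group: take the value from the next frame
            elif c < 2 * fold:
                src = t - 1   # second group: take the value from the previous frame
            else:
                src = t       # remaining channels are left untouched
            if 0 <= src < num_frames:
                shifted[t][c] = frames[src][c]
    return shifted


# Three frames with eight channels each; value = 10 * frame_index + channel_index.
clip = [[10 * t + c for c in range(8)] for t in range(3)]
for row in temporal_shift(clip):
    print(row)
```

The shift only moves existing values between frames, so it adds no multiplications; any 2D convolution applied afterwards then sees features from neighboring frames.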
- Goal: Accelerate image editing in deep generative models by avoiding the re-synthesis of unedited regions.
- Description: SIGE (Sparse Incremental Generative Engine) is a sparse inference engine that caches and reuses feature maps from the original image so that only the edited regions are regenerated. The project focuses on integrating SIGE with Stable Diffusion XL (SDXL) to assess whether more significant speed improvements can be achieved.
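The caching idea behind this project can be illustrated with a toy example. The sketch below is not SIGE itself: it treats the image as a 1D list, uses a hypothetical per-tile function `tile_features` as a stand-in for an expensive layer, and assumes tiles are independent (real convolutional layers have receptive fields that cross tile boundaries). It only shows the principle of reusing cached features for tiles that an edit did not touch.

```python
def tile_features(tile):
    """Hypothetical stand-in for an expensive per-tile computation."""
    return [2 * x + 1 for x in tile]


def split_tiles(signal, tile_size):
    """Cut a flat signal into consecutive tiles of tile_size elements."""
    return [signal[i:i + tile_size] for i in range(0, len(signal), tile_size)]


def sparse_update(original, edited, cached, tile_size):
    """Recompute features only for tiles where edited differs from original.

    cached holds the per-tile features already computed for the original.
    Returns the features for the edited signal and the number of tiles recomputed.
    """
    features = []
    recomputed = 0
    tiles = zip(split_tiles(original, tile_size),
                split_tiles(edited, tile_size),
                cached)
    for original_tile, edited_tile, cached_tile in tiles:
        if edited_tile == original_tile:
            features.append(cached_tile)                  # reuse cached result
        else:
            features.append(tile_features(edited_tile))   # recompute edited tile
            recomputed += 1
    return features, recomputed


original = [0, 1, 2, 3, 4, 5, 6, 7]
cached = [tile_features(tile) for tile in split_tiles(original, 2)]

edited = list(original)
edited[5] = 50  # a small local edit touching a single tile

features, recomputed = sparse_update(original, edited, cached, 2)
print("tiles recomputed:", recomputed, "of", len(cached))
```

The saving grows with the fraction of the image left unedited, which is why small local edits benefit most from this kind of reuse.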
- Goal: Achieve high-throughput, real-time serving of low-precision quantized LLMs (like INT4) in cloud-based settings.
- Description: The project centers on implementing an online, real-time serving system using the QServe library, which utilizes the QoQ quantization algorithm (W4A8KV4: 4-bit weights, 8-bit activations, 4-bit KV cache). The final objective is to build an online Gradio demo to serve these highly efficient, quantized LLMs.
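To give a feel for what the different bit widths in W4A8KV4 mean numerically, here is a minimal sketch of generic symmetric round-to-nearest quantization. It is not the QoQ algorithm and does not use the QServe API; the function names and the sample values are assumptions made for illustration, and a single scale is used for the whole list rather than per channel or per group.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [q * scale for q in codes]


def max_error(values, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize_symmetric(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical sample values standing in for a slice of weights or activations.
sample = [0.82, -0.41, 0.05, -0.97, 0.33]
print("4-bit codes:", quantize_symmetric(sample, 4)[0])
print("4-bit max error:", max_error(sample, 4))
print("8-bit max error:", max_error(sample, 8))
```

With round-to-nearest, the error per value is bounded by half the scale, so halving the bit width from 8 to 4 makes the step size roughly 18 times coarser (127 levels versus 7 per side). Closing that accuracy gap at 4 bits is what dedicated algorithms such as QoQ are designed for.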
Full documentation and project details can be found here.
- Course YouTube Playlist: EfficientML.ai Course | 2023 Fall | MIT 6.5940
- Course YouTube Playlist (NEW): EfficientML.ai Course | 2024 Fall | MIT 6.5940
- Course Prerequisites: pdf
- All slides available: here
- Final project list (2023-2024): EfficientML.ai Project Ideas (Fall 2024)
Important
The Audio extension notes (LA1-LA3) are not official course material; they are community research notes that I designed by applying Professor Han's pedagogical framework and the course's core efficiency principles to the audio domain. All credit for the teaching methodology, structure, and foundational techniques goes to him and the MIT HAN Lab team.
Special thanks to:
- Professor Song Han (MIT/HAN Lab) for his tremendous effort and passion in developing the EfficientML.ai framework and ecosystem, and for making them accessible to everyone.
- Yifan Lu for sharing all homework labs.