MIT 6.5940 - Notes and Labs⚡️

Notes and hands-on notebooks from MIT 6.5940 (Fall 2023), TinyML and Efficient Deep Learning Computing.

🌟 Course Overview

This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. The main focus is on achieving maximal performance with minimal resource consumption.

🎯 Key Learning Objectives

Upon completion of this course, you will be able to:

Shrink and Accelerate Models: Master techniques like Pruning, Quantization (INT8/INT4), and Knowledge Distillation to dramatically reduce model size and inference latency.
Design Efficient Architectures: Utilize Neural Architecture Search (NAS), specifically Once-for-All (OFA), to automatically design hardware-aware networks.
Master LLM Efficiency: Apply Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and QLoRA for efficient adaptation of multi-billion parameter models.
Optimize Distributed Systems: Implement Data, Pipeline, and Tensor Parallelism for efficient training of models that exceed single-GPU memory.
Deploy to the Edge (TinyML): Design models and system software (MCUNet, TinyEngine) capable of running complex AI on microcontrollers with only kilobytes of RAM.
Explore Future Computing: Understand the fundamentals of Quantum Machine Learning (QML) and implement Noise Mitigation techniques for current NISQ hardware.

💻 Tech Stack & Prerequisites

Programming: Strong proficiency in Python 3.
Frameworks: Experience with PyTorch (primary framework) or TensorFlow.
Math: Comfort with Linear Algebra, Calculus, and Probability.
Prerequisites: Familiarity with standard deep learning concepts (CNNs, RNNs, basic optimizers).

📚 Course Outline (Updated to Fall 2024)

Chapter 0: Introduction

Lecture	Topic (notes)	Slide	Notebook	Reference
L1	Introduction	Slides	—	Video
L2	Basics of Deep Learning	Slides	L02_NN_Basics.ipynb	Video

Chapter I: Efficient Inference

Lecture	Topic (notes)	Slide	Notebook	Reference
L3	Pruning and Sparsity (Part I)	Slides	—	Video
L4	Pruning and Sparsity (Part II)	Slides	L03_L04_Pruning.ipynb	Video
L5	Quantization (Part I)	Slides	—	Video
L6	Quantization (Part II)	Slides	L08_Quantization_PTQ.ipynb	Video
L7	Neural Architecture Search (Part I)	Slides	Lab3_NAS.ipynb	Video
L8	Neural Architecture Search (Part II)	Slides	—	Video
L9	Knowledge Distillation	Slides	L09_Quantization_QAT.ipynb	Video
L10	MCUNet: TinyML on Microcontrollers	Slides	L10_L11_NAS.ipynb	Video
L11	TinyEngine and Parallel Processing	Slides	L21_TinyML_Deployment.ipynb	Video

Chapter II: Domain-Specific Optimization

Lecture	Topic (notes)	Slide	Notebook	Reference
L12	Transformer and LLM	Slides	—	Video
L13	Efficient LLM Deployment	Slides	Lab4_LLM_Quantization.ipynb	Video
L14	LLM Post Training	Slides	—	Video
L15	Long Context LLM	Slides	—	Video
L16	Vision Transformer	Slides	L16_LLM_QLoRA_Finetuning.ipynb	Video
L17	GAN, Video, and Point Cloud	Slides	—	Video
L18	Diffusion Model	Slides	—	Video
LA1 *	Audio Transformers and Efficient Speech Recognition	-	-	wav2vec 2.0, Whisper, Conformer
LA2 *	Efficient Speech Synthesis and Audio Generation	-	-	WaveNet, FastSpeech, VALL-E, AudioLDM
LA3 *	Audio-Language Models and Sound Understanding	-	-	CLAP, SALMONN, Qwen-Audio

Chapter III: Efficient Training

Lecture	Topic (notes)	Slide	Notebook	Reference
L19	Distributed Training (Part I)	Slides	Lab5_LLM_at_Edge.ipynb	Video
L20	Distributed Training (Part II)	Slides	—	Video
L21	On-Device Training and Transfer Learning	Slides	—	Video

Chapter IV: Advanced Topics

Lecture	Topic (notes)	Slide	Notebook	Reference
L22	Course Summary + Quantum ML I	Slides	—	Video
L23	Quantum Machine Learning II	Slides	L23_QML_Noise_Mitigation.ipynb	Video
L24	Final Project Presentation	Slides	—	Video
L25	Final Project Presentation	Slides	—	Video
L26	Final Project Presentation	Slides	—	Video

🔊 LA1–LA3: Community Research Extension for Audio Domain


Note: The lectures prefixed LA (LA1–LA3) are not part of the official MIT 6.5940 curriculum. They are community-designed research notes that I created to extend the course's domain-specific coverage to the audio modality — an area largely absent from the original syllabus despite its critical importance for Edge AI.

Motivation

The official course covers efficient techniques for Language (L12–L15) and Vision (L16–L18), but the audio/sound modality was left out. As an Edge AI Engineer, I found this gap significant for several reasons:

Audio is the dominant edge modality. Billions of devices (earbuds, smart speakers, hearing aids, vehicles, MCUs) rely on real-time audio processing — often under tighter latency, memory, and power constraints than vision or text.
The same efficiency playbook applies. Every core technique taught in Chapters I–III (pruning, quantization, NAS, knowledge distillation, distributed training) transfers directly to audio models — yet the domain has unique challenges (long sequences, streaming requirements, sample-rate bottlenecks) that deserve dedicated coverage.
Rapid convergence with LLMs. Audio-Language Models (ALMs) are following the exact same trajectory as Vision-Language Models (VLMs): frozen encoder → bridge module → LLM backbone. Understanding this pattern across all three modalities gives a complete picture of efficient multimodal AI at the edge.
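To make the "frozen encoder → bridge module → LLM backbone" pattern concrete, here is a minimal NumPy sketch of one common bridge design, assumed here for illustration rather than taken from any specific ALM: stack a few consecutive frames from a frozen audio encoder to shorten the sequence, project them linearly to the LLM's embedding width, and prepend the resulting "audio tokens" to the text embeddings. The function names `bridge` and `build_llm_input` and the stacking factor are hypothetical; in a real system only the bridge (and optionally LoRA adapters) would be trained.

```python
import numpy as np

def bridge(audio_feats, w_proj, stack=4):
    """Hypothetical bridge: stack `stack` consecutive encoder frames, then project.

    audio_feats: (T, d_enc) outputs of a frozen audio encoder.
    w_proj:      (stack * d_enc, d_llm) trainable projection matrix.
    Returns (T // stack, d_llm) audio tokens for the LLM.
    """
    t, d = audio_feats.shape
    t_trim = t - t % stack                      # drop trailing frames that do not fill a group
    # Row-major reshape concatenates frames [0..stack-1], [stack..2*stack-1], ...
    stacked = audio_feats[:t_trim].reshape(t_trim // stack, stack * d)
    return stacked @ w_proj                     # fewer, wider tokens: cheaper LLM attention

def build_llm_input(audio_feats, text_embeds, w_proj, stack=4):
    """Prepend projected audio tokens to text embeddings (both width d_llm)."""
    audio_tokens = bridge(audio_feats, w_proj, stack)
    return np.concatenate([audio_tokens, text_embeds], axis=0)

# Usage: 102 encoder frames of width 8 become 25 audio tokens of width 16.
rng = np.random.default_rng(0)
seq = build_llm_input(rng.random((102, 8)), rng.random((5, 16)), rng.random((32, 16)))
```

Frame stacking is an efficiency lever in its own right: reducing the audio sequence by a factor of `stack` shrinks the LLM's prefill cost and KV cache for the audio part of the prompt by roughly the same factor.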

Structure

These notes follow Professor Song Han's pedagogical framework: start from the breakthrough model that defined the domain, scale it up, then systematically compress it back down for deployment using the course's core techniques.

Audio Lecture	Parallel to	Breakthrough Paper
LA1 — Audio Transformers & ASR	L12–L13 (Transformer → LLM Deployment)	wav2vec 2.0 (Baevski et al., 2020)
LA2 — Speech Synthesis & Audio Generation	L17–L18 (GAN → Diffusion)	WaveNet (van den Oord et al., 2016)
LA3 — Audio-Language Models & Sound Understanding	L12 Multimodal + L16 (ViT)	CLAP (Elizalde et al., 2023)

💻 Hands-on Labs & Advanced Project Ideas (MIT 6.5940 Final Projects)

All lab exercises are designed to provide hands-on experience with real-world frameworks:

LLM Deployment: Hands-on experience deploying and running QLoRA-tuned LLMs (e.g., Llama 2) directly on a local GPU or CPU.
TinyML: Utilizing the TinyEngine and TensorFlow Lite Micro frameworks for model deployment on simulated microcontroller environments.
QML: Using Qiskit and PennyLane to build, train, and mitigate noise in variational quantum circuits.

For more advanced work, the course provides a set of state-of-the-art research challenges in efficient ML to explore as final projects.

1. Project: TSM for Efficient Video Understanding (Temporal Shift Module)

Goal: Address the challenge of efficient video analysis by leveraging the Temporal Shift Module (TSM), which captures temporal relationships without adding computational cost.
Description: TSM shifts part of the channels along the temporal dimension so that each frame exchanges information with its neighboring frames. Projects could involve replacing the backbone (e.g., swapping MobileNetV2 for another network) or applying TSM to a new video task such as fall detection.
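The shift itself is simple enough to sketch directly. Below is a minimal NumPy version of a bidirectional temporal shift, assuming an input layout of (N, T, C, H, W); `fold_div` (the fraction of channels shifted in each direction) is a configurable parameter, and 8 is used here only as an illustrative default. It involves only memory movement, with no multiplications and no learned parameters, which is why TSM adds no FLOPs.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Bidirectional temporal shift for x of shape (N, T, C, H, W).

    - The first C // fold_div channels move one step toward earlier frames
      (frame t receives frame t+1).
    - The next C // fold_div channels move one step toward later frames
      (frame t receives frame t-1).
    - The remaining channels are left untouched.
    Positions vacated at the clip boundaries are zero-filled.
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # shift toward earlier frames
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # shift toward later frames
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted channels
    return out

# Usage: an 8-frame clip with 64 channels; 8 channels shift each way.
clip = np.random.rand(1, 8, 64, 7, 7)
shifted = temporal_shift(clip)
```

In a full network this shift would sit inside each residual block ahead of a 2D convolution, so the convolution mixes channels that now carry information from adjacent frames.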

2. Project: SIGE - Sparse Engine for Generative AI

Goal: Accelerate image editing in deep generative models by avoiding the re-synthesis of unedited regions.
Description: SIGE (Spatially Incremental Generative Engine) is a sparse engine that caches and reuses the feature maps computed for the original image and recomputes only the edited regions. The project focuses on integrating SIGE with Stable Diffusion XL (SDXL) to measure, and ideally increase, the resulting speedups.
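The core idea can be illustrated on a single layer. The NumPy sketch below is a simplified illustration, not SIGE's API: SIGE itself operates on tiled blocks with gather/scatter kernels across a whole network, whereas this version works per pixel for one 3×3 convolution. It caches the output for the original image, finds which input pixels changed, dilates that mask by the kernel radius (each output pixel depends on a 3×3 input neighborhood), and recomputes only the affected outputs. The function names are hypothetical.

```python
import numpy as np

def conv3x3(x, w):
    """Dense single-channel 3x3 'same' convolution (cross-correlation, zero padding)."""
    h, wd = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += w[di, dj] * xp[di:di + h, dj:dj + wd]
    return out

def sparse_conv3x3(x_edit, x_orig, cached_out, w, tol=0.0):
    """Recompute conv3x3 only where the edit can affect the output.

    cached_out must equal conv3x3(x_orig, w). Returns (output, dirty_mask).
    """
    changed = np.abs(x_edit - x_orig) > tol
    # Dilate the change mask by 1 pixel: a 3x3 output sees a 3x3 input window.
    h, wd = changed.shape
    cp = np.pad(changed, 1)
    dirty = np.zeros_like(changed)
    for di in range(3):
        for dj in range(3):
            dirty |= cp[di:di + h, dj:dj + wd]
    out = cached_out.copy()                 # reuse cached features everywhere else
    xp = np.pad(x_edit, 1)
    for y, x in zip(*np.nonzero(dirty)):    # recompute only the dirty outputs
        out[y, x] = np.sum(w * xp[y:y + 3, x:x + 3])
    return out, dirty
```

The cost of the sparse pass scales with the number of dirty pixels rather than the image size, which is where the speedup for small edits comes from; in a deep network the dirty region grows with each layer's receptive field, so the savings shrink as the edit gets larger.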

3. Project: QServe for Online Quantized LLM Serving

Goal: Achieve high-throughput, real-time serving of low-precision quantized LLMs (e.g., INT4) in cloud-based settings.
Description: The project centers on implementing an online, real-time serving system with the QServe library, which uses the QoQ (W4A8KV4) quantization algorithm: 4-bit weights, 8-bit activations, and a 4-bit KV cache. The final objective is to build an online Gradio demo that serves these highly efficient quantized LLMs.
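To clarify what the "W4A8" part of W4A8KV4 means numerically, here is a minimal NumPy sketch of a simulated W4A8 linear layer. It is a generic illustration, not QServe's kernels or the QoQ algorithm (which adds its own techniques on top and also quantizes the KV cache to 4 bits, not shown here). The assumptions: symmetric INT4 weights with one scale per output channel, symmetric INT8 activations with one scale per token, an integer matrix multiply, and a single floating-point rescale at the end.

```python
import numpy as np

def quantize_symmetric(x, n_bits, axis):
    """Symmetric linear quantization with one scale per slice along `axis`.

    Returns integer codes in [-qmax, qmax] and the float scales.
    """
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for INT4, 127 for INT8
    amax = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = np.where(amax == 0, 1.0, amax / qmax)  # avoid a zero scale for all-zero rows
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def w4a8_linear(x, w):
    """Simulated W4A8 matmul. x: (tokens, in_features), w: (out_features, in_features)."""
    qw, sw = quantize_symmetric(w, n_bits=4, axis=1)  # per-output-channel INT4 weights
    qx, sx = quantize_symmetric(x, n_bits=8, axis=1)  # per-token INT8 activations
    acc = qx @ qw.T                                   # integer accumulation (int32)
    return acc * sx * sw.T                            # rescale: (tokens,1) * (1,out)
```

Keeping activations at 8 bits while weights drop to 4 lets the inner product run on integer hardware while avoiding the larger accuracy loss that 4-bit activations tend to cause; the per-channel and per-token scales are what keep the error small.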

Full documentation and project details can be found here.

References

Course YouTube Playlist: EfficientML.ai Course | 2023 Fall | MIT 6.5940
Course YouTube Playlist (NEW): EfficientML.ai Course | 2024 Fall | MIT 6.5940
Course Prerequisites: pdf
All slides available: here
Final project list (2023-2024): EfficientML.ai Project Ideas (Fall 2024)

⚠️ Disclaimer

Important

The Audio extension notes (LA1–LA3) are not official course material — they are community research notes that I designed by applying Professor Han's pedagogical framework and the course's core efficiency principles to the audio domain. All credit for the teaching methodology, structure, and foundational techniques goes to him and the MIT HAN Lab team.

🙏 Acknowledgements

Special thanks to:

Professor Song Han (MIT/HAN Lab) for his tremendous effort and passion in developing the EfficientML.ai framework and ecosystem, and for making them accessible to everyone.
Yifan Lu for sharing all homework labs.
