afondiel/MIT-6.5940-Notes-Labs

MIT 6.5940 - Notes and Labs ⚡️

Notes and hands-on notebooks from MIT 6.5940 (Fall 2023): TinyML and Efficient Deep Learning Computing.

🌟 Course Overview

This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. The main focus is on achieving maximal performance with minimal resource consumption.

🎯 Key Learning Objectives

Upon completion of this course, you will be able to:

  • Shrink and Accelerate Models: Master techniques like Pruning, Quantization (INT8/INT4), and Knowledge Distillation to dramatically reduce model size and inference latency.
  • Design Efficient Architectures: Utilize Neural Architecture Search (NAS), specifically Once-for-All (OFA), to automatically design hardware-aware networks.
  • Master LLM Efficiency: Apply Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and QLoRA for efficient adaptation of multi-billion parameter models.
  • Optimize Distributed Systems: Implement Data, Pipeline, and Tensor Parallelism for efficient training of models that exceed single-GPU memory.
  • Deploy to the Edge (TinyML): Design models and system software (MCUNet, TinyEngine) capable of running complex AI on microcontrollers with only kilobytes of RAM.
  • Explore Future Computing: Understand the fundamentals of Quantum Machine Learning (QML) and implement Noise Mitigation techniques for current NISQ hardware.
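To make the first objective concrete, here is a minimal sketch of magnitude pruning followed by symmetric per-tensor INT8 quantization. It uses plain Python lists rather than PyTorch tensors so it runs anywhere. The function names and the example weights are illustrative assumptions, not code from the course labs.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [scale * v for v in q]


# Hypothetical weight vector, chosen only for illustration.
weights = [0.02, -1.27, 0.5, -0.03, 0.9, 0.01]
sparse = magnitude_prune(weights, sparsity=0.5)  # three smallest weights become 0.0
q, scale = quantize_int8(sparse)                 # integer codes plus one float scale
restored = dequantize(q, scale)                  # each value within scale / 2 of `sparse`
```

Real pipelines typically prune per layer or per channel and fine-tune afterwards to recover accuracy. This sketch only shows the core arithmetic.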

💻 Tech Stack & Prerequisites

  • Programming: Strong proficiency in Python 3.
  • Frameworks: Experience with PyTorch (primary framework) or TensorFlow.
  • Math: Comfort with Linear Algebra, Calculus, and Probability.
  • Prerequisites: Familiarity with standard deep learning concepts (CNNs, RNNs, basic optimizers).

📚 Course Outline (Updated from Fall 2024)

Chapter 0: Introduction

| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L1 | Introduction | Slides | - | Video |
| L2 | Basics of Deep Learning | Slides | L02_NN_Basics.ipynb | Video |

Chapter I: Efficient Inference

| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L3 | Pruning and Sparsity (Part I) | Slides | - | Video |
| L4 | Pruning and Sparsity (Part II) | Slides | L03_L04_Pruning.ipynb | Video |
| L5 | Quantization (Part I) | Slides | - | Video |
| L6 | Quantization (Part II) | Slides | L08_Quantization_PTQ.ipynb | Video |
| L7 | Neural Architecture Search (Part I) | Slides | Lab3_NAS.ipynb | Video |
| L8 | Neural Architecture Search (Part II) | Slides | - | Video |
| L9 | Knowledge Distillation | Slides | L09_Quantization_QAT.ipynb | Video |
| L10 | MCUNet: TinyML on Microcontrollers | Slides | L10_L11_NAS.ipynb | Video |
| L11 | TinyEngine and Parallel Processing | Slides | L21_TinyML_Deployment.ipynb | Video |

Chapter II: Domain-Specific Optimization

| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L12 | Transformer and LLM | Slides | - | Video |
| L13 | Efficient LLM Deployment | Slides | Lab4_LLM_Quantization.ipynb | Video |
| L14 | LLM Post Training | Slides | - | Video |
| L15 | Long Context LLM | Slides | - | Video |
| L16 | Vision Transformer | Slides | L16_LLM_QLoRA_Finetuning.ipynb | Video |
| L17 | GAN, Video, and Point Cloud | Slides | - | Video |
| L18 | Diffusion Model | Slides | - | Video |
| LA1 * | Audio Transformers and Efficient Speech Recognition | - | - | wav2vec 2.0, Whisper, Conformer |
| LA2 * | Efficient Speech Synthesis and Audio Generation | - | - | WaveNet, FastSpeech, VALL-E, AudioLDM |
| LA3 * | Audio-Language Models and Sound Understanding | - | - | CLAP, SALMONN, Qwen-Audio |

Chapter III: Efficient Training

| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L19 | Distributed Training (Part I) | Slides | Lab5_LLM_at_Edge.ipynb | Video |
| L20 | Distributed Training (Part II) | Slides | - | Video |
| L21 | On-Device Training and Transfer Learning | Slides | - | Video |

Chapter IV: Advanced Topics

| Lecture | Topic (notes) | Slide | Notebook | Reference |
|---|---|---|---|---|
| L22 | Course Summary + Quantum ML I | Slides | - | Video |
| L23 | Quantum Machine Learning II | Slides | L23_QML_Noise_Mitigation.ipynb | Video |
| L24 | Final Project Presentation | Slides | - | Video |
| L25 | Final Project Presentation | Slides | - | Video |
| L26 | Final Project Presentation | Slides | - | Video |

🔊 LA1–LA3: Community Research Extension for Audio Domain

Note

The lectures prefixed LA (LA1–LA3) are not part of the official MIT 6.5940 curriculum. They are community-designed research notes that I created to extend the course's domain-specific coverage to the audio modality, an area largely absent from the original syllabus despite its critical importance for Edge AI.

Motivation

The official course covers efficient techniques for Language (L12–L15) and Vision (L16–L18), but the audio/sound modality was left out. As an Edge AI Engineer, I found this gap significant for several reasons:

  1. Audio is the dominant edge modality. Billions of devices (earbuds, smart speakers, hearing aids, vehicles, MCUs) rely on real-time audio processing, often under tighter latency, memory, and power constraints than vision or text.
  2. The same efficiency playbook applies. Every core technique taught in Chapters I–III (pruning, quantization, NAS, knowledge distillation, distributed training) transfers directly to audio models, yet the domain has unique challenges (long sequences, streaming requirements, sample-rate bottlenecks) that deserve dedicated coverage.
  3. Rapid convergence with LLMs. Audio-Language Models (ALMs) are following the same trajectory as Vision-Language Models (VLMs): frozen encoder → bridge module → LLM backbone. Understanding this pattern across all three modalities gives a complete picture of efficient multimodal AI at the edge.

Structure

These notes follow Professor Song Han's pedagogical framework: start from the breakthrough model that defined the domain, scale it up, then systematically compress it back down for deployment using the course's core techniques.

| Audio Lecture | Parallel to | Breakthrough Paper |
|---|---|---|
| LA1: Audio Transformers & ASR | L12–L13 (Transformer → LLM Deployment) | wav2vec 2.0 (Baevski et al., 2020) |
| LA2: Speech Synthesis & Audio Generation | L17–L18 (GAN → Diffusion) | WaveNet (van den Oord et al., 2016) |
| LA3: Audio-Language Models & Sound Understanding | L12 Multimodal + L16 (ViT) | CLAP (Elizalde et al., 2023) |

💻 Hands-on Labs & Advanced Project Ideas (MIT 6.5940 Final Projects)

All lab exercises are designed to provide hands-on experience with real-world frameworks:

  • LLM Deployment: Hands-on experience deploying and running QLoRA-tuned LLMs (e.g., Llama-2) directly on a local GPU or CPU.
  • TinyML: Utilizing the TinyEngine and TensorFlow Lite Micro frameworks for model deployment on simulated microcontroller environments.
  • QML: Using Qiskit and PennyLane to build, train, and mitigate noise in variational quantum circuits.

For more advanced projects, the course provides a set of state-of-the-art research challenges in efficient ML to explore.

1. Project: TSM for Efficient Video Understanding (Temporal Shift Module)

  • Goal: Address the challenge of efficient video analysis by leveraging the Temporal Shift Module (TSM), which captures temporal relationships without adding computational cost.
  • Description: TSM works by shifting part of the channels along the temporal dimension, facilitating information exchange among neighboring frames. Projects could involve changing the backbone (e.g., from MobileNetV2) or applying TSM to a new video task like fall detection.
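The channel shift described above can be sketched in a few lines. This is a simplified illustration that treats each frame as a flat list of channel values (spatial dimensions omitted) and pads the sequence boundaries with zeros. The `fold_div` parameter, which sets the fraction of channels shifted in each direction, and the toy input are assumptions made for this example.

```python
def temporal_shift(frames, fold_div=4):
    """Shift a slice of channels across neighboring frames.

    frames: list of T frames, each a list of C channel values.
    The first C // fold_div channels are taken from the next frame,
    the following C // fold_div channels from the previous frame,
    and the remaining channels stay in place.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # read from the next frame
            elif c < 2 * fold:
                src = t - 1   # read from the previous frame
            else:
                src = t       # unshifted channel
            if 0 <= src < num_frames:
                out[t][c] = frames[src][c]
            # Out-of-range sources keep the zero padding.
    return out


# Toy input: 3 frames with 4 channels each.
frames = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
shifted = temporal_shift(frames, fold_div=4)
# shifted == [[5, 0.0, 3, 4], [9, 2, 7, 8], [0.0, 6, 11, 12]]
```

The operation only moves data, which is why it adds no multiply-accumulate operations. Any temporal mixing happens in the ordinary 2D convolution that follows the shift.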

2. Project: SIGE - Sparse Engine for Generative AI

  • Goal: Accelerate image editing in deep generative models by avoiding the re-synthesis of unedited regions.
  • Description: SIGE (Sparse Incremental Generative Engine) is a sparse engine that caches and reuses feature maps from the original image to generate only the edited regions. The project focuses on integrating SIGE with Stable Diffusion XL (SDXL) to assess and potentially achieve more significant speed improvements.
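The cache-and-reuse idea can be illustrated with a toy sketch. It is not the SIGE API: `expensive_layer` and `TileCache` are hypothetical names, and the sketch assumes each tile can be processed independently, whereas real convolutional layers also depend on neighboring pixels.

```python
def expensive_layer(tile):
    """Stand-in for a costly per-tile computation (hypothetical)."""
    return [2 * v + 1 for v in tile]


class TileCache:
    """Caches features of the original image and recomputes only edited tiles."""

    def __init__(self, original_tiles):
        self.original = [list(t) for t in original_tiles]
        # One full pass over the original image fills the cache.
        self.features = [expensive_layer(t) for t in self.original]
        self.recomputed = 0

    def run_edited(self, edited_tiles):
        out = []
        for idx, tile in enumerate(edited_tiles):
            if tile == self.original[idx]:
                # Unedited region: reuse the cached feature map.
                out.append(self.features[idx])
            else:
                # Edited region: recompute this tile only.
                self.recomputed += 1
                out.append(expensive_layer(tile))
        return out


original = [[1, 2], [3, 4], [5, 6], [7, 8]]
edited = [[1, 2], [3, 4], [0, 0], [7, 8]]  # only the third tile was edited

cache = TileCache(original)
result = cache.run_edited(edited)
# cache.recomputed == 1: three of the four tiles were served from the cache
```

The speedup in this sketch scales with the fraction of unedited tiles, which is the intuition behind expecting larger gains when edits are small.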

3. Project: QServe for Online Quantized LLM Serving

  • Goal: Achieve high-throughput, real-time serving of low-precision quantized LLMs (such as INT4) in cloud-based settings.
  • Description: The project centers on implementing an online, real-time serving system using the QServe library, which uses the QoQ (W4A8KV4) quantization algorithm: 4-bit weights, 8-bit activations, and a 4-bit KV cache. The final objective is to build an online Gradio demo to serve these highly efficient, quantized LLMs.
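To see why the W4 and KV4 parts of W4A8KV4 matter for serving, the back-of-the-envelope calculation below compares storage at 16 bits and 4 bits. The model shape (7 billion parameters, 32 layers, hidden size 4096, 4096 cached tokens, no grouped-query attention) is a hypothetical example, and the numbers ignore the small overhead of quantization scales and zero points.

```python
def tensor_bytes(num_values, bits):
    """Bytes needed to store `num_values` numbers at `bits` bits each."""
    return num_values * bits // 8


# Hypothetical model shape, chosen only for illustration.
num_params = 7_000_000_000
num_layers = 32
hidden_size = 4096
cached_tokens = 4096

# Weights: FP16 baseline versus 4-bit weights (the "W4" part).
weights_fp16 = tensor_bytes(num_params, 16)
weights_w4 = tensor_bytes(num_params, 4)

# KV cache: one key and one value vector per layer per token.
kv_values = 2 * num_layers * cached_tokens * hidden_size
kv_fp16 = tensor_bytes(kv_values, 16)
kv_int4 = tensor_bytes(kv_values, 4)  # the "KV4" part

print(f"weights:  {weights_fp16 / 1e9:.1f} GB -> {weights_w4 / 1e9:.1f} GB")
print(f"KV cache: {kv_fp16 / 2**30:.2f} GiB -> {kv_int4 / 2**30:.2f} GiB")
```

Both quantities shrink by 4x relative to FP16. The freed memory can hold larger batches or longer contexts, which is where the throughput gain in serving comes from.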

Full documentation and project details can be found here.

References

⚠️ Disclaimer

Important

The Audio extension notes (LA1–LA3) are not official course material; they are community research notes that I designed by applying Professor Han's pedagogical framework and the course's core efficiency principles to the audio domain. All credit for the teaching methodology, structure, and foundational techniques goes to him and the MIT HAN Lab team.

πŸ™ Acknowledgements

Special thanks to:
