IshankumarP/Minibert

🧠 MiniBERT: A Small-Scale Implementation and Fine-tuning of BERT

A compact and educational replication of the original BERT model (Devlin et al., 2018), MiniBERT is designed for practical understanding and experimentation under constrained compute environments. This project pre-trains a simplified BERT architecture on ~30M tokens and fine-tunes it on two NLP tasks: SST-2 for sentiment classification and SWAG for commonsense inference.


📖 Overview

"Pre-train on unlabeled data, fine-tune on everything else."

MiniBERT recreates the BERT pipeline from scratch, with:

  • 4 Transformer layers
  • Hidden size of 256
  • 4 attention heads
  • Max sequence length: 128
  • Pre-training on BookCorpus + Wikipedia (trimmed)
  • Fine-tuning on SST-2 & SWAG

This makes it an ideal reference for anyone learning about Transformers, BERT architecture, or resource-efficient deep learning.


🏗️ Model Architecture

| Component | MiniBERT | BERT-Base |
|---|---|---|
| Layers | 4 | 12 |
| Hidden Size | 256 | 768 |
| Attention Heads | 4 | 12 |
| Max Sequence Length | 128 | 512 |
| Vocabulary Size | 30,522 | 30,522 |

Built in PyTorch, with inspiration from HuggingFace's transformers library.
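
To make the comparison above concrete, here is a minimal, framework-free sketch that encodes both configurations and estimates their encoder sizes. The `BertConfig` class, its field names, and the `approx_params` helper are hypothetical and not taken from the notebook. The 4x feed-forward width is an assumption borrowed from the original BERT paper, since this README does not state MiniBERT's feed-forward size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    """Hypothetical config holder; field names are illustrative only."""
    num_layers: int
    hidden_size: int
    num_heads: int
    max_seq_len: int
    vocab_size: int = 30_522
    # Assumption: feed-forward width is 4x the hidden size, as in the BERT paper.
    ffn_multiplier: int = 4

    @property
    def head_dim(self) -> int:
        """Per-head dimension; the hidden size must split evenly across heads."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def approx_params(self) -> int:
        """Rough count of embedding + encoder weights (task heads and pooler excluded)."""
        h = self.hidden_size
        ffn = h * self.ffn_multiplier
        # Token, position, and 2 segment embeddings, plus the embedding LayerNorm.
        embeddings = (self.vocab_size + self.max_seq_len + 2) * h + 2 * h
        # Query, key, value, and output projections, each with a bias.
        attention = 4 * (h * h + h)
        # Two linear layers with biases.
        feed_forward = (h * ffn + ffn) + (ffn * h + h)
        # Two LayerNorms per layer, each with a scale and a shift vector.
        layer_norms = 2 * 2 * h
        return embeddings + self.num_layers * (attention + feed_forward + layer_norms)


MINI_BERT = BertConfig(num_layers=4, hidden_size=256, num_heads=4, max_seq_len=128)
BERT_BASE = BertConfig(num_layers=12, hidden_size=768, num_heads=12, max_seq_len=512)

if __name__ == "__main__":
    for name, cfg in (("MiniBERT", MINI_BERT), ("BERT-Base", BERT_BASE)):
        print(f"{name}: head_dim={cfg.head_dim}, ~{cfg.approx_params() / 1e6:.1f}M params")
```

Under these assumptions both models use 64-dimensional attention heads, and MiniBERT comes out at roughly 11M parameters against roughly 109M for BERT-Base. Most of MiniBERT's weights sit in the token embedding table, because the vocabulary is unchanged while the encoder shrinks.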


🧪 Pre-training Tasks

1. Masked Language Modeling (MLM)

  • 15% of tokens masked (80% [MASK], 10% random, 10% unchanged)
  • Loss calculated only on masked tokens
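
The masking rule above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: `mask_tokens`, `SPECIAL_TOKENS`, and the toy vocabulary are hypothetical names, and each position is selected independently with probability 0.15, which is a simplification of sampling a fixed 15% of positions.

```python
import random

MASK_TOKEN = "[MASK]"
# Assumption: special tokens are never selected for masking.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels); labels[i] is None where no loss is computed."""
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if token in SPECIAL_TOKENS or rng.random() >= mask_prob:
            continue  # position not selected: input unchanged, no loss
        labels[i] = token  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels


if __name__ == "__main__":
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]
    sentence = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
    print(mask_tokens(sentence, toy_vocab, rng=random.Random(0)))
```

In a tensor-based training loop the `None` labels would typically be encoded as an ignore index, so that the cross-entropy loss skips unselected positions.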

2. Next Sentence Prediction (NSP)

  • 50% IsNext (same document), 50% NotNext (random pairing)
  • Input format: [CLS] Sentence A [SEP] Sentence B [SEP]
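
A minimal sketch of how such sentence pairs might be built is shown below. The function name `make_nsp_example` is hypothetical, sentences are whitespace-tokenized for brevity, and two choices are assumptions rather than facts from this README: NotNext sentences are drawn from a different document, and segment IDs (0 for sentence A, 1 for sentence B) follow the original BERT paper.

```python
import random


def make_nsp_example(documents, rng=None):
    """Build one NSP example from a list of documents (each a list of sentences).

    Requires at least two documents, each with at least two sentences.
    Returns (tokens, segment_ids, label), with label "IsNext" or "NotNext".
    """
    rng = rng or random.Random()
    doc_idx = rng.randrange(len(documents))
    doc = documents[doc_idx]
    i = rng.randrange(len(doc) - 1)  # leave room for a following sentence
    sent_a = doc[i].split()

    if rng.random() < 0.5:
        # IsNext: the sentence that actually follows A in the same document.
        sent_b, label = doc[i + 1].split(), "IsNext"
    else:
        # NotNext: a random sentence from a different document (assumption).
        other_idx = rng.choice([j for j in range(len(documents)) if j != doc_idx])
        sent_b, label = rng.choice(documents[other_idx]).split(), "NotNext"

    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segment_ids, label


if __name__ == "__main__":
    docs = [
        ["the cat sat", "it purred loudly", "then it slept"],
        ["stocks fell today", "investors were nervous"],
    ]
    print(make_nsp_example(docs, rng=random.Random(1)))
```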

📊 Fine-tuning & Results

✅ SST-2: Sentiment Classification

  • Binary classification
  • 87.2% validation accuracy

❌ SWAG: Commonsense Inference

  • 4-choice multiple choice
  • 37.3% validation accuracy (chance level is 25%; limited by compute/resources)
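
This README does not describe how SWAG examples are fed to the model. One common setup, following the original BERT paper, pairs the context with each of the four candidate endings, scores every pair, and picks the highest score. The sketch below assumes that setup; `build_choice_inputs`, `predict_choice`, and `score_fn` are hypothetical names, and `score_fn` stands in for the fine-tuned model.

```python
def build_choice_inputs(context, endings):
    """Return one [CLS] context [SEP] ending [SEP] token list per candidate ending."""
    ctx = context.split()
    return [["[CLS]"] + ctx + ["[SEP]"] + ending.split() + ["[SEP]"] for ending in endings]


def predict_choice(context, endings, score_fn):
    """Score every candidate and return the index of the best one.

    score_fn is a hypothetical stand-in for the model: token list -> float.
    """
    scores = [score_fn(seq) for seq in build_choice_inputs(context, endings)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, labels):
    """Fraction of examples where the predicted index matches the gold index."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


if __name__ == "__main__":
    context = "she picks up the guitar"
    endings = [
        "and eats it",
        "and starts to play a song",
        "and flies away",
        "and melts",
    ]
    # Toy scorer for demonstration only: prefers longer sequences.
    print(predict_choice(context, endings, score_fn=len))
```

In training, the four scores are usually passed through a softmax and optimized with cross-entropy against the index of the correct ending.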

🛠️ Tech Stack

  • Python
  • PyTorch
  • Hugging Face transformers
  • Google Colab (T4 GPU)
  • Datasets: BookCorpus, Wikipedia, SST-2, SWAG

🚀 How to Run

  1. Clone the repo and install dependencies:
     pip install -r requirements.txt
  2. Open the notebook:
     jupyter notebook DLbertcode.ipynb
  3. Run each section:
    • Preprocessing + Dataset loading
    • Model implementation
    • Pre-training (MLM + NSP)
    • Fine-tuning on SST-2 & SWAG

📁 Project Structure

| File | Description |
|---|---|
| DLbertcode.ipynb | Main notebook with full implementation |
| DLbert.pdf | Project report (background, results) |
| requirements.txt | Dependencies list |
| models/ | (Optional) Saved checkpoints |
| data/ | (Optional) Preprocessed datasets |

📌 Limitations

  • Small model capacity
  • Limited compute (1 GPU, 3 epochs)
  • Restricted token count (~30M vs. roughly 3.3B words for the original BERT)
  • Only 2 downstream tasks evaluated

📚 References

  • Devlin et al., 2018 — BERT: Pre-training of Deep Bidirectional Transformers
  • Vaswani et al., 2017 — Attention is All You Need
  • Zellers et al., 2018 — SWAG Dataset
  • Socher et al., 2013 — SST-2 Dataset

🙏 Acknowledgments

Thanks to the contributors and the Department of CSE (AIML) for their support and guidance.

