This project tackles the MAP – Charting Student Math Misunderstandings Kaggle competition.
The goal is to classify mathematical misconceptions from open-ended student explanations using a transformer-based deep learning model.
Students explain their reasoning in free-text format.
The task is to predict the combined `Category:Misconception` label based on:
- The question text
- The student's selected multiple-choice answer
- The student's free-text explanation
This is a multi-class NLP classification problem with strong class imbalance.
Model pipeline:

Input Text (Question + MC Answer + Student Explanation)
→ Tokenization (DeBERTa-v3)
→ Transformer Encoder
→ Classification Head
→ Softmax
→ `Category:Misconception` Prediction
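The pipeline above can be sketched in plain PyTorch. The encoder stands in for any Hugging Face backbone (such as DeBERTa-v3) that returns a `last_hidden_state`; the `[SEP]` joining and `[CLS]`-token pooling are illustrative assumptions, not the project's exact preprocessing:

```python
import torch
import torch.nn as nn


def build_input_text(question, mc_answer, explanation, sep=" [SEP] "):
    # Concatenate the three input fields into a single sequence for the tokenizer.
    return sep.join([question, mc_answer, explanation])


class MisconceptionClassifier(nn.Module):
    """A linear classification head over a transformer encoder's pooled output."""

    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder            # e.g. a DeBERTa-v3 backbone
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # first-token pooling
        logits = self.classifier(self.dropout(pooled))
        return logits  # softmax is applied inside the cross-entropy loss
```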
Key components:

- ✅ DeBERTa-v3 Backbone
- ✅ Stratified K-Fold Cross-Validation (3 Folds)
- ✅ Mixed-Precision Training (AMP)
- ✅ Cosine Learning-Rate Scheduler
- ✅ AdamW Optimizer
- ✅ Layer Freezing for Faster Training
- ✅ Fold Ensembling for the Final Submission
- ✅ Macro F1 Evaluation
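A minimal sketch of how the training pieces fit together, using AdamW, a cosine scheduler, AMP, and prefix-based layer freezing; the batch keys, layer-name prefixes, and hyperparameters are assumptions, not the notebook's exact values:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR


def freeze_layers(model, prefixes):
    # Freeze parameters whose names start with any listed prefix.
    # (Prefixes must match your backbone's naming, e.g. "encoder.layer.0.".)
    for name, param in model.named_parameters():
        if any(name.startswith(p) for p in prefixes):
            param.requires_grad = False


def train_one_epoch(model, loader, optimizer, scheduler, scaler, device):
    model.train()
    loss_fn = torch.nn.CrossEntropyLoss()
    for batch in loader:
        optimizer.zero_grad()
        # Autocast runs the forward pass in mixed precision on GPU.
        with torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
            logits = model(batch["input_ids"].to(device),
                           batch["attention_mask"].to(device))
            loss = loss_fn(logits, batch["labels"].to(device))
        scaler.scale(loss).backward()   # loss scaling avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()                # cosine decay per optimizer step
```

A typical setup would pass only trainable parameters to the optimizer, e.g. `AdamW(filter(lambda p: p.requires_grad, model.parameters()), lr=2e-5)`, with `CosineAnnealingLR(optimizer, T_max=total_steps)` and `torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())`.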
Primary Metric: Macro F1 Score
- Stratified splitting keeps the label distribution consistent across folds
- Validation performance is tracked per fold
- The best checkpoint per fold is saved, and fold predictions are ensembled for the final submission
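The evaluation loop can be sketched with scikit-learn's `StratifiedKFold` and `f1_score`; the model call is replaced here by a majority-class placeholder so the fold/metric mechanics stand alone:

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def run_cv(X, y, n_splits=3, seed=42):
    """Stratified K-fold: every fold preserves the overall label distribution."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for fold, (tr_idx, va_idx) in enumerate(skf.split(X, y)):
        # ... train the real model on X[tr_idx], y[tr_idx] here;
        # a majority-class baseline stands in for its predictions.
        majority = np.bincount(y[tr_idx]).argmax()
        preds = np.full(len(va_idx), majority)
        scores.append(f1_score(y[va_idx], preds, average="macro"))
    return scores
```

Macro F1 averages the per-class F1 scores with equal weight, which is why it is the right primary metric under heavy class imbalance: rare misconception labels count as much as common ones.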
Tech stack:

- PyTorch
- Hugging Face Transformers
- Scikit-learn
- Pandas / NumPy
- Google Colab (T4 GPU)
Colab Notebook: https://colab.research.google.com/drive/1p7GqShMU9kcon3isXY7xAhMCfBCfrcqu?usp=sharing
Key learnings:

- Handling extreme class imbalance in NLP classification
- Implementing stratified K-Fold validation
- Managing transformer training stability
- Debugging mixed precision & gradient instability
- Designing clean Kaggle submission pipelines
Future improvements:

- Use DeBERTa-v3-Large
- Apply label smoothing
- Try class-balanced loss
- Apply pseudo-labeling
- Add model distillation for faster inference
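Of these, label smoothing and a class-balanced loss combine naturally in PyTorch. This sketch uses the "effective number of samples" weighting from Cui et al. (2019); `beta` and `smoothing` are assumed hyperparameters, not tuned values from this project:

```python
import torch


def class_balanced_loss(labels, num_classes, beta=0.999, smoothing=0.1):
    """Cross-entropy with label smoothing and class-balanced weights.

    Per-class weight is (1 - beta) / (1 - beta ** n_c), so rare classes
    receive larger weights than frequent ones.
    """
    counts = torch.bincount(labels, minlength=num_classes).float().clamp(min=1)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * num_classes  # normalize to mean 1
    return torch.nn.CrossEntropyLoss(weight=weights, label_smoothing=smoothing)
```

The returned criterion drops into the existing training loop in place of a plain `CrossEntropyLoss`, so it could be trialed per fold without other pipeline changes.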