This project uses a custom BERT model to classify synthetic FIR (First Information Report) texts into crime categories.
- Synthetic FIR data with labels like 'Assault', 'Fraud', 'Cybercrime', etc.
- Stored in data/fir_dataset.csv
- Fine-tunes bert-base-uncased using HuggingFace's Trainer.
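
As a rough illustration of the data and model setup described above, here is a minimal sketch. It is not the contents of train.py. The CSV column names `text` and `label`, the helper names `load_fir_rows`, `build_label_maps`, and `fine_tune`, and the hyperparameters (3 epochs, batch size 16, max length 128) are all assumptions made for the example. The fine-tuning step needs `torch` and `transformers` installed and downloads bert-base-uncased on first use.

```python
import csv
from pathlib import Path


def load_fir_rows(csv_path, text_col="text", label_col="label"):
    """Read (text, label) pairs from the CSV. Column names are assumed."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return [(row[text_col], row[label_col]) for row in csv.DictReader(f)]


def build_label_maps(labels):
    """Map each distinct label string to a stable integer id and back."""
    names = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def fine_tune(rows, output_dir="out", epochs=3):
    """Fine-tune bert-base-uncased with Trainer (hypothetical settings)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    texts, labels = zip(*rows)
    label2id, id2label = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # Pad to the longest text so every example has the same tensor shape.
    enc = tokenizer(list(texts), truncation=True, padding=True, max_length=128)

    class FirDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in enc.items()}
            item["labels"] = torch.tensor(label2id[labels[i]])
            return item

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(label2id),
        id2label=id2label, label2id=label2id)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args, train_dataset=FirDataset())
    trainer.train()
    return trainer
```

Under these assumptions, `fine_tune(load_fir_rows("data/fir_dataset.csv"))` would run a full training pass. A held-out evaluation split is omitted to keep the sketch short.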
pip install -r requirements.txt
python train.py
- The dataset is synthetically generated and not from any real-world source.
- Accuracy is currently low, so feel free to experiment and improve it!
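
When experimenting, it can help to track more than one number. The sketch below computes plain accuracy and macro-averaged F1 from lists of true and predicted labels using only the standard library. The function names `accuracy` and `macro_f1` are hypothetical, and nothing here is taken from the project's own evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["Assault", "Fraud", "Fraud", "Cybercrime"]
y_pred = ["Assault", "Fraud", "Assault", "Cybercrime"]
print(accuracy(y_true, y_pred))  # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Macro F1 is useful here because a classifier that favors the most common category can look acceptable on accuracy while failing on the others.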
You can also run the training pipeline interactively in fir_classifier_notebook.ipynb.