This repository contains code for fine-tuning a BERT model to detect AI-generated text. The model classifies a passage as either student-written ("student") or AI-generated ("ai").
- Python 3.12
- Jupyter Notebook
- CUDA-enabled GPU (optional but recommended for faster training)
- Required Python packages (listed in `requirements.txt`)
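
Before going further, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the package tuple are hypothetical, the two package names are taken from the imports used later in this README, and treating Python 3.12 as a minimum version is an assumption. `requirements.txt` remains the authoritative package list.

```python
import importlib.util
import sys

# Assumed package names for illustration (taken from the imports used later
# in this README); requirements.txt is the authoritative list.
ASSUMED_PACKAGES = ("torch", "transformers")


def check_environment(packages=ASSUMED_PACKAGES, min_python=(3, 12)):
    """Report whether the interpreter and the named packages look usable."""
    # Assumption: Python 3.12 is treated as a minimum, not an exact requirement.
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        import torch
        # A GPU is optional; this only records whether PyTorch can see one.
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {'yes' if value else 'no'}")
```

A `no` next to `cuda` is not an error; training should still run on the CPU, only more slowly.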
- Clone the repository:

      git clone https://github.com/HARSHDIPSAHA/AI-generated-text-detector.git
      cd AI-generated-text-detector

- Create a virtual environment:

      python -m venv myenv
      source myenv/bin/activate  # On Windows, use `myenv\Scripts\activate`

- Install the required packages:

      pip install -r requirements.txt

- Launch Jupyter Notebook:

      jupyter notebook

- Open `berttttt.ipynb` in Jupyter Notebook.

- Run all cells to train the model and make predictions.
If you would rather not run the training notebook, you can use Git LFS to download the pre-trained model and tokenizer:
- Install Git LFS:

      git lfs install

- Clone the repository with Git LFS:

      git lfs clone https://github.com/HARSHDIPSAHA/AI-generated-text-detector.git
      cd AI-generated-text-detector

- Copy the `bert_finetuned_model` and `bert_tokenizer` directories to your local machine.

- Create a new Jupyter Notebook and load the model:

      import torch
      from transformers import BertTokenizer, BertForSequenceClassification

      def load_model_and_tokenizer(model_path, tokenizer_path):
          # Load the fine-tuned classifier and its matching tokenizer from disk.
          model = BertForSequenceClassification.from_pretrained(model_path)
          tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
          return model, tokenizer

      model_path = './bert_finetuned_model'
      tokenizer_path = './bert_tokenizer'
      model, tokenizer = load_model_and_tokenizer(model_path, tokenizer_path)

      # Map class indices to human-readable labels.
      labels = {0: "student", 1: "ai"}

      def predict_text_category(dialogue, model, tokenizer):
          # Tokenize the input, truncating to BERT's 512-token limit.
          inputs = tokenizer(dialogue, return_tensors='pt', truncation=True, padding=True, max_length=512)
          # Inference only, so gradient tracking is disabled.
          with torch.no_grad():
              outputs = model(**inputs)
          logits = outputs.logits
          # Pick the class with the highest logit.
          predicted_class = torch.argmax(logits, dim=1).item()
          return labels[predicted_class]

      text = "Blockchain is revolutionizing education, providing personalized learning experiences to students all over the world."
      predicted_label = predict_text_category(text, model, tokenizer)
      print(f"The predicted label for the given text is: {predicted_label}")

  (1)

      text = "Let us eat together any burger."
      predicted_label = predict_text_category(text, model, tokenizer)
      print(f"The predicted label for the given text is: {predicted_label}")

  (0)
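
The prediction function above returns only the top label. If a confidence score would also be useful, one option is to pass the logits through a softmax. The following is a minimal sketch using only the standard library: the helper name `logits_to_confidence` is hypothetical and not part of the repository, and the logits in the example call are made up for illustration rather than taken from the model.

```python
import math

# Label mapping copied from the loading example in this README.
LABELS = {0: "student", 1: "ai"}


def logits_to_confidence(logits, labels=LABELS):
    """Turn one row of raw logits into (predicted_label, {label: probability})."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {labels[i]: e / total for i, e in enumerate(exps)}
    # The predicted label is the one with the highest probability.
    best = max(probs, key=probs.get)
    return best, probs


# Made-up logits purely for illustration; these are not real model output.
label, probs = logits_to_confidence([0.0, 2.0])
print(label, {name: round(p, 3) for name, p in probs.items()})
```

With the model loaded as shown earlier, a single row of logits could be supplied as `outputs.logits[0].tolist()`. Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so treat them as a rough indication rather than a measured likelihood.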