AI-Generated Text Detector

This repository contains code for fine-tuning a BERT model to detect AI-generated text. The model classifies a text as either student-written ("student") or AI-generated ("ai").

Setup Instructions

Prerequisites

Python 3.12
Jupyter Notebook
CUDA-enabled GPU (optional but recommended for faster training)
Required Python packages (listed in requirements.txt)
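
To see whether PyTorch can use a GPU on your machine, a quick check along these lines may help. This is a minimal sketch: `pick_device` is a hypothetical helper name, not part of this repository, and it falls back to "cpu" when PyTorch is not installed yet.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

# pick_device is a hypothetical helper used only for this illustration.
device = pick_device()
print(f"Using device: {device}")
```

Training still works on "cpu", only more slowly.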

Installation

Clone the repository:

```
git clone https://github.com/HARSHDIPSAHA/AI-generated-text-detector.git
cd AI-generated-text-detector
```

Create a virtual environment:

```
python -m venv myenv
source myenv/bin/activate   # On Windows, use `myenv\Scripts\activate`
```
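
If you are unsure whether the environment was actually activated, the interpreter itself can tell you. The following is a small sketch using only the standard library; `in_virtualenv` is a hypothetical helper name introduced here for illustration.

```python
import sys

def in_virtualenv():
    """True when the interpreter runs inside an environment created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}, virtual environment active: {in_virtualenv()}")
```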

Install the required packages:
```
pip install -r requirements.txt
```
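
After installation, it can be worth confirming that the key packages are importable before opening the notebook. The sketch below assumes the import names `torch` and `transformers` (taken from the code later in this README, not from requirements.txt itself); `missing_packages` is a hypothetical helper.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed import names, based on the imports used in this README.
required = ["torch", "transformers"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, re-running the pip command above inside the activated environment is the first thing to try.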

Running the Notebook

Launch Jupyter Notebook:
```
jupyter notebook
```
Open berttttt.ipynb in Jupyter Notebook.
Run all cells to train the model and make predictions.

Using Git LFS

If you don't want to train the model yourself, you can use Git LFS to download the pre-trained model and tokenizer:

Install Git LFS:
```
git lfs install
```

Clone the repository with Git LFS:

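```
# `git lfs clone` is deprecated in recent Git LFS releases;
# once Git LFS is installed, a plain `git clone` also downloads the LFS files.
git lfs clone https://github.com/HARSHDIPSAHA/AI-generated-text-detector.git
cd AI-generated-text-detector
```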

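Copy the bert_finetuned_model and bert_tokenizer directories into your working directory so that the relative paths used in the code resolve. The repository's file listing spells the model directory bert_fintuned_model; if that is the name on disk, adjust model_path accordingly.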
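
If Git LFS was not set up correctly at clone time, the large files arrive as small text "pointer" files instead of real weights, and loading the model fails. The sketch below checks for that, relying on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper names are hypothetical, and the directory names are assumed from the paths used in this README.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """True if the file looks like a Git LFS pointer rather than real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

def find_lfs_pointers(directory):
    """List files under `directory` that are still LFS pointers."""
    return [p for p in Path(directory).rglob("*") if p.is_file() and is_lfs_pointer(p)]

# Assumed directory names, taken from the paths used in this README.
for d in ("./bert_finetuned_model", "./bert_tokenizer"):
    if Path(d).is_dir():
        print(d, "->", find_lfs_pointers(d) or "no pointer files found")
```

If pointer files are reported, running `git lfs pull` inside the repository should fetch the real files.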

Create a new Jupyter Notebook and load the model:

```
import torch
from transformers import BertTokenizer, BertForSequenceClassification

def load_model_and_tokenizer(model_path, tokenizer_path):
    # Load the fine-tuned classifier and its tokenizer from local directories
    model = BertForSequenceClassification.from_pretrained(model_path)
    tokenizer = BertTokenizer.from_pretrained(tokenizer_path)
    return model, tokenizer

model_path = './bert_finetuned_model'
tokenizer_path = './bert_tokenizer'
model, tokenizer = load_model_and_tokenizer(model_path, tokenizer_path)

# Maps the predicted class index to a readable label
labels = {0: "student", 1: "ai"}

def predict_text_category(dialogue, model, tokenizer):
    # Tokenize the input, truncating anything beyond 512 tokens
    inputs = tokenizer(dialogue, return_tensors='pt', truncation=True, padding=True, max_length=512)
    # Gradients are not needed for inference
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # Pick the class with the highest logit
    predicted_class = torch.argmax(logits, dim=1).item()
    return labels[predicted_class]

text = "Blockchain is revolutionizing education, providing personalized learning experiences to students all over the world."
predicted_label = predict_text_category(text, model, tokenizer)
print(f"The predicted label for the given text is: {predicted_label}")
# Class index: 1 ("ai")
```
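
The function above returns only the winning label. To also report how confident the model is, the two logits can be passed through a softmax. The sketch below does this in plain Python on a list of logits so that it runs without PyTorch; `softmax` and `label_with_confidence` are hypothetical helpers, and the logit values are made up for illustration. With a tensor, `torch.softmax(outputs.logits, dim=1)` computes the same quantity.

```python
import math

labels = {0: "student", 1: "ai"}

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Illustrative logits only, not real model output.
label, confidence = label_with_confidence([-1.2, 2.3])
print(f"{label} ({confidence:.2%})")
```

A probability close to 0.5 suggests the model is undecided, which may be worth surfacing rather than reporting a bare label.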

Example Predictions

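```
# Reuses model, tokenizer, and predict_text_category from the previous section
text = "Let us eat together any burger."
predicted_label = predict_text_category(text, model, tokenizer)
print(f"The predicted label for the given text is: {predicted_label}")
# Class index: 0 ("student")
```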

