A deep learning project that detects fake vs real news articles using a Transformer-based text classification model built from scratch in PyTorch.
Unlike many NLP projects that rely on prebuilt architectures, this implementation constructs the core Transformer components manually, including:
- Token + Positional Embeddings
- Multi-head self-attention
- Scaled dot-product attention
- Feedforward neural networks
- Encoder stacks
- A classification head for binary prediction
The model learns contextual relationships between words in a news article to determine whether the article is real or fake.
Fake news spreads rapidly across social media and digital platforms. Traditional NLP methods often struggle to capture long-range dependencies between words in long articles.
This project addresses that problem with the Transformer architecture, in which self-attention lets every token in a sequence attend to every other token.
The system processes news text and predicts which of two classes the article belongs to:
- 0 → Real News
- 1 → Fake News
The model follows a Transformer Encoder classification pipeline:
Raw Text
↓
Tokenization
↓
Vocabulary Encoding
↓
Token + Positional Embedding
↓
Transformer Encoder Blocks (Self Attention)
↓
Sentence Representation
↓
Feedforward Classification Head
↓
Fake / Real Prediction
The text is cleaned and tokenized using whitespace-based tokenization.
"The economy is growing fast"
→ ["The", "economy", "is", "growing", "fast"]
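The tokenization step above can be sketched as follows. This is a minimal illustration, assuming that "whitespace-based" means splitting on any run of whitespace; the function name `whitespace_tokenize` is hypothetical, and the project's actual cleaning rules (for example lowercasing or punctuation removal) are not shown in this README.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split text into tokens on whitespace (hypothetical helper name)."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so extra spaces or newlines are harmless.
    return text.split()


tokens = whitespace_tokenize("The economy is growing fast")
print(tokens)
```

Note that this keeps the original casing and leaves punctuation attached to words; any extra cleaning would happen before this call.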
A vocabulary is built from the dataset using frequency statistics.
Special tokens:
| Token | Purpose |
|---|---|
| `<PAD>` | Padding shorter sequences |
| `<UNK>` | Unknown words |
Parameters used:
max_vocab_size = 10,000
min_frequency = 2
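One way to build such a vocabulary is sketched below. It assumes that the two special tokens take IDs 0 and 1 and that they count toward `max_vocab_size`; neither detail is stated in this README, and `build_vocab` is a hypothetical helper name.

```python
from collections import Counter

PAD_TOKEN, UNK_TOKEN = "<PAD>", "<UNK>"


def build_vocab(tokenized_texts, max_vocab_size=10_000, min_frequency=2):
    """Map tokens to integer IDs by descending frequency (hypothetical helper)."""
    counts = Counter(tok for tokens in tokenized_texts for tok in tokens)
    # Assumption: special tokens get the first two IDs.
    vocab = {PAD_TOKEN: 0, UNK_TOKEN: 1}
    # most_common() yields tokens from most to least frequent, so we can
    # stop at the first token that is too rare or once the vocab is full.
    for token, freq in counts.most_common():
        if freq < min_frequency or len(vocab) >= max_vocab_size:
            break
        vocab[token] = len(vocab)
    return vocab


corpus = [
    ["the", "economy", "is", "growing"],
    ["the", "economy", "is", "shrinking"],
]
vocab = build_vocab(corpus)
print(vocab)
```

Tokens that appear only once ("growing", "shrinking") are left out and would later map to `<UNK>`.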
Each article is converted into a fixed-length sequence.
Max sequence length = 512 tokens
Shorter texts are padded with `<PAD>` tokens; because the length is fixed, longer texts have to be truncated to 512 tokens.
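The encoding step can be sketched like this. The function name `encode` and the ID values for `<PAD>` and `<UNK>` are assumptions for illustration, and truncating over-long texts from the end is one plausible choice rather than the project's confirmed behavior.

```python
def encode(tokens, vocab, max_len=512, pad_id=0, unk_id=1):
    """Convert tokens to a fixed-length list of IDs (hypothetical helper)."""
    # Assumption: over-long texts are truncated by keeping the first max_len tokens.
    ids = [vocab.get(tok, unk_id) for tok in tokens[:max_len]]
    # Right-pad with the <PAD> ID up to max_len.
    return ids + [pad_id] * (max_len - len(ids))


vocab = {"<PAD>": 0, "<UNK>": 1, "the": 2, "economy": 3}
encoded = encode(["the", "economy", "booms"], vocab, max_len=5)
print(encoded)
```

Here "booms" is not in the vocabulary, so it becomes the `<UNK>` ID, and the remaining positions are filled with the `<PAD>` ID.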
Self-attention has no built-in notion of word order.
Therefore the model combines:
Token Embedding
+
Positional Embedding
This gives the model information about:
- word meaning
- word position
Output shape:
[batch_size, sequence_length, embedding_dimension]
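A minimal sketch of this layer is shown below. It assumes learned positional embeddings (the README does not say whether positions are learned or sinusoidal), and the class name `TokenAndPositionEmbedding` is hypothetical.

```python
import torch
import torch.nn as nn


class TokenAndPositionEmbedding(nn.Module):
    """Sum of token and learned positional embeddings (illustrative sketch)."""

    def __init__(self, vocab_size, max_len, embed_dim, pad_id=0):
        super().__init__()
        # padding_idx keeps the <PAD> embedding fixed at zero.
        self.token_emb = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        # Assumption: one learned vector per position.
        self.pos_emb = nn.Embedding(max_len, embed_dim)

    def forward(self, token_ids):
        # token_ids: [batch_size, sequence_length]
        seq_len = token_ids.size(1)
        positions = torch.arange(seq_len, device=token_ids.device)
        # pos_emb(positions) is [sequence_length, embed_dim] and broadcasts over the batch.
        return self.token_emb(token_ids) + self.pos_emb(positions)


embedding = TokenAndPositionEmbedding(vocab_size=10_000, max_len=512, embed_dim=128)
token_ids = torch.randint(0, 10_000, (4, 512))
embedded = embedding(token_ids)
print(embedded.shape)
```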
Scaled dot-product self-attention is the core of the Transformer.
The model computes attention using:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
Where:
| Symbol | Meaning |
|---|---|
| Q | Query |
| K | Key |
| V | Value |
| d_k | dimension of key vectors |
This allows each word to focus on relevant words across the sentence.
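The attention formula translates almost line for line into PyTorch. The sketch below is illustrative: the function name and the optional boolean `mask` argument (True where a key may be attended to) are assumptions, not the project's confirmed interface.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V (illustrative sketch)."""
    d_k = q.size(-1)
    # scores: [..., query_len, key_len]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Assumption: mask is boolean, True = attend, False = ignore (e.g. padding).
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


q = k = v = torch.randn(2, 5, 32)
# Treat the last two key positions as padding for every query.
key_mask = torch.tensor([[True, True, True, False, False]])
output, weights = scaled_dot_product_attention(q, k, v, mask=key_mask)
print(output.shape, weights.shape)
```

Each row of `weights` sums to 1, and masked keys receive zero weight. A row whose keys are all masked would produce NaN, so at least one key per sequence should remain visible.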
Instead of using a single attention mechanism, the model splits attention into multiple heads.
Example:
embed_dim = 128
num_heads = 4
head_dim = embed_dim / num_heads = 32
Each head can learn to focus on different linguistic relationships, such as:
- grammar
- context
- topic relevance
- semantic similarity
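The head split can be sketched as below. This is one common way to write multi-head self-attention, assuming a single fused linear projection for Q, K, and V; the class name and layer layout are hypothetical and may differ from the project's code.

```python
import math

import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention without masking (illustrative sketch)."""

    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Assumption: Q, K, V come from one fused projection.
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        batch, seq_len, embed_dim = x.shape
        qkv = self.qkv(x).view(batch, seq_len, 3, self.num_heads, self.head_dim)
        # Reorder to [3, batch, heads, seq_len, head_dim] and unpack Q, K, V.
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = torch.softmax(scores, dim=-1)
        # Merge heads back into [batch, seq_len, embed_dim].
        context = (weights @ v).transpose(1, 2).reshape(batch, seq_len, embed_dim)
        return self.out(context)


attention = MultiHeadSelfAttention(embed_dim=128, num_heads=4)
x = torch.randn(2, 10, 128)
attended = attention(x)
print(attention.head_dim, attended.shape)
```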
Each encoder block consists of:
1. Multi-head self-attention
2. Residual connection
3. Layer normalization
4. Feedforward neural network
Structure:
Input
↓
Self Attention
↓
Add & Normalize
↓
Feedforward Network
↓
Add & Normalize
↓
Output
Multiple encoder blocks are stacked to build deeper representations.
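The block structure above can be sketched as follows. To keep the example short it uses PyTorch's built-in `nn.MultiheadAttention` as a stand-in for the project's hand-written attention; the dropout rate of 0.1 and the ReLU inside the feedforward network are also assumptions.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """Post-norm encoder block: attention, add & norm, feedforward, add & norm."""

    def __init__(self, embed_dim=128, num_heads=4, ff_dim=256, dropout=0.1):
        super().__init__()
        # Stand-in: the project builds its own attention module from scratch.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),  # assumption: activation is not stated in the README
            nn.Linear(ff_dim, embed_dim),
        )
        self.norm2 = nn.LayerNorm(embed_dim)
        self.dropout = nn.Dropout(dropout)  # assumption: rate is not stated

    def forward(self, x, key_padding_mask=None):
        # key_padding_mask: True marks <PAD> positions to ignore (PyTorch convention).
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.dropout(attn_out))    # Add & Normalize
        x = self.norm2(x + self.dropout(self.ff(x)))  # Add & Normalize
        return x


# Two stacked blocks, matching the "Encoder Layers" hyperparameter.
encoder = nn.ModuleList([EncoderBlock() for _ in range(2)])
hidden = torch.randn(4, 16, 128)
for block in encoder:
    hidden = block(hidden)
print(hidden.shape)
```

Each block preserves the `[batch_size, sequence_length, embedding_dimension]` shape, which is what makes stacking possible.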
After the transformer encoding stage, a sentence representation is extracted and passed through a classifier.
Architecture:
Linear Layer
ReLU Activation
Dropout
Linear Output Layer
Final output:
[batch_size, 2]
holding one logit (unnormalized score) per class, which softmax converts into probabilities for:
- Real News
- Fake News
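A sketch of the classifier is given below. The README does not say how the sentence representation is extracted, so this example assumes mean pooling over non-padding tokens; the hidden size of 64, the dropout rate, and the class name are likewise hypothetical.

```python
import torch
import torch.nn as nn


class ClassificationHead(nn.Module):
    """Linear -> ReLU -> Dropout -> Linear over a pooled sentence vector."""

    def __init__(self, embed_dim=128, hidden_dim=64, num_classes=2, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),  # hidden_dim is an assumption
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, encoded, token_mask):
        # encoded: [batch, seq_len, embed_dim]; token_mask: True for real tokens.
        mask = token_mask.unsqueeze(-1).float()
        # Assumption: sentence representation = mean of non-padding token vectors.
        pooled = (encoded * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.net(pooled)  # raw scores (logits), shape [batch, 2]


head = ClassificationHead()
head.eval()  # disable dropout for a deterministic forward pass
encoded = torch.randn(3, 10, 128)
token_mask = torch.ones(3, 10, dtype=torch.bool)
token_mask[0, 5:] = False  # first example has 5 padding positions
logits = head(encoded, token_mask)
probabilities = torch.softmax(logits, dim=-1)
print(logits.shape)
```

The head returns raw scores because `CrossEntropyLoss` applies log-softmax internally; softmax is only needed when reporting probabilities.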
| Parameter | Value |
|---|---|
| Embedding Dimension | 128 |
| Number of Heads | 4 |
| Encoder Layers | 2 |
| Feedforward Hidden Size | 256 |
| Max Sequence Length | 512 |
| Batch Size | 32 |
| Epochs | 5 |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Loss Function | CrossEntropyLoss |
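For reference, the table could be captured in code as a small configuration object. This is only a sketch; the `TrainConfig` name and field names are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from the table above (hypothetical container)."""

    embed_dim: int = 128
    num_heads: int = 4
    num_layers: int = 2
    ff_hidden_dim: int = 256
    max_seq_len: int = 512
    batch_size: int = 32
    epochs: int = 5
    learning_rate: float = 1e-3


config = TrainConfig()
# Each attention head works on embed_dim / num_heads dimensions.
head_dim = config.embed_dim // config.num_heads
print(config, head_dim)
```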
Training follows the standard deep learning workflow:
- Load dataset
- Build vocabulary
- Encode text sequences
- Create PyTorch datasets
- Train transformer model
- Evaluate predictions
The training loop performs:
Forward pass
Loss calculation
Backward propagation
Optimizer step
Accuracy calculation
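The five steps above can be sketched as a single epoch function. The function name `train_one_epoch` is hypothetical, and the tiny model and random data below are stand-ins used only to show that the loop runs; they are not the project's Transformer or dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    """Run one pass over the data; return (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(inputs)            # forward pass
        loss = criterion(logits, labels)  # loss calculation
        loss.backward()                   # backward propagation
        optimizer.step()                  # optimizer step
        # accuracy calculation
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Stand-in model and random data, only to exercise the loop.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Embedding(50, 8), nn.Flatten(), nn.Linear(8 * 6, 2))
toy_data = TensorDataset(torch.randint(0, 50, (16, 6)), torch.randint(0, 2, (16,)))
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=True)
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.001)
avg_loss, accuracy = train_one_epoch(toy_model, toy_loader, optimizer, nn.CrossEntropyLoss())
print(avg_loss, accuracy)
```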
Custom PyTorch dataset:
NewsDataset
DataLoader handles:
- batching
- shuffling
- efficient GPU training
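A dataset class of this kind might look like the sketch below. Only the name `NewsDataset` comes from this README; the constructor arguments and internals are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class NewsDataset(Dataset):
    """Pairs of (encoded token IDs, label); internals are an illustrative guess."""

    def __init__(self, encoded_texts, labels):
        # encoded_texts: equal-length lists of token IDs; labels: 0 (real) or 1 (fake).
        self.inputs = torch.tensor(encoded_texts, dtype=torch.long)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]


dataset = NewsDataset([[2, 3, 1, 0], [4, 5, 0, 0], [2, 6, 7, 8]], [0, 1, 0])
loader = DataLoader(dataset, batch_size=2, shuffle=True)
batch_inputs, batch_labels = next(iter(loader))
print(batch_inputs.shape, batch_labels.shape)
```

The labels are stored as `torch.long` because `CrossEntropyLoss` expects integer class indices.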
The trained model is saved using PyTorch:
torch.save(model.state_dict(), "transformer_model.pth")
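Because only the `state_dict` is saved, loading requires constructing the model first and then restoring its weights. The sketch below uses a single linear layer as a stand-in for the Transformer and writes to a temporary directory; both are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the trained Transformer classifier.
model = nn.Linear(128, 2)
path = os.path.join(tempfile.mkdtemp(), "transformer_model.pth")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it.
restored = nn.Linear(128, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch off dropout and similar layers for inference
```

The restored model must be created with the same hyperparameters as the saved one, otherwise `load_state_dict` raises an error about mismatched keys or shapes.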
The trained model can be deployed using Streamlit to build an interactive web application where users can paste news text and receive predictions instantly.
Input:
Breaking: Government confirms new economic policy changes today
Output:
Prediction: Real News
Confidence: 92%
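A confidence value like the one above can be derived from the model's two output scores with softmax. The helper below is a hypothetical sketch, and the logit values are made up for illustration rather than produced by the trained model.

```python
import torch

LABELS = {0: "Real News", 1: "Fake News"}


def format_prediction(logits):
    """Turn a [2]-shaped logit tensor into (label, confidence)."""
    probs = torch.softmax(logits, dim=-1)
    confidence, class_id = probs.max(dim=-1)
    return LABELS[class_id.item()], confidence.item()


# Made-up logits standing in for a real model output.
label, confidence = format_prediction(torch.tensor([2.0, -0.5]))
print(f"Prediction: {label}")
print(f"Confidence: {confidence:.0%}")
```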
Possible enhancements:
- Add pretrained embeddings (GloVe / FastText)
- Add CLS token representation
- Increase encoder depth
- Use larger datasets
- Integrate pretrained transformer models
- Python
- PyTorch
- NumPy
- Scikit-Learn
- Matplotlib
- Streamlit
Aryan Sharma
Software Engineering Student | AI & Machine Learning Enthusiast
Pull requests and suggestions are welcome.
MIT License