A full PyTorch implementation of the Transformer architecture from the groundbreaking paper "Attention Is All You Need", trained for English-to-Italian machine translation.
This project aims to replicate the original Transformer architecture from scratch, without using high-level libraries like HuggingFace or OpenNMT for the model itself, in order to understand each component in depth: multi-head attention, positional encoding, residual connections, encoder-decoder stacks, and more.
- 📚 Dataset: Helsinki-NLP/opus_books
- 🔡 Task: Translate English sentences into Italian
- ⚙️ Model: Custom implementation of the transformer model (encoder-decoder with attention)
- 📦 Tokenizer: Trained from scratch using the HuggingFace tokenizers library (WordLevel + Whitespace)
- 📊 Evaluation: Character Error Rate (CER), Word Error Rate (WER), BLEU Score
- 🧪 Training: Uses PyTorch DataLoader, SummaryWriter, and gradient optimization with Adam
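To make the evaluation metrics above concrete, here is a minimal pure-Python sketch of WER and CER, both built on Levenshtein edit distance. It is only an illustration and not necessarily how this project computes them (the training script may rely on a metrics library instead); the function names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) or lists (words).
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, prediction):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def char_error_rate(reference, prediction):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, prediction) / len(reference)
```

Lower is better for both metrics, and either can exceed 1.0 when the prediction is much longer than the reference. BLEU works differently (n-gram precision with a brevity penalty) and is not covered by this sketch.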
├── config.py # Training hyperparameters & weight paths
├── dataset.py # Data preprocessing, padding, masks, batching
├── model.py # Transformer architecture from scratch
├── train.py # Training loop, validation, model saving
├── translate.py # Script to use the trained model to test translation inference
├── transformer.png # Diagram of transformer architecture
# Generated Files/Directories
├── tokenizer_en.json # Generated tokenizer for English
├── tokenizer_it.json # Generated tokenizer for Italian
├── [data_source]_weights/ # Saved model checkpoints per epoch
├── runs/t_model/ # Logs for each experiment created by running the train script

- Install dependencies
pip install -r requirements.txt
Or, if you want to install torch with CUDA:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Run the training
python train.py
- View logs with TensorBoard
tensorboard --logdir=runs/tmodel/[experiment_filename] --host=localhost
After training the model, you can use the translate.py script to perform translations:
Translate a custom English sentence to Italian:
python translate.py "Hello, how are you?"
Translate a sentence from the test dataset by providing its index:
python translate.py 42
Run without arguments to translate the default sentence:
python translate.py
Example output:
Using device: cuda
SOURCE: Hello, how are you?
PREDICTED: Ciao, come stai?
Note: Make sure you have trained the model first using python train.py before running translations.
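The three invocation modes above (custom sentence, dataset index, no argument) can be told apart with a small dispatch on the first command-line argument. The sketch below shows one way to do this and is not taken from translate.py; `parse_translate_arg` and `DEFAULT_SENTENCE` are hypothetical names, and the default sentence is assumed for illustration.

```python
# Hypothetical default; the real script defines its own default sentence.
DEFAULT_SENTENCE = "Hello, how are you?"


def parse_translate_arg(argv):
    """Classify the CLI input as a dataset index or a raw sentence.

    argv is expected in sys.argv form: [script_name, optional_argument].
    Returns ("index", int) or ("text", str).
    """
    if len(argv) < 2:
        # No argument given: fall back to the default sentence.
        return ("text", DEFAULT_SENTENCE)
    arg = argv[1]
    if arg.isdecimal():
        # A purely numeric argument is treated as a test-set index.
        return ("index", int(arg))
    # Anything else is treated as the sentence to translate.
    return ("text", arg)
```

With this approach, `parse_translate_arg(sys.argv)` would return `("index", 42)` for `python translate.py 42`. One consequence of the design is that a sentence consisting only of digits cannot be passed as text.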
