Transformer Autoregressive Language Model

This project implements an autoregressive, Transformer-based language model for research and educational purposes. It draws inspiration from state-of-the-art models and serves as a hands-on tool for exploring natural language processing techniques.

Usage

1. Pre-training Phase

Purpose: Learn general language representations from a large, unlabeled corpus.
Process:
- Uses unsupervised learning on raw text data (next-token prediction, so no labels are required)
- Learns basic linguistic patterns and general knowledge
- Builds a foundational understanding of language structure
Execute: Run the pre-training phase with:
```
python pretrain.py
```
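
To make the next-token objective concrete, here is a minimal sketch of how a token stream can be cut into training pairs in which each target is the input shifted by one position. The function name `make_next_token_pairs` and the non-overlapping windowing are assumptions made for illustration; pretrain.py and dataset.py may prepare batches differently.

```python
# Illustrative only: the function name and the windowing scheme are
# assumptions, not code taken from pretrain.py or dataset.py.
def make_next_token_pairs(token_ids, max_seq_len):
    """Split a token stream into (inputs, targets) windows shifted by one token."""
    pairs = []
    # Stop at len - 1 so every window holds at least one input and one target.
    for start in range(0, len(token_ids) - 1, max_seq_len):
        # One extra token is taken so the last input still has a target.
        window = token_ids[start:start + max_seq_len + 1]
        inputs, targets = window[:-1], window[1:]
        pairs.append((inputs, targets))
    return pairs


pairs = make_next_token_pairs([10, 11, 12, 13, 14, 15, 16], max_seq_len=3)
for inputs, targets in pairs:
    print(inputs, "->", targets)
# [10, 11, 12] -> [11, 12, 13]
# [13, 14, 15] -> [14, 15, 16]
```

At every position the model is asked to predict the token that follows, which is why raw text alone is enough for this phase.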

2. Instruction Fine-tuning Phase

Purpose: Adapt the model to follow specific instructions and improve task performance.
Process:
- Trains on carefully curated instruction-based datasets
- Teaches the model to understand and execute precise user directives
- Enhances the model's ability to generate contextually appropriate responses
- Improves zero-shot and few-shot learning capabilities
Execute: Run the instruction fine-tuning phase with:
```
python instruct_train.py
```
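
A common way to train on instruction data is to concatenate prompt and response and compute the loss only on the response tokens. The sketch below shows that idea under stated assumptions: the helper `build_instruction_example`, the use of an end-of-sequence token, and the prompt masking are all hypothetical here, and instruct_train.py may handle its examples differently. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
# Illustrative only: helper name, EOS handling, and prompt masking are
# assumptions, not code taken from instruct_train.py.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_instruction_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response; supervise only the response tokens."""
    full = prompt_ids + response_ids + [eos_id]
    input_ids = full[:-1]
    labels = full[1:]  # targets are the inputs shifted left by one
    # Positions whose target is still a prompt token are excluded from the loss.
    n_masked = max(len(prompt_ids) - 1, 0)
    labels = [IGNORE_INDEX] * n_masked + labels[n_masked:]
    return input_ids, labels


input_ids, labels = build_instruction_example([1, 2, 3], [7, 8], eos_id=0)
print(input_ids)  # [1, 2, 3, 7, 8]
print(labels)     # [-100, -100, 7, 8, 0]
```

Masking the prompt keeps the gradient focused on producing the response rather than on reproducing the instruction text.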

3. Inference Phase

Purpose: Use the trained model to generate text or solve specific tasks.
Process:
- Loads the pre-trained and fine-tuned model weights
- Accepts a user prompt or other input
- Generates contextually relevant, instruction-aligned outputs
Execute: Test the model with:
```
python inference.py
```
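
Autoregressive generation feeds each predicted token back in as input for the next step. The following sketch shows a greedy decoding loop with a toy stand-in for the model. The names `greedy_generate` and `toy_logits` are hypothetical, and inference.py may use a different decoding strategy, such as sampling with a temperature.

```python
# Illustrative only: a greedy decoding loop with a toy scoring function
# standing in for the trained model. Not taken from inference.py.
def greedy_generate(next_token_logits, prompt_ids, max_new_tokens, eos_id, max_seq_len):
    """Append the highest-scoring token until EOS or the token budget is hit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-max_seq_len:]  # the model sees at most max_seq_len tokens
        logits = next_token_logits(context)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_logits(context, vocab_size=5):
    """Stand-in for the model: favours (last token + 1) modulo vocab_size."""
    favoured = (context[-1] + 1) % vocab_size
    return [1.0 if i == favoured else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_logits, [1], max_new_tokens=10, eos_id=0, max_seq_len=8))
# prints [1, 2, 3, 4, 0]
```

Truncating the context to the last `max_seq_len` tokens reflects the fact that the model cannot attend beyond its maximum sequence length.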

Configuration

All model and training parameters are centralized in config.py. Edit this file to control the following aspects of the model and its training process:

Model Parameters:

d_model: Dimension of token embeddings and internal representations
nhead: Number of attention heads in the multi-head attention mechanism
num_layers: Number of Transformer layers composing the model
max_seq_len: Maximum length of input sequences (in tokens)
dropout: Dropout rate for regularization
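
As a rough illustration of how these parameters relate, the sketch below groups them in a dataclass and checks one constraint of standard multi-head attention: `d_model` is split evenly across heads, so it must be divisible by `nhead`. The class `ModelConfig`, the default values, and the `head_dim` property are assumptions; config.py may simply define module-level variables with different values.

```python
# Illustrative only: class name, default values, and checks are assumptions,
# not the contents of config.py.
from dataclasses import dataclass


@dataclass
class ModelConfig:
    d_model: int = 256      # embedding / hidden size (placeholder value)
    nhead: int = 8          # attention heads (placeholder value)
    num_layers: int = 4     # Transformer layers (placeholder value)
    max_seq_len: int = 128  # maximum tokens per sequence (placeholder value)
    dropout: float = 0.1    # dropout rate (placeholder value)

    def __post_init__(self):
        # Standard multi-head attention gives each head d_model / nhead dimensions.
        if self.d_model % self.nhead != 0:
            raise ValueError("d_model must be divisible by nhead")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")

    @property
    def head_dim(self):
        return self.d_model // self.nhead


cfg = ModelConfig()
print(cfg.head_dim)  # 32
```

Validating the values when the configuration is created surfaces an incompatible combination before any training time is spent.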

Training Parameters:

learning_rate: Learning rate for the optimizer
num_epochs_pretrain: Number of epochs for unsupervised pre-training
num_epochs_instruct: Number of epochs for instruction-based fine-tuning
batch_size: Number of examples processed per training batch
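
To see how batch_size and the epoch counts interact, this small sketch estimates the number of optimizer steps in a training run. It assumes one optimizer step per batch (no gradient accumulation) and that the final partial batch is kept. The function name and the example numbers are hypothetical.

```python
# Illustrative only: assumes one optimizer step per batch and that the
# last, possibly smaller, batch of each epoch is not dropped.
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs):
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


print(total_optimizer_steps(num_examples=10_000, batch_size=32, num_epochs=3))
# prints 939
```

Doubling batch_size roughly halves the number of steps per epoch, which is one reason the learning rate is often retuned when the batch size changes.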

Dataset and File Paths:

Paths to datasets, vocabulary files (e.g., encoder.json, vocab.bpe), and the directories where checkpoints and trained models are saved are also set in config.py.

Adjust these parameters in config.py to customize the model's behavior and training procedure according to your research or study needs.
