MarkGPT: From Zero to Training Your Own LLM

A 60-Day Comprehensive Curriculum in Large Language Models

"The journey of a thousand tokens begins with a single character." — Inspired by 45+ years of curriculum design at Stanford, Harvard, and MIT

🌍 What Is This Course?

This is not just a tutorial. This is a complete educational journey — structured like a university textbook, taught like a graduate seminar, and built with one magnificent goal in mind:

By Day 60, you will have trained your own language model — MarkGPT — on the Bible, tuned to understand and generate text in the Banso vernacular dialect of Cameroon.

Whether you are a complete beginner who has never written a line of Python, or an intermediate developer who wants to deeply understand how transformers work from the mathematical ground up, this curriculum meets you where you are and carries you forward.

🗺️ The Journey at a Glance

Phase	Days	Module	Theme
Foundation	1–6	Module 01	What is AI? What is Language?
Foundation	7–12	Module 02	Python & Math Essentials for ML
Intermediate	13–18	Module 03	Neural Networks from Scratch
Intermediate	19–24	Module 04	Deep Learning & Backpropagation
Intermediate	25–30	Module 05	NLP Foundations — Text as Data
Advanced	31–36	Module 06	The Transformer Architecture
Advanced	37–42	Module 07	Training Large Language Models
Advanced	43–48	Module 08	Fine-Tuning & Transfer Learning
Capstone	49–54	Module 09	Building the Banso Dataset
Capstone	55–60	Module 10	Training & Deploying MarkGPT

🏗️ Repository Structure

MarkGPT-LLM-Curriculum/
│
├── README.md                    ← You are here
├── SYLLABUS.md                  ← Full day-by-day syllabus
├── GETTING_STARTED.md           ← Setup guide for all platforms
├── PHILOSOPHY.md                ← The pedagogical approach of this course
│
├── modules/                     ← 10 learning modules (the "textbook")
│   ├── module-01/               ← Foundations of AI & Language
│   ├── module-02/               ← Python & Mathematics
│   ├── module-03/               ← Neural Networks
│   ├── module-04/               ← Deep Learning
│   ├── module-05/               ← NLP Foundations
│   ├── module-06/               ← Transformer Architecture
│   ├── module-07/               ← LLM Training
│   ├── module-08/               ← Fine-Tuning
│   ├── module-09/               ← Banso Dataset Construction
│   └── module-10/               ← MarkGPT Training & Deployment
│
├── src/                         ← Production-quality source code
│   ├── tokenizer/               ← Custom Banso tokenizer
│   ├── model/                   ← MarkGPT model architecture
│   ├── training/                ← Training loops and schedulers
│   ├── inference/               ← Text generation code
│   └── utils/                   ← Shared utilities
│
├── data/
│   ├── raw/                     ← Raw Bible text sources
│   ├── processed/               ← Cleaned, tokenized data
│   └── banso-vernacular/        ← Banso language resources
│
├── notebooks/                   ← Jupyter notebooks for exploration
├── configs/                     ← YAML config files for training runs
├── capstone/                    ← Final MarkGPT project files
├── checkpoints/                 ← Saved model weights (gitignored)
├── tests/                       ← Unit tests for all src code
└── docs/                        ← Extended documentation

🧭 How to Use This Repository

For Complete Beginners (Days 1–12)

Start with GETTING_STARTED.md, then work through Module 01 lesson by lesson. Do not skip the exercises — they are carefully designed to build intuition before code. Every concept is explained from first principles.

For Intermediate Learners (Days 13–30)

You can move more quickly through Modules 01–02, but do not skip the math foundations in Module 02. The neural network derivations in Modules 03–04 are the backbone of everything that follows.

For Advanced Learners (Days 31–60)

You may begin at Module 05 or 06. The Transformer implementation in Module 06 is written from scratch in PyTorch — no Hugging Face shortcuts until you've understood every matrix multiplication.

🎯 The Capstone: MarkGPT

MarkGPT is a small GPT-style language model (~10M–85M parameters, depending on your hardware) trained on:

The Bible (King James Version and other translations) as the primary corpus
Banso vernacular text — proverbs, oral literature, and translated passages in the Banso dialect spoken in the North West Region of Cameroon

The result is a model that can generate text with the rhythm, vocabulary, and cadence of Banso-inflected Biblical language. This is not just a technical achievement — it is a cultural preservation project.

💻 Hardware Requirements

Level	Hardware	What You Can Train
Minimum	CPU only (any laptop)	MarkGPT-Nano (2M params)
Recommended	GPU with 6GB+ VRAM	MarkGPT-Small (10M params)
Ideal	GPU with 16GB+ VRAM	MarkGPT-Base (85M params)
Cloud	Google Colab / Kaggle (free)	MarkGPT-Small (free tier)

📚 Prerequisites

None. Truly. If you can read this sentence, you can begin Day 1.

By Day 12, you will have the Python and mathematics you need. By Day 30, you will understand neural networks deeply. By Day 60, you will have trained your own LLM.

🧑‍🏫 A Note from the Curriculum Author

I have spent over four decades teaching machine learning, computational linguistics, and AI at institutions including Stanford, Harvard, and MIT. In that time, I have seen one pattern repeat itself endlessly: students who rush to copy-paste code without understanding the foundations always hit a wall. The ones who take the time to understand why backpropagation works, why attention is computed the way it is, why tokenization choices matter — those students become builders, not just users.

This curriculum is designed to produce builders. Every exercise, every derivation, every project is chosen to deepen intuition, not just skill. The Banso MarkGPT capstone is not a gimmick — it is a reminder that language models carry within them the languages, cultures, and voices of real people. Build with care.

🤝 How to Contribute

We warmly welcome contributions from the community! Whether you're a beginner student, experienced educator, or researcher, there are many ways to improve this curriculum.

📚 What Can You Contribute?

Improve Lessons: Clarify explanations, add visualizations, fix typos
Create Exercises: Build challenging practice problems with solutions
Build Projects: Create real-world applications and mini-capstones
Expand Documentation: Write guides, troubleshooting docs, tips
Language & Culture: Expand the Banso dataset, improve translations
Bug Fixes: Report and fix errors in code or explanations
Data Contributions: Help with the Banso vernacular dataset
Deployment Guides: Create deployment examples and best practices

🚀 For Student Contributors

This is the perfect project to build your portfolio! New to open source? Start here:

Read contributors/CONTRIBUTORS_GUIDE.md — Complete step-by-step guide for beginners
Create your contributor folder with your GitHub username in contributors/
Work on improvements in your own isolated folder
Submit a Pull Request when you're ready
Collaborate with maintainers and other students

The contributors guide includes:

✅ Complete setup instructions (no prior Git knowledge required)
✅ Step-by-step contribution workflow
✅ Best practices and common pitfalls to avoid
✅ Clear examples of what makes a good contribution
✅ Troubleshooting help for common issues

🌍 For Educators & Researchers

Adapt this curriculum for classroom use
Contribute research insights and papers
Help with Banso language/cultural accuracy
Share datasets and resources for African language NLP
Collaborate on model deployment and optimization

📋 Contribution Process

Quick Reference (detailed guide in contributors/CONTRIBUTORS_GUIDE.md):

# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR-USERNAME/MarkGPT-LLM-Curriculum.git

# 3. Create your contributor workspace
mkdir contributors/YOUR-GITHUB-USERNAME
cd contributors/YOUR-GITHUB-USERNAME

# 4. Create the module structure you're working on
mkdir -p module-02/lessons

# 5. Make your changes and commit
git add .
git commit -m "feat: improve module-02 numpy lesson"

# 6. Push to your fork and create a Pull Request
git push origin main

✨ Best Practices

Start Small: First contributions should be focused (1-2 files)
Test Everything: Run all Jupyter notebooks before submitting
Follow Patterns: Look at existing lessons for structure and style
Write Clear Commits: Describe what changed and why
Communicate: Ask questions if anything is unclear
Respect Culture: Be sensitive to Banso language and traditions

💡 Getting Help

Setup Issues? Check GETTING_STARTED.md
Contribution Questions? See contributors/CONTRIBUTORS_GUIDE.md
Git Problems? Check the Git Troubleshooting section
Need More Help? Open an issue or email iwstechnical@gmail.com

🎯 Special Opportunities

We are currently seeking contributors for:

Banso Language Experts: Help expand vernacular dataset and improve linguistic accuracy
Data Collection: Build tools and workflows for dataset creation
Model Evaluation: Create evaluation metrics and benchmarks
Deployment: Build Docker containers and deployment guides
Documentation: Write case studies, tutorials, and explanations
Translations: Adapt lessons for other languages/contexts

📄 Contributor Guidelines

For detailed guidelines, including code standards, docstring requirements, and the full contribution workflow, see:

CONTRIBUTING.md — Technical contribution standards
contributors/CONTRIBUTORS_GUIDE.md — Step-by-step guide for beginners
COMMUNITY_GUIDELINES.md — Code of conduct and community values

🌟 Recognition

All contributors are recognized in:

This README (in a Contributors section)
The CONTRIBUTORS.md file in the repository
Each Pull Request and commit history
Our community page

📄 License

This curriculum is released under the Creative Commons Attribution 4.0 International License. The MarkGPT source code is under the MIT License. See LICENSE.md for details.

"Every language is a universe. MarkGPT is our attempt to let one more universe speak."

Name		Name	Last commit message	Last commit date
Latest commit History 4,685 Commits
.github		.github
assets		assets
capstone		capstone
configs		configs
contributors		contributors
data/banso-vernacular		data/banso-vernacular
docs		docs
modules		modules
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
BEST_PRACTICES.md		BEST_PRACTICES.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CONTRIBUTORS.md		CONTRIBUTORS.md
CONTRIBUTOR_EXAMPLES.md		CONTRIBUTOR_EXAMPLES.md
CURRICULUM_RESTRUCTURING_FINAL.md		CURRICULUM_RESTRUCTURING_FINAL.md
GETTING_STARTED.md		GETTING_STARTED.md
LICENSE.md		LICENSE.md
MODULE_02_ENRICHMENT_SUMMARY.txt		MODULE_02_ENRICHMENT_SUMMARY.txt
MODULE_03_ENRICHMENT_SUMMARY.txt		MODULE_03_ENRICHMENT_SUMMARY.txt
MODULE_1_1_COMPLETION_REPORT.md		MODULE_1_1_COMPLETION_REPORT.md
Makefile		Makefile
README.md		README.md
RELEASE_NOTES.md		RELEASE_NOTES.md
REORGANIZATION_COMPLETE.md		REORGANIZATION_COMPLETE.md
RESOURCE_EXPANSION_SUMMARY.md		RESOURCE_EXPANSION_SUMMARY.md
SECURITY.md		SECURITY.md
SYLLABUS.md		SYLLABUS.md
VERSION_HISTORY.md		VERSION_HISTORY.md
requirements.txt		requirements.txt
setup.cfg		setup.cfg
test_plot.png		test_plot.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MarkGPT: From Zero to Training Your Own LLM

A 60-Day Comprehensive Curriculum in Large Language Models

🌍 What Is This Course?

🗺️ The Journey at a Glance

🏗️ Repository Structure

🧭 How to Use This Repository

For Complete Beginners (Days 1–12)

For Intermediate Learners (Days 13–30)

For Advanced Learners (Days 31–60)

🎯 The Capstone: MarkGPT

💻 Hardware Requirements

📚 Prerequisites

🧑‍🏫 A Note from the Curriculum Author

🤝 How to Contribute

📚 What Can You Contribute?

🚀 For Student Contributors

🌍 For Educators & Researchers

📋 Contribution Process

✨ Best Practices

💡 Getting Help

🎯 Special Opportunities

📄 Contributor Guidelines

🌟 Recognition

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MarkGPT: From Zero to Training Your Own LLM

A 60-Day Comprehensive Curriculum in Large Language Models

🌍 What Is This Course?

🗺️ The Journey at a Glance

🏗️ Repository Structure

🧭 How to Use This Repository

For Complete Beginners (Days 1–12)

For Intermediate Learners (Days 13–30)

For Advanced Learners (Days 31–60)

🎯 The Capstone: MarkGPT

💻 Hardware Requirements

📚 Prerequisites

🧑‍🏫 A Note from the Curriculum Author

🤝 How to Contribute

📚 What Can You Contribute?

🚀 For Student Contributors

🌍 For Educators & Researchers

📋 Contribution Process

✨ Best Practices

💡 Getting Help

🎯 Special Opportunities

📄 Contributor Guidelines

🌟 Recognition

📄 License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages