"The journey of a thousand tokens begins with a single character." — Inspired by 45+ years of curriculum design at Stanford, Harvard, and MIT
This is not just a tutorial. This is a complete educational journey — structured like a university textbook, taught like a graduate seminar, and built with one magnificent goal in mind:
By Day 60, you will have trained your own language model — MarkGPT — on the Bible, tuned to understand and generate text in the Banso vernacular dialect of Cameroon.
Whether you are a complete beginner who has never written a line of Python, or an intermediate developer who wants to deeply understand how transformers work from the mathematical ground up, this curriculum meets you where you are and carries you forward.
| Phase | Days | Module | Theme |
|---|---|---|---|
| Foundation | 1–6 | Module 01 | What is AI? What is Language? |
| Foundation | 7–12 | Module 02 | Python & Math Essentials for ML |
| Intermediate | 13–18 | Module 03 | Neural Networks from Scratch |
| Intermediate | 19–24 | Module 04 | Deep Learning & Backpropagation |
| Intermediate | 25–30 | Module 05 | NLP Foundations — Text as Data |
| Advanced | 31–36 | Module 06 | The Transformer Architecture |
| Advanced | 37–42 | Module 07 | Training Large Language Models |
| Advanced | 43–48 | Module 08 | Fine-Tuning & Transfer Learning |
| Capstone | 49–54 | Module 09 | Building the Banso Dataset |
| Capstone | 55–60 | Module 10 | Training & Deploying MarkGPT |
MarkGPT-LLM-Curriculum/
│
├── README.md ← You are here
├── SYLLABUS.md ← Full day-by-day syllabus
├── GETTING_STARTED.md ← Setup guide for all platforms
├── PHILOSOPHY.md ← The pedagogical approach of this course
│
├── modules/ ← 10 learning modules (the "textbook")
│ ├── module-01/ ← Foundations of AI & Language
│ ├── module-02/ ← Python & Mathematics
│ ├── module-03/ ← Neural Networks
│ ├── module-04/ ← Deep Learning
│ ├── module-05/ ← NLP Foundations
│ ├── module-06/ ← Transformer Architecture
│ ├── module-07/ ← LLM Training
│ ├── module-08/ ← Fine-Tuning
│ ├── module-09/ ← Banso Dataset Construction
│ └── module-10/ ← MarkGPT Training & Deployment
│
├── src/ ← Production-quality source code
│ ├── tokenizer/ ← Custom Banso tokenizer
│ ├── model/ ← MarkGPT model architecture
│ ├── training/ ← Training loops and schedulers
│ ├── inference/ ← Text generation code
│ └── utils/ ← Shared utilities
│
├── data/
│ ├── raw/ ← Raw Bible text sources
│ ├── processed/ ← Cleaned, tokenized data
│ └── banso-vernacular/ ← Banso language resources
│
├── notebooks/ ← Jupyter notebooks for exploration
├── configs/ ← YAML config files for training runs
├── capstone/ ← Final MarkGPT project files
├── checkpoints/ ← Saved model weights (gitignored)
├── tests/ ← Unit tests for all src code
└── docs/ ← Extended documentation
**Complete beginners:** Start with GETTING_STARTED.md, then work through Module 01 lesson by lesson. Do not skip the exercises — they are carefully designed to build intuition before code. Every concept is explained from first principles.
**Intermediate developers:** You can move more quickly through Modules 01–02, but do not skip the math foundations in Module 02. The neural network derivations in Modules 03–04 are the backbone of everything that follows.
**Experienced practitioners:** You may begin at Module 05 or 06. The Transformer implementation in Module 06 is written from scratch in PyTorch — no Hugging Face shortcuts until you've understood every matrix multiplication.
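If you want a preview of the computation at the center of Module 06, scaled dot-product attention fits in a few lines. The sketch below uses NumPy rather than PyTorch, and the names and shapes are illustrative — it is not the course's actual code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core of a transformer layer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row is a probability distribution
    return weights @ V, weights                     # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Every output vector is a convex combination of the value vectors, with mixing weights determined by query–key similarity — that is the single idea the whole module unpacks.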
MarkGPT is a small GPT-style language model (~10M–85M parameters, depending on your hardware) trained on:
- The Bible (King James Version and other translations) as the primary corpus
- Banso vernacular text — proverbs, oral literature, and translated passages in the Banso dialect spoken in the North West Region of Cameroon
The result is a model that can generate text with the rhythm, vocabulary, and cadence of Banso-inflected Biblical language. This is not just a technical achievement — it is a cultural preservation project.
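To see where parameter counts like 10M or 85M come from, here is a back-of-the-envelope estimator for a GPT-style decoder. The configurations passed in at the bottom are illustrative guesses chosen to land near those sizes — they are not MarkGPT's official settings:

```python
def gpt_param_count(vocab_size, d_model, n_layers, d_ff=None):
    """Rough parameter count for a GPT-style decoder.

    Ignores biases, LayerNorm, and positional embeddings, and assumes
    the output head shares weights with the token embedding.
    """
    d_ff = d_ff or 4 * d_model                # feed-forward width, conventionally 4x d_model
    embedding = vocab_size * d_model          # token embedding table
    attention = 4 * d_model * d_model         # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff         # up-projection and down-projection
    return embedding + n_layers * (attention + feed_forward)

# Hypothetical configs in the ballpark of the sizes mentioned above:
print(f"Small-ish: {gpt_param_count(16_000, 256, 8):,}")   # roughly 10M
print(f"Base-ish:  {gpt_param_count(16_000, 640, 16):,}")  # roughly 89M
```

The takeaway: per-layer cost grows quadratically with `d_model`, so doubling the width and layer count moves you from the "Small" to the "Base" regime.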
| Level | Hardware | What You Can Train |
|---|---|---|
| Minimum | CPU only (any laptop) | MarkGPT-Nano (2M params) |
| Recommended | GPU with 6GB+ VRAM | MarkGPT-Small (10M params) |
| Ideal | GPU with 16GB+ VRAM | MarkGPT-Base (85M params) |
| Cloud | Google Colab / Kaggle (free) | MarkGPT-Small (free tier) |
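The VRAM tiers above reflect more than the weights themselves: training also stores gradients and optimizer state, and activation memory (which grows with batch size and sequence length) usually dominates on top of that. A deliberately simplified sketch of the fixed part of that budget:

```python
def training_memory_gib(n_params, bytes_per_param=4, adam_copies=2):
    """GiB for weights + gradients + Adam moment estimates (fp32).

    Deliberately ignores activations, which typically dwarf this figure
    in practice — hence the much larger VRAM recommendations above.
    """
    total_bytes = n_params * bytes_per_param * (2 + adam_copies)
    return total_bytes / 1024**3

for name, n_params in [("MarkGPT-Nano", 2e6),
                       ("MarkGPT-Small", 10e6),
                       ("MarkGPT-Base", 85e6)]:
    print(f"{name}: ~{training_memory_gib(n_params):.2f} GiB before activations")
```

This is a rule-of-thumb estimator, not a precise planner, but it explains why an 85M-parameter model wants a 16GB card: a bit over a gigabyte goes to weights and optimizer state before a single activation is stored.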
None. Truly. If you can read this sentence, you can begin Day 1.
By Day 12, you will have the Python and mathematics you need. By Day 30, you will understand neural networks deeply. By Day 60, you will have trained your own LLM.
I have spent over four decades teaching machine learning, computational linguistics, and AI at institutions including Stanford, Harvard, and MIT. In that time, I have seen one pattern repeat itself endlessly: students who rush to copy-paste code without understanding the foundations always hit a wall. The ones who take the time to understand why backpropagation works, why attention is computed the way it is, why tokenization choices matter — those students become builders, not just users.
This curriculum is designed to produce builders. Every exercise, every derivation, every project is chosen to deepen intuition, not just skill. The Banso MarkGPT capstone is not a gimmick — it is a reminder that language models carry within them the languages, cultures, and voices of real people. Build with care.
We warmly welcome contributions from the community! Whether you're a beginner student, experienced educator, or researcher, there are many ways to improve this curriculum.
- Improve Lessons: Clarify explanations, add visualizations, fix typos
- Create Exercises: Build challenging practice problems with solutions
- Build Projects: Create real-world applications and mini-capstones
- Expand Documentation: Write guides, troubleshooting docs, tips
- Language & Culture: Expand the Banso dataset, improve translations
- Bug Fixes: Report and fix errors in code or explanations
- Data Contributions: Help with the Banso vernacular dataset
- Deployment Guides: Create deployment examples and best practices
This is the perfect project to build your portfolio! New to open source? Start here:
- Read contributors/CONTRIBUTORS_GUIDE.md — Complete step-by-step guide for beginners
- Create your contributor folder with your GitHub username in contributors/
- Work on improvements in your own isolated folder
- Submit a Pull Request when you're ready
- Collaborate with maintainers and other students
The contributors guide includes:
- ✅ Complete setup instructions (no prior Git knowledge required)
- ✅ Step-by-step contribution workflow
- ✅ Best practices and common pitfalls to avoid
- ✅ Clear examples of what makes a good contribution
- ✅ Troubleshooting help for common issues
- Adapt this curriculum for classroom use
- Contribute research insights and papers
- Help with Banso language/cultural accuracy
- Share datasets and resources for African language NLP
- Collaborate on model deployment and optimization
Quick Reference (detailed guide in contributors/CONTRIBUTORS_GUIDE.md):
# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR-USERNAME/MarkGPT-LLM-Curriculum.git
# 3. Create your contributor workspace
mkdir contributors/YOUR-GITHUB-USERNAME
cd contributors/YOUR-GITHUB-USERNAME
# 4. Create the module structure you're working on
mkdir -p module-02/lessons
# 5. Make your changes and commit
git add .
git commit -m "feat: improve module-02 numpy lesson"
# 6. Push to your fork and create a Pull Request
git push origin main

- Start Small: First contributions should be focused (1-2 files)
- Test Everything: Run all Jupyter notebooks before submitting
- Follow Patterns: Look at existing lessons for structure and style
- Write Clear Commits: Describe what changed and why
- Communicate: Ask questions if anything is unclear
- Respect Culture: Be sensitive to Banso language and traditions
- Setup Issues? Check GETTING_STARTED.md
- Contribution Questions? See contributors/CONTRIBUTORS_GUIDE.md
- Git Problems? Check the Git Troubleshooting section
- Need More Help? Open an issue or email iwstechnical@gmail.com
We are currently seeking contributors for:
- Banso Language Experts: Help expand vernacular dataset and improve linguistic accuracy
- Data Collection: Build tools and workflows for dataset creation
- Model Evaluation: Create evaluation metrics and benchmarks
- Deployment: Build Docker containers and deployment guides
- Documentation: Write case studies, tutorials, and explanations
- Translations: Adapt lessons for other languages/contexts
For detailed guidelines, including code standards, docstring requirements, and the full contribution workflow, see:
- CONTRIBUTING.md — Technical contribution standards
- contributors/CONTRIBUTORS_GUIDE.md — Step-by-step guide for beginners
- COMMUNITY_GUIDELINES.md — Code of conduct and community values
All contributors are recognized in:
- This README (in a Contributors section)
- The CONTRIBUTORS.md file in the repository
- Each Pull Request and commit history
- Our community page
This curriculum is released under the Creative Commons Attribution 4.0 International License.
The MarkGPT source code is under the MIT License.
See LICENSE.md for details.
"Every language is a universe. MarkGPT is our attempt to let one more universe speak."
