- Developed by: Syed Mushahid Ali Kazmi
- Supported by: Muhammad Abdullah
- Repository: TaleemAI on GitHub
- License: MIT
- Version: 1.0
- Language Focus: Urdu / Regional Languages (Accessibility Module)
- Platform: Google Colab + GitHub Actions (Automated ML Workflow)
TaleemAI is a personalized adaptive learning and teacher-support system designed to enhance education through data-driven insights, accessibility, and automation.
It leverages Machine Learning (ML) and Natural Language Processing (NLP) to:
- Analyze student performance
- Predict outcomes
- Support teachers with actionable insights
- Provide adaptive learning in Urdu and regional languages
Modules include classification, forecasting, and language translation, combined with automated reporting and GitHub CI/CD integration.
| # | Objective | Description |
|---|---|---|
| 1 | Personalized Learning | Deliver customized insights based on student performance. |
| 2 | Teacher Assistance | Provide data-driven suggestions to educators. |
| 3 | Accessibility | Translate educational content into Urdu and local languages. |
| 4 | Automation | Automatically train, evaluate, and update models using GitHub Actions. |
| 5 | Reporting | Generate professional PDF reports and visual insights. |
Many educational systems lack:
- Personalized student analytics
- Teacher-oriented data insights
- Regional language support
TaleemAI addresses these challenges by:
- Predicting student performance trends
- Providing actionable teacher analytics
- Offering Urdu and regional language translations
- Automating the end-to-end pipeline
Dataset Source: Kaggle - Students Performance in Exams
Description: Demographic info, parental education, test preparation, and exam scores (Math, Reading, Writing).
| Feature | Type | Description |
|---|---|---|
| Gender | Categorical | Male / Female |
| Race/Ethnicity | Categorical | Student's ethnic group |
| Parental Education Level | Categorical | Highest education of parent |
| Lunch | Categorical | Free/Reduced or Standard |
| Test Preparation Course | Categorical | Completed / Not Completed |
| Math Score | Numeric | Math exam score (0–100) |
| Reading Score | Numeric | Reading exam score (0–100) |
| Writing Score | Numeric | Writing exam score (0–100) |
TaleemAI Pipeline: Five stages
- Load dataset via Kaggle API or CSV upload
- Clean missing values and normalize numeric data
- Encode categorical features (`LabelEncoder`/`OneHotEncoder`)
- Target: Pass/Fail based on average score
- Algorithms: Logistic Regression, Random Forest Classifier, Naive Bayes (NLP tasks)
- Metrics: Accuracy, Precision, Recall, F1-Score, Confusion Matrix
- Goal: Predict future scores
- Algorithm: Linear Regression
- Output: `forecast_results.csv`
- Tools: `TextBlob`, `Googletrans`
- Notebook: `/notebooks/translation_module.ipynb`
- Generates teacher-friendly Urdu summaries
- PDF reports via ReportLab
- Graphs via Matplotlib & Seaborn (Class Distribution, Confusion Matrix, Accuracy Trend)
- Auto commit & push to GitHub
- GitHub Actions for scheduled retraining & report updates
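
The preprocessing and labelling steps in the pipeline list above can be sketched in plain Python. This is a minimal illustration, not the code from the project notebooks. The 50-point pass threshold, the helper names (`label_pass_fail`, `min_max_normalize`, `one_hot`), and the sample rows are all assumptions: the report only says that the target is Pass/Fail based on the average score. In the notebooks, scikit-learn's `LabelEncoder`/`OneHotEncoder` would presumably handle the encoding step.

```python
from statistics import mean

PASS_THRESHOLD = 50.0  # assumed cut-off; the report does not state the actual value


def label_pass_fail(math, reading, writing, threshold=PASS_THRESHOLD):
    """Return 'Pass' if the mean of the three exam scores reaches the threshold."""
    return "Pass" if mean([math, reading, writing]) >= threshold else "Fail"


def min_max_normalize(values):
    """Scale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Hypothetical sample rows (illustrative only, not taken from the dataset)
rows = [
    {"lunch": "standard", "math": 72, "reading": 80, "writing": 76},
    {"lunch": "free/reduced", "math": 40, "reading": 45, "writing": 38},
]

labels = [label_pass_fail(r["math"], r["reading"], r["writing"]) for r in rows]
math_scaled = min_max_normalize([r["math"] for r in rows])
lunch_categories, lunch_encoded = one_hot([r["lunch"] for r in rows])
```

Keeping the threshold as a named constant makes it easy to change once the actual Pass/Fail rule is fixed.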
flowchart TD
A[Dataset from Kaggle] --> B[Preprocessing]
B --> C[ML Model Training]
C --> D[Evaluation & Metrics]
D --> E[Visualization + Report Generation]
E --> F[PDF Report Creation]
F --> G[Automatic Push to GitHub]
G --> H[GitHub Actions Retraining Workflow]
| Component | Description | Output |
|---|---|---|
| data/ | Raw + processed student dataset | student_performance.csv |
| notebooks/ | Core model notebooks | 5 Notebooks (ML, Forecast, NLP, Translation) |
| models/ | Serialized trained models | .pkl files |
| reports/ | Auto-generated PDF & graphs | /reports/final_report.pdf |
| results/ | CSVs with evaluation metrics | Forecasting & Accuracy results |
| screenshots/ | Saved output images | PNGs of runs |
| README.md | Project overview | Setup & description |
| project_report.md | Full report | This document |
| Metric | Logistic Regression | Random Forest | Naive Bayes |
|---|---|---|---|
| Accuracy | 0.86 | 0.92 | 0.84 |
| Precision | 0.85 | 0.93 | 0.82 |
| Recall | 0.84 | 0.91 | 0.80 |
| F1-Score | 0.84 | 0.92 | 0.81 |
✅ Random Forest achieved the best performance on all four metrics (accuracy 0.92, F1-score 0.92).
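
For readers who want to see how the four metrics in the table relate to a confusion matrix, here is a small plain-Python sketch. The function name `binary_metrics` and the label lists are hypothetical. Treating "Pass" as the positive class is also an assumption. In a scikit-learn workflow the same values would normally come from `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.

```python
def binary_metrics(y_true, y_pred, positive="Pass"):
    """Compute confusion-matrix counts and the derived metrics for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels, for illustration only
y_true = ["Pass", "Pass", "Pass", "Fail", "Fail", "Pass", "Fail", "Pass"]
y_pred = ["Pass", "Pass", "Fail", "Fail", "Pass", "Pass", "Fail", "Pass"]
metrics = binary_metrics(y_true, y_pred)
```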
| Metric | Value |
|---|---|
| Mean Absolute Error (MAE) | 3.82 |
| R² Score | 0.88 |
| Predicted Score Range | 50–98 |
With an MAE of 3.82 points and an R² of 0.88, the forecast model tracks student scores closely on this dataset.
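
The two forecast metrics can be reproduced by hand for a one-feature linear regression, as in the sketch below. It makes several assumptions: the report does not say which features feed the forecast, so predicting a writing score from a reading score is purely illustrative. The helper names (`fit_line`, `mae`, `r_squared`) and the data points are hypothetical and do not reproduce the values in the table.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical scores, for illustration only
reading = [50, 60, 70, 80, 90]
writing = [48, 61, 69, 82, 90]

slope, intercept = fit_line(reading, writing)
predicted = [slope * x + intercept for x in reading]
forecast_mae = mae(writing, predicted)
forecast_r2 = r_squared(writing, predicted)
```

The metrics here are computed on the same points used for fitting. A real evaluation would use a held-out test split.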
| Input (English) | Output (Urdu) |
|---|---|
| “Student needs improvement in math.” | "طالب علم کو ریاضی میں بہتری کی ضرورت ہے۔" |
| Chart | Description |
|---|---|
| Class Distribution | Number of students per performance level |
| Confusion Matrix | Model strengths and weaknesses |
| Forecast Graph | Predicted vs actual performance trends |
- GitHub Actions workflow is triggered on push or on a schedule
- Installs dependencies
- Downloads dataset via Kaggle API
- Retrains models automatically
- Generates updated graphs & PDF reports
- Commits & pushes back to GitHub
| Impact Area | Description |
|---|---|
| Accessibility | Learning support in Urdu & regional languages |
| Data Empowerment | Helps educators track student performance |
| Scalability | Open-source, extendable for larger platforms |
| Inclusivity | Supports diverse language and cultural backgrounds |
- Dataset scope is limited (no attendance, socioeconomic, or behavioral data)
- Translation quality depends on the Googletrans API and on network availability
- Forecasting assumes that learning trends remain stable
- Integrate with real-time learning management systems (LMS)
- Use deep learning (LSTM, BERT) for prediction improvements
- Add speech-to-text Urdu learning module
- Enhance dashboard for educators
- Deploy as a Progressive Web App (PWA)
- Kaggle Dataset: Students Performance in Exams — spscientist
- Scikit-Learn Docs: https://scikit-learn.org/
- ReportLab: https://www.reportlab.com/
- Googletrans API: https://pypi.org/project/googletrans/
- Python Docs: https://docs.python.org/
- Main Developer: Syed Mushahid Ali Kazmi
- Technical Support: Muhammad Abdullah
- Affiliation: Independent AI Research Initiative, Pakistan
TaleemAI demonstrates how AI and automation can transform education by:
- Empowering students with personalized analytics
- Supporting teachers with actionable insights
- Bridging language gaps via Urdu/regional translations
- Maintaining ethical, open-source, and plagiarism-free development
“Education is not just learning facts, but training the mind to think — TaleemAI trains both minds and machines.”