🧠 Stemeta Internship — ML Projects Portfolio

🏢 Projects developed during an AI/ML internship at Stemeta.ai, Islamabad. They cover real-world classification problems across healthcare, finance, telecom, and HR domains, with full EDA, feature engineering, model training, evaluation, and API deployment.

📁 Projects Overview

#	Project	Domain	Algorithm	Key Technique
1	🔬 Breast Cancer Detection	Healthcare	Random Forest, SVM	Binary Classification
2	💳 Credit Card Fraud Detection	Finance	Random Forest, Logistic Regression	SMOTE — Imbalanced Data
3	💡 Health Indicator Analysis	Healthcare	ML Classification	EDA + Patient Profiling
4	❤️ Heart Failure Prediction	Healthcare	Logistic Regression, XGBoost	Precision-Recall Optimization
5	🧑‍💼 IBM HR Employee Attrition	HR Analytics	Decision Tree, Random Forest	Attrition Factor Analysis
6	📧 Spam Email Detection	NLP	Multinomial Naive Bayes	TF-IDF Text Vectorization
7	📱 Telco Customer Churn	Telecom	Logistic Regression, Decision Tree	Churn Prediction
8	🚢 Titanic Survival API	Classic ML	Random Forest	Flask REST API Deployment
9	⚖️ Imbalanced Data — Churn	Telecom	Multiple Models	SMOTE, Under/Over Sampling

🔍 Project Details

1. 🔬 Breast Cancer Detection

Supervised classification model predicting whether a tumor is benign or malignant based on diagnostic imaging features.

Dataset: Wisconsin Breast Cancer Dataset
Models: Random Forest, SVM, Logistic Regression
Focus: High recall to minimize false negatives (missed cancers)
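
To make the recall focus concrete, here is a minimal sketch. It assumes scikit-learn's bundled copy of the Wisconsin diagnostic data and an arbitrary Random Forest configuration; the project notebook may load the data and tune the model differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled dataset, which encodes
# 0 = malignant and 1 = benign, so malignant is the class to protect.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters here are illustrative, not the project's tuned values.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Recall for the malignant class: share of real cancers the model catches.
malignant_recall = recall_score(y_test, y_pred, pos_label=0)

# Rows are true labels, columns are predictions, ordered [malignant, benign].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
false_negatives = int(cm[0, 1])  # malignant tumors predicted as benign

print(f"Malignant recall: {malignant_recall:.3f}")
print(f"Missed cancers (false negatives): {false_negatives}")
```

Reporting the false-negative count next to recall keeps the clinical cost visible, which a single accuracy number would hide.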

2. 💳 Credit Card Fraud Detection

Detecting fraudulent transactions in a highly imbalanced dataset where fraud accounts for less than 0.2% of all transactions.

Challenge: Extreme class imbalance
Solution: SMOTE oversampling + undersampling techniques
Models: Random Forest, Logistic Regression
Metrics: Precision, Recall, F1, ROC-AUC

3. 💡 Health Indicator Analysis

Exploratory analysis of general health data to identify key disease indicators and support healthcare decision-making.

Focus: EDA, feature correlation, patient profiling
Techniques: Univariate & multivariate analysis, visualization
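
The EDA steps above can be sketched with pandas. The column names and the generated values below are hypothetical placeholders, not the project's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical patient table; every column and value is made up.
rng = np.random.default_rng(0)
n = 500
bmi = rng.normal(27, 4, n)
age = rng.integers(20, 80, n)
risk = 0.08 * (bmi - 27) + 0.03 * (age - 50) + rng.normal(0, 1, n)
df = pd.DataFrame({
    "bmi": bmi,
    "age": age,
    "sleep_hours": rng.normal(7, 1, n),
    "has_condition": (risk > 0).astype(int),
})

# Univariate analysis: per-column summary statistics.
summary = df.describe()

# Feature correlation: rank features by absolute correlation with the target.
corr = df.corr()["has_condition"].drop("has_condition")
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)

# Patient profiling: average feature values per outcome group.
profile = df.groupby("has_condition").mean()

print(ranked)
print(profile)
```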

4. ❤️ Heart Failure Prediction

Predicting likelihood of heart failure from clinical records — where precision and recall are critical for life-saving decisions.

Dataset: Heart Failure Clinical Records Dataset
Models: XGBoost, Logistic Regression, Random Forest
Focus: Optimizing recall — missing a positive case is costly
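
A common way to trade precision for recall is to move the decision threshold instead of retraining. The helper below is a sketch under that assumption; `threshold_for_recall` and the toy scores are hypothetical names and values, not taken from the project.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_recall(y_true, y_score, min_recall=0.9):
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is still at least min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision and recall carry one extra trailing point (recall = 0)
    # that has no matching threshold, so it is dropped here.
    ok = recall[:-1] >= min_recall
    if not ok.any():
        raise ValueError("no threshold reaches the requested recall")
    # Thresholds increase while recall decreases, so take the last match.
    idx = np.where(ok)[0][-1]
    return thresholds[idx], precision[idx], recall[idx]


# Toy labels and predicted probabilities (made up for illustration).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

t, p, r = threshold_for_recall(y_true, y_score, min_recall=0.75)
print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
# -> threshold=0.60 precision=0.75 recall=0.75
```

In practice the threshold should be chosen on a validation split and then applied unchanged to the test data.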

5. 🧑‍💼 IBM HR Analytics — Employee Attrition

Predicting which employees are likely to leave the organization and identifying the key factors driving attrition.

Dataset: IBM HR Analytics Dataset (1,470 employees)
Models: Decision Tree, Random Forest
Insights: Feature importance analysis for HR decision support
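
Feature importance from a Random Forest can be read off the fitted model. The sketch below uses synthetic data with placeholder column names (an assumption), so the ranking it prints says nothing about the real attrition drivers.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in sized like the IBM dataset (1,470 rows).
# shuffle=False keeps the 3 informative columns first, then pure noise.
X, y = make_classification(
    n_samples=1470, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=1,
)
feature_names = [
    "informative_0", "informative_1", "informative_2",
    "noise_0", "noise_1", "noise_2",
]  # placeholder names, not real HR columns

model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X, y)

# Impurity-based importances sum to 1 across all features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` is a reasonable cross-check before drawing HR conclusions.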

6. 📧 Spam Email Detection

NLP-based binary text classification system to detect spam emails with high accuracy using classical NLP techniques.

Technique: TF-IDF Vectorization
Model: Multinomial Naive Bayes
Libraries: NLTK, Scikit-learn
Focus: Text preprocessing, stop word removal, stemming
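
The preprocessing and modeling steps can be chained in one scikit-learn pipeline. This is a minimal sketch with a made-up six-message corpus; it assumes NLTK's `PorterStemmer` for stemming and scikit-learn's built-in English stop word list, which may differ from the project's choices.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def clean(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(
        stemmer.stem(tok) for tok in tokens if tok not in ENGLISH_STOP_WORDS
    )


# Toy corpus (made up): 1 = spam, 0 = not spam.
train_texts = [
    "Win a free prize now",
    "Free money, claim your prize",
    "Urgent offer: win cash now",
    "Meeting schedule for tomorrow",
    "Please review the attached report",
    "Lunch at noon with the team",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The custom preprocessor runs before TF-IDF tokenization.
model = make_pipeline(TfidfVectorizer(preprocessor=clean), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["Claim your free prize", "Team meeting tomorrow"])
print(predictions)  # -> [1 0]
```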

7. 📱 Telco Customer Churn Prediction

Predicting customer churn for a telecom company to help retention teams proactively target at-risk customers.

Dataset: IBM Telco Customer Churn Dataset
Models: Logistic Regression, Decision Tree
Features: Demographics, account info, service usage data
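
Because the features mix numeric and categorical columns, a `ColumnTransformer` is one way to prepare them for Logistic Regression. The sketch below assumes three columns (`tenure`, `MonthlyCharges`, `Contract`) and uses eight made-up customers, not rows from the dataset.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up customers for illustration only.
customers = pd.DataFrame({
    "tenure": [1, 2, 3, 5, 40, 50, 60, 72],
    "MonthlyCharges": [90, 85, 95, 80, 30, 45, 25, 50],
    "Contract": ["Month-to-month"] * 4
                + ["Two year", "One year", "Two year", "One year"],
    "Churn": [1, 1, 1, 1, 0, 0, 0, 0],
})
numeric_cols = ["tenure", "MonthlyCharges"]
categorical_cols = ["Contract"]

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(customers[numeric_cols + categorical_cols], customers["Churn"])

new_customers = pd.DataFrame({
    "tenure": [2, 65],
    "MonthlyCharges": [90, 30],
    "Contract": ["Month-to-month", "Two year"],
})
churn_probability = model.predict_proba(new_customers)[:, 1]
print(churn_probability.round(2))
```

Retention teams can sort customers by `churn_probability` and contact the highest-risk group first.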

8. 🚢 Titanic Survival Prediction API

End-to-end ML project with REST API deployment — predicting passenger survival and serving predictions via Flask.

Model: Random Forest Classifier
Deployment: Flask REST API
Input: Passenger features → Output: Survival prediction (JSON)

# Example API call
POST /predict
{
  "Pclass": 1,
  "Sex": "female",
  "Age": 28,
  "SibSp": 0,
  "Fare": 100
}
# Response: {"survived": 1, "probability": 0.92}
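
A Flask endpoint matching the request and response shape above could look like the following sketch. It is not the project's actual server: the toy training rows, the `encode` helper, and the 0/1 encoding of `Sex` are assumptions made so the example runs on its own.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Fare"]

# Hypothetical training rows standing in for the real Titanic data.
# Columns follow FEATURES; Sex is encoded as 1 = female, 0 = male.
_train_X = np.array([
    [1, 1, 29, 0, 211.0], [1, 1, 35, 1, 90.0], [2, 1, 24, 0, 26.0],
    [3, 1, 18, 0, 7.9], [1, 0, 4, 1, 81.0], [1, 0, 40, 0, 120.0],
    [2, 0, 30, 1, 21.0], [3, 0, 22, 0, 7.2], [3, 0, 45, 2, 15.0],
    [3, 1, 40, 3, 20.0],
])
_train_y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(_train_X, _train_y)

app = Flask(__name__)


def encode(payload):
    """Convert the JSON body into one numeric feature row."""
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    row = dict(payload)
    row["Sex"] = 1 if str(payload["Sex"]).lower() == "female" else 0
    return [[float(row[name]) for name in FEATURES]]


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        row = encode(payload)
    except (ValueError, TypeError) as exc:
        return jsonify({"error": str(exc)}), 400
    probability = float(model.predict_proba(row)[0, 1])
    return jsonify({
        "survived": int(probability >= 0.5),
        "probability": round(probability, 2),
    })
```

Saved as a module, this can be started with `flask --app <module_name> run`. A real deployment would load a model trained on the full dataset (for example with `joblib.load`) instead of fitting toy rows at import time.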

9. ⚖️ Handling Imbalanced Data — Customer Churn

Deep-dive study into resolving class imbalance in the Telco Churn dataset and measuring its impact on model performance.

Techniques Compared:
- SMOTE (Synthetic Minority Oversampling)
- Random Oversampling
- Random Undersampling
- Combined Approach
Finding: SMOTE + Random Forest gave best F1 score

🛠️ Tech Stack

Category	Tools
Language	Python 3.x
ML Library	Scikit-learn
Data Analysis	Pandas, NumPy
Visualization	Matplotlib, Seaborn
NLP	NLTK, TF-IDF
Imbalanced Data	imbalanced-learn (SMOTE)
API Deployment	Flask
Notebooks	Jupyter Notebook
Version Control	Git, GitHub

🚀 Getting Started

# Clone the repo
git clone https://github.com/tashfeen786/STEMETA_Intership_Projects.git
cd STEMETA_Intership_Projects

# Install dependencies
pip install -r requirements.txt

# Open any project notebook
jupyter notebook

🏗️ Project Structure

STEMETA_Intership_Projects/
│
├── breast_cancer/                    # Breast cancer classification
├── Credit Card Fraud Detection/      # Fraud detection + SMOTE
├── HealthIndicator/                  # Health data EDA & modeling
├── Heart_Failure_Prediction/         # Heart failure risk prediction
├── IBM HR Analytics.../              # Employee attrition analysis
├── Spam_email_Detection/             # NLP spam classifier
├── Telco_Customer_Churn/             # Customer churn prediction
├── TitanicAPI/                       # Flask API deployment
├── Handling_imbalanced_data_...ipynb # Imbalanced data study
└── requirements.txt                  # Python dependencies

📈 Key Learnings from Internship

✅ Real-world data is messy — EDA and preprocessing matter most
✅ Class imbalance is a critical challenge in production ML
✅ Model selection depends on domain — healthcare needs high recall
✅ Deployment bridges the gap between ML model and real product
✅ Feature engineering often matters more than model choice

👨‍💻 Author

Tashfeen Aziz — AI/ML Engineer & Python Developer

⭐ If you found these projects helpful, please give the repository a star!

Built during AI/ML Internship at Stemeta.ai, Islamabad 🇵🇰
