🤖 Kaggle ML Projects & AI Models

Welcome to my Machine Learning portfolio! I am an AI Tools engineering student and a Python developer. This repository contains my code, notebooks, and models from various Kaggle competitions and Data Science challenges.

🚀 Featured Projects

1. Titanic - Machine Learning from Disaster 🚢

  • Goal: To predict which passengers survived the Titanic shipwreck using passenger data (e.g., name, age, gender, and socio-economic class).
  • Tech Stack: Python, Pandas, Machine Learning Models
  • My Achievement: Trained a model that achieved an accuracy score of 0.77511 (77.5%) on the first submission.

🛠️ Tools & Technologies

  • Python
  • Jupyter Notebooks
  • Data Analysis & Predictive Modeling

2. Video Game Sales Analysis 🎮

  • Goal: To analyze the gaming market and identify the most popular genres using worldwide sales data.
  • Tech Stack: Python, Pandas, Matplotlib
  • Key Insight: Used automated file discovery to handle dynamic datasets and visualized top gaming trends.
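
The notebook's own discovery code is not reproduced here, so the following is only a minimal sketch of the "automated file discovery" idea. It assumes datasets are mounted under Kaggle's `/kaggle/input` directory; the helper name `find_csv_files` and the demo file names are hypothetical.

```python
import os
import tempfile


def find_csv_files(root="/kaggle/input"):
    """Return the sorted paths of every .csv file found under `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demo on a throwaway directory so the sketch runs outside Kaggle too.
# The folder and file names below are placeholders, not the real dataset layout.
with tempfile.TemporaryDirectory() as demo_root:
    nested = os.path.join(demo_root, "videogamesales")
    os.makedirs(nested)
    for filename in ("vgsales.csv", "notes.txt"):
        with open(os.path.join(nested, filename), "w") as handle:
            handle.write("placeholder\n")
    found = [os.path.basename(path) for path in find_csv_files(demo_root)]

print(found)
```

On Kaggle, the first discovered path could then be passed to `pd.read_csv`, which avoids hard-coding a dataset folder name that may change between notebook versions.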

3. Spaceship Titanic - Cosmic Mystery 🚀

  • Goal: To predict which passengers were transported to an alternate dimension during the spaceship's collision with a spacetime anomaly.
  • Tech Stack: Python, Pandas, Scikit-Learn, Machine Learning (Random Forest Classifier)
  • My Achievement (Day 2): Performed advanced data cleaning, handled missing values (NaN) on live competition data, and generated a baseline prediction with an initial score of 0.49310.
  • My Achievement (Day 3): Trained, validated, and submitted a Random Forest model that beat the baseline with an accuracy score of 0.79448 (79.4%)!
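
The two steps described for this project (filling missing values, then training a Random Forest) can be sketched roughly as follows. This is not the competition notebook: the data is synthetic, only a few column names are borrowed from the competition, and the median/mode imputation strategy is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data (values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(1, 80, n).astype(float),
    "RoomService": rng.exponential(200.0, n),
    "HomePlanet": rng.choice(["Earth", "Europa", "Mars"], n),
    "CryoSleep": rng.choice([True, False], n),
})
df["Transported"] = df["CryoSleep"] | (df["RoomService"] < 50)

# Knock out some values to imitate the NaNs in the real data.
df.loc[rng.choice(n, 20, replace=False), "Age"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "HomePlanet"] = np.nan

numeric_cols = ["Age", "RoomService"]
categorical_cols = ["HomePlanet"]


def clean(frame):
    """Fill numeric NaNs with the median and categorical NaNs with the mode."""
    cleaned = frame.copy()
    for col in numeric_cols:
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    for col in categorical_cols:
        cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])
    return cleaned


cleaned = clean(df)

# One-hot encode the categorical column and cast everything to float.
features = cleaned[numeric_cols + categorical_cols + ["CryoSleep"]]
X = pd.get_dummies(features, columns=categorical_cols).astype(float)
y = cleaned["Transported"].astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_val, y_val)
print(f"Validation accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy describes the toy data only and says nothing about the leaderboard scores quoted above.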

4. Credit Card Fraud Detection

  • My Achievement: Developed a Credit Card Fraud Detection model with 92% recall using balanced Logistic Regression.
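
Recall is the share of actual fraud cases the model catches, TP / (TP + FN), which is why it matters more than plain accuracy on heavily imbalanced data. The sketch below assumes "balanced" refers to scikit-learn's `class_weight="balanced"` option (it could equally mean resampling), and it uses synthetic data from `make_classification`, so its recall is unrelated to the 92% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data: roughly 2% of rows are the "fraud" class.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    weights=[0.98, 0.02],
    random_state=0,
)

# Stratify so the rare class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so mistakes on the rare class cost more during training.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

fraud_recall = recall_score(y_test, model.predict(X_test))
print(f"Recall on the synthetic fraud class: {fraud_recall:.3f}")
```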

👁️ Digit Recognizer - Computer Vision (Kaggle)

🎯 Project Overview

This project is a Computer Vision classification model built to identify handwritten digits (0-9) from a dataset of tens of thousands of scanned images. Instead of using standard image files, the model reads each digit as a row of 784 pixel values (a flattened 28x28 image) and learns patterns from those values.

🏆 Model Performance

  • Accuracy Score: 96.57%
  • Algorithm: Random Forest Classifier (n_estimators=100)
  • Platform: Kaggle
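
As a rough illustration of the setup above, the sketch below trains a Random Forest with `n_estimators=100` on rows of 784 pixel values. It assumes the competition's CSV layout (a `label` column followed by `pixel0` to `pixel783`) and substitutes randomly generated "digits" for the real images, so the accuracy it prints is not comparable to the 96.57% score.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fake data: one random 784-pixel template per class, plus noise per sample.
rng = np.random.default_rng(42)
templates = rng.integers(0, 256, size=(10, 784))
labels = np.repeat(np.arange(10), 30)  # 30 samples for each digit 0-9
noise = rng.normal(0, 40, size=(labels.size, 784))
pixels = np.clip(templates[labels] + noise, 0, 255)

train = pd.DataFrame(pixels, columns=[f"pixel{i}" for i in range(784)])
train.insert(0, "label", labels)

# Each row is one flattened 28x28 image.
X = train.drop(columns="label")
y = train["label"]
first_image = X.iloc[0].to_numpy().reshape(28, 28)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy on synthetic digits: {accuracy:.3f}")
```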

🛠️ Tech Stack & Tools

  • Language: Python
  • Libraries: Pandas, Scikit-Learn
  • Environment: Kaggle Notebooks / Jupyter

🚀 Key Learnings

  • Handling large datasets and pixel-level data extraction.
  • Training an ensemble Machine Learning model (Random Forest) for image classification.
  • Linking Kaggle directly to GitHub for automated version control.

Docs: Added a comprehensive write-up of an NLP Kaggle competition (Score: 0.796).

🎮 Video Game Sales & Market Analysis (VS Gaming Studio)

🎯 Project Overview

A comprehensive data analysis of global video game sales to uncover market trends, popular genres, and dominant publishers. This market research serves as a foundational study for upcoming game development projects at VS Gaming Studio, helping the team understand player demands.

📊 Key Insights

  • Top Genres: Action and Sports games have historically seen the highest number of releases.
  • Top Publishers: Nintendo and Electronic Arts (EA) maintain a massive lead in global sales.
  • Data Value: Understanding platform adoption and genre popularity helps indie developers make data-driven decisions.
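
Insights like these typically come from simple Pandas aggregations. The sketch below assumes column names `Genre`, `Publisher`, and `Global_Sales`, and uses a tiny invented table with placeholder publishers, so its output says nothing about the real market.

```python
import pandas as pd

# Invented rows for illustration only; sales are in arbitrary units.
sales = pd.DataFrame({
    "Name": ["Game 1", "Game 2", "Game 3", "Game 4", "Game 5", "Game 6"],
    "Genre": ["Action", "Action", "Sports", "Puzzle", "Sports", "Action"],
    "Publisher": [
        "Publisher A", "Publisher B", "Publisher A",
        "Publisher C", "Publisher B", "Publisher A",
    ],
    "Global_Sales": [4.0, 1.5, 3.0, 0.5, 2.0, 1.0],
})

# Number of releases per genre, most common first.
releases_per_genre = sales["Genre"].value_counts()

# Total global sales per publisher, largest first.
sales_per_publisher = (
    sales.groupby("Publisher")["Global_Sales"].sum().sort_values(ascending=False)
)

print(releases_per_genre)
print(sales_per_publisher)
```

Note that the two questions use different measures: genre popularity here is a count of releases, while publisher ranking is a sum of sales.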

🛠️ Tech Stack & Tools

  • Language: Python
  • Libraries: Pandas, Matplotlib, Seaborn
  • Environment: Kaggle Notebooks

📊 Day 9: Python Data Analysis Basics & Visualization

📝 Overview

This repository/notebook contains my Day 9 progress in the '15 Days of Python Basics' learning track. Today, I shifted focus to Data Science and Analytics using Kaggle. The project demonstrates how to create, manipulate, and visualize datasets using Pandas and Matplotlib.

As a practical example, I analyzed a custom player dataset for VS Gaming Studio, filtering pro players and visualizing their scores using a dark-themed cyberpunk bar chart.

🛠️ Tech Stack Used

  • Language: Python 3
  • Libraries: Pandas (Data Manipulation), Matplotlib (Data Visualization)
  • Environment: Kaggle Notebooks

🧠 Concepts Mastered Today

  1. Dictionary to DataFrame Conversion: Creating structured tabular data (pd.DataFrame).
  2. Data Inspection: Using .head() and .describe() to get a quick statistical summary of the data.
  3. Data Filtering: Extracting specific rows based on conditions (e.g., filtering players with a score > 1000).
  4. Data Visualization: Plotting customized bar charts using Matplotlib with custom themes (dark_background), labels, and titles.
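
The four concepts above fit into one short cell. The player names and scores below are invented placeholders rather than the actual VS Gaming Studio dataset, and the colors are just one possible take on the dark theme.

```python
import matplotlib.pyplot as plt
import pandas as pd

# 1. Dictionary to DataFrame (placeholder players and scores).
players = {
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "score": [1500, 800, 1200, 950],
}
df = pd.DataFrame(players)

# 2. Data inspection: first rows and a quick statistical summary.
print(df.head())
print(df.describe())

# 3. Data filtering: keep only players with a score above 1000.
pro_players = df[df["score"] > 1000]

# 4. Data visualization: bar chart on Matplotlib's built-in dark theme.
plt.style.use("dark_background")
fig, ax = plt.subplots()
ax.bar(pro_players["player"], pro_players["score"], color="cyan")
ax.set_title("Pro Players (score > 1000)")
ax.set_xlabel("Player")
ax.set_ylabel("Score")
```

In a Kaggle or Jupyter notebook the figure renders inline when the cell finishes; in a plain script, `plt.show()` or `fig.savefig(...)` would be needed.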

🚀 How to Run (Kaggle)

  1. Open Kaggle.
  2. Create a New Notebook.
  3. Copy the code cells provided in this project.
  4. Hit Shift + Enter on each cell to run it and see the data tables and graphs render.

Created an Object-Oriented ML Pipeline for Spaceship Titanic
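
The pipeline itself is not shown in this README, so the class below is only a guess at what an object-oriented version might look like: one object that owns cleaning, encoding, training, and scoring. The class name `TabularPipeline`, its methods, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


class TabularPipeline:
    """Hypothetical wrapper: clean -> encode -> train -> evaluate."""

    def __init__(self, target, n_estimators=100, random_state=0):
        self.target = target
        self.model = RandomForestClassifier(
            n_estimators=n_estimators, random_state=random_state
        )
        self.feature_columns_ = None

    def _encode(self, frame):
        """Fill NaNs and one-hot encode every non-numeric column."""
        features = frame.drop(columns=[self.target], errors="ignore")
        numeric_cols = list(features.select_dtypes(include="number").columns)
        other_cols = [c for c in features.columns if c not in numeric_cols]
        features[numeric_cols] = features[numeric_cols].fillna(
            features[numeric_cols].median()
        )
        features[other_cols] = features[other_cols].fillna("Unknown")
        return pd.get_dummies(features, columns=other_cols).astype(float)

    def fit(self, frame):
        encoded = self._encode(frame)
        self.feature_columns_ = encoded.columns  # remember the training layout
        self.model.fit(encoded, frame[self.target])
        return self

    def predict(self, frame):
        # Align to the training columns; unseen categories are dropped.
        encoded = self._encode(frame).reindex(
            columns=self.feature_columns_, fill_value=0.0
        )
        return self.model.predict(encoded)

    def score(self, frame):
        return float((self.predict(frame) == frame[self.target]).mean())


# Synthetic demo data with a few missing values (not competition data).
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(1, 80, n).astype(float),
    "Spa": rng.exponential(150.0, n),
    "HomePlanet": rng.choice(["Earth", "Europa", "Mars"], n),
})
demo["Transported"] = (demo["Spa"] < 100).astype(int)
demo.loc[::15, "Age"] = np.nan
demo.loc[5::20, "HomePlanet"] = np.nan

train_df, test_df = train_test_split(demo, test_size=0.25, random_state=1)
pipeline = TabularPipeline(target="Transported").fit(train_df)
accuracy = pipeline.score(test_df)
print(f"Hold-out accuracy on synthetic data: {accuracy:.3f}")
```

Keeping the column layout on the object (`feature_columns_`) is the main design point: it lets `predict` accept new data whose categories differ from the training set.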


🔥 "Consistency is the ultimate hack." - Keeping the daily learning streak alive!


Developed as part of my daily AI & Machine Learning practice.


Quietly working away and building the future with AI.