This repository contains my work and notebooks from the Kaggle Intro to Machine Learning course.
It covers the fundamental concepts of building, evaluating, and improving machine learning models.
- Clone the repository:
git clone https://github.com/anjanakri/Intro-to-ML.git
Open the notebooks using Jupyter Notebook or Kaggle Notebooks.
🛠 Requirements
- Python 3.x
- Pandas
- scikit-learn
- Jupyter Notebook (or run on Kaggle)
Install dependencies:
pip install pandas scikit-learn notebook
📌 About the Course
The Kaggle Intro to Machine Learning course is designed for beginners to understand:
- How to handle data.
- How to train and validate models.
- How to make predictions and improve performance.
- Explore Your Data
  - Viewing the first few rows of data.
  - Summary statistics with .describe().
  - Identifying data types and spotting potential issues.
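The exploration steps above can be sketched roughly as follows. The DataFrame and its column names (`Rooms`, `Price`, `Suburb`) are made up for illustration and are not taken from the course notebooks.

```python
import pandas as pd

# Made-up housing data for illustration; not a dataset from this repository.
homes = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 5],
    "Price": [310000.0, 450000.0, 620000.0, None, 800000.0],
    "Suburb": ["A", "B", "A", "C", "B"],
})

first_rows = homes.head(3)            # first three rows
summary = homes.describe()            # summary statistics for the numeric columns
column_types = homes.dtypes           # data type of each column
missing_counts = homes.isna().sum()   # number of missing values per column

print(first_rows)
print(summary)
print(column_types)
print(missing_counts)
```

A `count` in the summary that is lower than the number of rows is one quick hint that a column has missing values.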
- Creating, Reading, and Writing
  - Loading data into Pandas DataFrames.
  - Reading CSV files and writing processed data back to disk.
  - Handling file paths and exploring datasets.
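A minimal sketch of the read and write round trip might look like this. The file names and the temporary directory are hypothetical, chosen only so the example runs anywhere; the notebooks themselves read the datasets provided by Kaggle.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file locations inside a temporary directory.
data_dir = Path(tempfile.mkdtemp())
raw_path = data_dir / "homes_raw.csv"
clean_path = data_dir / "homes_clean.csv"

# Write a small made-up dataset so there is something to read back.
pd.DataFrame({"Rooms": [2, 3, 4], "Price": [310000, None, 620000]}).to_csv(
    raw_path, index=False
)

homes = pd.read_csv(raw_path)              # load the CSV into a DataFrame
clean = homes.dropna(subset=["Price"])     # drop rows with a missing price
clean.to_csv(clean_path, index=False)      # write the processed data back to disk

reloaded = pd.read_csv(clean_path)
print(reloaded.shape)
```

Passing `index=False` keeps the row index out of the file, so reading it back does not add an extra unnamed column.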
- Your First Machine Learning Model
  - Building a simple decision tree model.
  - Training the model and making predictions.
  - Measuring model accuracy.
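As a rough sketch of these steps, the example below fits scikit-learn's `DecisionTreeRegressor` on made-up housing data. The feature names and the choice of mean absolute error (MAE) as the accuracy measure are assumptions made for illustration.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Made-up data for illustration; column names are hypothetical.
homes = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 5, 2],
    "LandSize": [150, 300, 450, 280, 600, 120],
    "Price": [310000, 450000, 620000, 440000, 800000, 290000],
})

features = ["Rooms", "LandSize"]
X = homes[features]   # inputs used to make predictions
y = homes["Price"]    # target to predict

model = DecisionTreeRegressor(random_state=1)  # fixed seed for reproducible results
model.fit(X, y)

predictions = model.predict(X)
in_sample_mae = mean_absolute_error(y, predictions)
print(predictions)
print(in_sample_mae)
```

The error here is measured on the same rows the model was trained on, so it is overly optimistic. Scoring on held-out data gives a more honest picture.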
- Model Validation
  - Splitting data into training and validation sets.
  - Understanding validation scores.
  - Avoiding data leakage.
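One way to sketch the validation workflow is with `train_test_split`, shown here on synthetic data generated only for this example. The columns and the price formula are invented.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration; the relationship below is invented.
rng = np.random.default_rng(0)
rooms = rng.integers(1, 7, size=200)
land = rng.uniform(100, 800, size=200)
price = 100000 * rooms + 500 * land + rng.normal(0, 20000, size=200)

X = pd.DataFrame({"Rooms": rooms, "LandSize": land})
y = pd.Series(price, name="Price")

# Hold out part of the data; the model never sees it during training.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(f"Training MAE:   {train_mae:,.0f}")
print(f"Validation MAE: {val_mae:,.0f}")
```

Fitting only on the training split is what keeps validation rows from leaking into the model. The gap between the two scores shows how much the training score overstates real performance.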
- Underfitting and Overfitting
  - Recognizing signs of underfitting and overfitting.
  - Using max_leaf_nodes to control model complexity.
  - Striking the balance for better generalization.
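The effect of `max_leaf_nodes` can be explored with a small loop like the one below. This is a sketch on synthetic data; the candidate sizes and the helper name `validation_mae` are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration; the relationship below is invented.
rng = np.random.default_rng(0)
rooms = rng.integers(1, 7, size=200)
land = rng.uniform(100, 800, size=200)
price = 100000 * rooms + 500 * land + rng.normal(0, 20000, size=200)

X = pd.DataFrame({"Rooms": rooms, "LandSize": land})
y = pd.Series(price, name="Price")
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def validation_mae(max_leaf_nodes):
    """Fit a tree limited to max_leaf_nodes leaves and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


candidates = [2, 5, 25, 100]  # arbitrary tree sizes to compare
scores = {n: validation_mae(n) for n in candidates}
best_size = min(scores, key=scores.get)  # size with the lowest validation error

for n, mae in scores.items():
    print(f"max_leaf_nodes={n:<4} validation MAE={mae:,.0f}")
print("Best size:", best_size)
```

Very small trees tend to underfit because they cannot capture the pattern, while very large ones tend to overfit by memorizing noise. The size with the lowest validation error sits between the two.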
- Random Forest
  - Introduction to the Random Forest algorithm.
  - Comparing Random Forest with Decision Trees.
  - Observing improvements in accuracy.
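A comparison between the two model types could be sketched as follows, again on synthetic data invented for this example. The numbers it prints say nothing about results on the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration; the relationship below is invented.
rng = np.random.default_rng(0)
rooms = rng.integers(1, 7, size=200)
land = rng.uniform(100, 800, size=200)
price = 100000 * rooms + 500 * land + rng.normal(0, 20000, size=200)

X = pd.DataFrame({"Rooms": rooms, "LandSize": land})
y = pd.Series(price, name="Price")
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# A single tree versus an ensemble of trees, both with default settings.
tree = DecisionTreeRegressor(random_state=0).fit(train_X, train_y)
forest = RandomForestRegressor(random_state=0).fit(train_X, train_y)

tree_mae = mean_absolute_error(val_y, tree.predict(val_X))
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))
print(f"Decision tree validation MAE: {tree_mae:,.0f}")
print(f"Random forest validation MAE: {forest_mae:,.0f}")
```

A random forest averages the predictions of many trees, which usually smooths out the overfitting of any single tree, though the size of the improvement depends on the data.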
- Final Competition Submission (Not in this repo)
  - My final submission for the course’s competition is in a separate notebook.
  - You can view it directly on my Kaggle profile.
