Car Price Prediction with Machine Learning

Project Overview

This project uses machine learning algorithms to predict car prices based on various features such as engine specifications, fuel type, body style, and more. The model is trained on a dataset containing detailed information about different car models and their corresponding prices.

Dataset

Source: Kaggle - Car Price Prediction Dataset

The dataset contains information about cars with the following key features:

Engine specifications: Engine size, horsepower, fuel system
Physical attributes: Length, width, height, curb weight
Performance metrics: City MPG, highway MPG, compression ratio
Categories: Car company, fuel type, aspiration, body style, drive wheels
Target variable: Price (what we're predicting)

Dataset Files

CarPrice_Assignment.csv - Main dataset with car features and prices
Data Dictionary - carprices.xlsx - Detailed description of all features

Technologies Used

Python 3.x
pandas - Data manipulation and analysis
numpy - Numerical computing
scikit-learn - Machine learning algorithms and tools
matplotlib - Data visualization
Custom transformers - Log transformation for specific features

Project Structure

```
PredictCarPrice/
│
├── dataset/
│   ├── CarPrice_Assignment.csv       # Main dataset
│   └── Data Dictionary - carprices.xlsx  # Feature descriptions
│
├── main.py                          # Main script with model training
├── loadData.py                      # Data loading and splitting utilities
├── dataCleanup.py                   # Data preprocessing and feature engineering
├── custom_transformers.py           # Custom sklearn transformers
├── .gitignore                       # Git ignore file
└── README.md                        # This file
```

Features & Data Processing

Feature Engineering

Log transformation applied to horsepower and enginesize to reduce the right skew in their distributions
Robust scaling for numerical features, which centers on the median and scales by the interquartile range so outliers have less influence
One-hot encoding for categorical variables
A single preprocessing pipeline that chains these steps
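Robust scaling subtracts each feature's median and divides by its interquartile range (IQR), so one extreme value barely moves the scaling parameters. The quick check below uses made-up horsepower-like numbers (not the real dataset) to confirm that scikit-learn's `RobustScaler` with default settings matches that formula:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up horsepower-like values with one extreme outlier (400)
hp = np.array([[70.0], [90.0], [100.0], [110.0], [130.0], [400.0]])

scaler = RobustScaler()  # default quantile_range=(25.0, 75.0)
scaled = scaler.fit_transform(hp)

# Same result by hand: (x - median) / (Q3 - Q1)
median = np.median(hp, axis=0)
q1, q3 = np.percentile(hp, [25, 75], axis=0)
manual = (hp - median) / (q3 - q1)

print(scaler.center_, scaler.scale_)  # [105.] [32.5]
print(np.allclose(scaled, manual))    # True
```

The outlier still ends up far from zero after scaling, but it does not drag the center or the scale the way it would shift a mean and standard deviation.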

Data Preprocessing Pipeline

LogTransformer: Applies log transformation to specified numerical features
RobustScaler: Scales numerical features while being robust to outliers
OneHotEncoder: Converts categorical variables to numerical format
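One way to chain these three steps is scikit-learn's `ColumnTransformer` combined with a `Pipeline`. This is a sketch under stated assumptions, not the project's `dataCleanup.py`: `FunctionTransformer(np.log1p)` stands in for the custom `LogTransformer`, the column lists beyond `horsepower` and `enginesize` are hypothetical picks from the Kaggle CSV, and the demo rows are invented.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, RobustScaler

# Assumed column groups; adjust to match CarPrice_Assignment.csv
log_cols = ["horsepower", "enginesize"]   # log-transformed, then scaled
num_cols = ["curbweight", "citympg"]      # scaled only (hypothetical choice)
cat_cols = ["fueltype", "carbody"]        # one-hot encoded (hypothetical choice)

log_pipeline = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # stand-in for the custom LogTransformer
    ("scale", RobustScaler()),
])

preprocessor = ColumnTransformer([
    ("log_num", log_pipeline, log_cols),
    ("num", RobustScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

# Invented rows, only to show the output shape
demo = pd.DataFrame({
    "horsepower": [100, 150, 90, 120],
    "enginesize": [130, 160, 110, 140],
    "curbweight": [2500, 2800, 2300, 2900],
    "citympg": [24, 19, 27, 21],
    "fueltype": ["gas", "gas", "gas", "diesel"],
    "carbody": ["convertible", "hatchback", "sedan", "sedan"],
})

X_prepared = preprocessor.fit_transform(demo)
print(X_prepared.shape)  # (4, 9)
```

The encoder adds one column per category seen during fitting, so four numeric columns plus two `fueltype` and three `carbody` categories give nine output columns. `handle_unknown="ignore"` keeps the transform from failing if the test split contains a category that was absent from training.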

Machine Learning Models

The project evaluates multiple regression algorithms:

Linear Regression - Baseline model
Decision Tree Regressor - Non-linear relationships
Random Forest Regressor - Ensemble method (final choice)

Model Selection Process

10-fold cross-validation to estimate how each model performs on unseen data
Grid Search for hyperparameter tuning
RMSE (Root Mean Square Error) as the primary evaluation metric
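A sketch of the comparison step, assuming synthetic data from `make_regression` in place of the preprocessed car features. scikit-learn scorers follow a "higher is better" convention, so mean squared error comes back negated and has to be flipped before taking the square root:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed training set (not the car data)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=30, random_state=42),
}

cv_rmse = {}
for name, model in models.items():
    # 10 folds; the scorer returns negative MSE for each fold
    scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    rmse = np.sqrt(-scores)  # one RMSE per fold
    cv_rmse[name] = rmse
    print(f"{name}: mean RMSE {rmse.mean():.2f} (std {rmse.std():.2f})")
```

Looking at the standard deviation alongside the mean shows whether a model's advantage holds across folds or comes from one lucky split.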

Hyperparameter Tuning

The Random Forest model is tuned with GridSearchCV over the following parameter grid:

n_estimators: [3, 10, 30]
max_features: [2, 4, 6, 8]
bootstrap: [True, False]
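The grid above has 3 × 4 × 2 = 24 combinations. A minimal `GridSearchCV` sketch over it, again on synthetic data; the 5-fold setting and the `random_state` values are assumptions, since the README only specifies the parameter values:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; max_features up to 8 needs at least 8 input columns
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

param_grid = {
    "n_estimators": [3, 10, 30],
    "max_features": [2, 4, 6, 8],
    "bootstrap": [True, False],
}

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,  # assumed fold count for the search
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)

print(grid_search.best_params_)
print("best CV RMSE:", (-grid_search.best_score_) ** 0.5)
```

By default `GridSearchCV` refits the best combination on the full training data, so `grid_search.best_estimator_` is ready to use afterwards.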

How to Run

Clone the repository:

```
git clone https://github.com/ArnavGRao/PredictCarPriceWithRegression.git
cd PredictCarPriceWithRegression
```

Install required packages:

```
pip install -r requirements.txt
```

Or install manually:

```
pip install pandas numpy scikit-learn matplotlib scipy seaborn
```

Run the main script:
```
python main.py
```

Key Functions

`main.py`

Main execution script
Model training, evaluation, and comparison
Hyperparameter tuning with GridSearchCV
Final model evaluation and feature importance analysis
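For the feature importance analysis, a fitted `RandomForestRegressor` exposes `feature_importances_`, which can be paired with column names and sorted. The sketch uses synthetic data and borrows a few column names purely as labels, so the resulting ranking says nothing about the real dataset:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data; the names below are illustrative labels only
X, y = make_regression(n_samples=200, n_features=4, n_informative=2, random_state=1)
feature_names = ["enginesize", "horsepower", "carwidth", "citympg"]
X = pd.DataFrame(X, columns=feature_names)

model = RandomForestRegressor(n_estimators=30, random_state=1).fit(X, y)

# Impurity-based importances sum to 1; sort to rank features
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

When the model sits behind a `ColumnTransformer`, the transformed column names can usually be recovered with `get_feature_names_out()` (scikit-learn 1.0 and later), provided every transformer inside it implements that method.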

`loadData.py`

Data loading utilities
Train-test split functionality
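A minimal sketch of what the loading and splitting utilities might look like. The function names `split_data` and `load_and_split`, the `price` column name, the 80/20 split, and the fixed `random_state` are all assumptions, not the contents of `loadData.py`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df, target="price", test_size=0.2, random_state=42):
    """Hypothetical helper: separate the target column and hold out a test set."""
    X = df.drop(columns=[target])
    y = df[target]
    # Returns X_train, X_test, y_train, y_test
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


def load_and_split(csv_path, **kwargs):
    """Hypothetical helper: read the CSV, then delegate to split_data."""
    return split_data(pd.read_csv(csv_path), **kwargs)


# Usage (path taken from the project layout):
# X_train, X_test, y_train, y_test = load_and_split("dataset/CarPrice_Assignment.csv")
```

Fixing `random_state` keeps the test set identical between runs, which matters when several models are compared against it.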

`dataCleanup.py`

Data preprocessing functions
Feature engineering pipeline

`custom_transformers.py`

LogTransformer: Custom sklearn transformer for log transformation
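A custom transformer only needs `fit` and `transform` to work inside a scikit-learn pipeline; inheriting from `BaseEstimator` and `TransformerMixin` supplies `get_params`/`set_params` and `fit_transform`. This is a sketch of how `LogTransformer` could be written, assuming `np.log1p` (log of 1 + x) so zeros are safe; the project's version may differ:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LogTransformer(BaseEstimator, TransformerMixin):
    """Apply log(1 + x) to every column passed in (sketch; log1p is an assumption)."""

    def fit(self, X, y=None):
        # Stateless: nothing to learn, but fit must return self for Pipeline use
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if np.any(X <= -1):
            raise ValueError("log1p requires all values > -1")
        return np.log1p(X)
```

In a `ColumnTransformer`, the columns it applies to (for example `horsepower` and `enginesize`) are chosen by the surrounding transformer, which keeps the class itself free of hard-coded column names.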

Model Performance

The project evaluates the models in several ways:

Cross-validation scores for model reliability
Grid search results for optimal hyperparameters
Feature importance rankings to understand which car attributes drive price predictions
Test set evaluation for final model performance assessment
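Test set evaluation can be sketched as a single RMSE computation on held-out data. Synthetic data stands in for the car dataset, and a fixed `RandomForestRegressor` stands in for the tuned model:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the car dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# Stand-in for the tuned model produced by the grid search
final_model = RandomForestRegressor(n_estimators=30, random_state=7).fit(X_train, y_train)

# The test set is used once, after all model selection is finished
predictions = final_model.predict(X_test)
test_rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Test RMSE: {test_rmse:.2f}")
```

Because RMSE is in the same units as the target, the test figure can be read directly as a typical prediction error in price.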

Key Learnings

End-to-end ML pipeline from data loading to model evaluation
Proper data preprocessing techniques for mixed data types
Model comparison and selection methodology
Hyperparameter optimization using grid search
Custom transformer creation for specialized preprocessing needs

Author

ArnavGRao

GitHub: @ArnavGRao
Email: arnavgrao@gmail.com

License

This project is open source and available under the MIT License.
