
Tosa-omokhoa/CodeAlpha_SalesPrediction


Sales Prediction using Python

CodeAlpha Data Science Internship — Task 4

Intern: Omokhoa Oshose Tosayoname
Intern ID: CA/DF1/71570
Duration: 20th May 2026 – 20th June 2026


Overview

This project predicts product sales based on advertising spend across three media channels: TV, Radio, and Newspaper. Using the classic Advertising dataset (200 observations), we explore how budget allocation across channels influences sales outcomes and build multiple regression models to forecast future sales.

Business Question: How does advertising spend across TV, Radio, and Newspaper channels drive sales, and which channel delivers the highest return?


Project Pipeline

Data Loading --> EDA & Visualisation --> Feature Engineering
    --> Model Training --> Evaluation --> Business Insights

Project Structure

CodeAlpha_SalesPrediction/
├── data/
│   └── Advertising.csv          # Raw dataset
├── notebooks/
│   └── sales_prediction.ipynb   # Main notebook (fully executed)
├── requirements.txt
└── README.md

Dataset

| Column | Description |
|---|---|
| TV | Advertising budget spent on TV (in $000s) |
| Radio | Advertising budget spent on Radio (in $000s) |
| Newspaper | Advertising budget spent on Newspaper (in $000s) |
| Sales | Units sold (in thousands); target variable |
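A minimal loading sketch, assuming pandas and that `Advertising.csv` uses the column names in the table above. Some copies of the file carry an extra unnamed index column; selecting the named columns drops it if present. The function name `load_advertising` is hypothetical (the notebook's own loading code is not reproduced here), and the two in-memory rows are invented so the example runs without the file.

```python
import io

import pandas as pd

# Column names assumed to match the dataset table above.
FEATURES = ["TV", "Radio", "Newspaper"]
TARGET = "Sales"


def load_advertising(source) -> pd.DataFrame:
    """Read the CSV and keep only the three budget columns plus Sales (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Selecting by name drops any extra column such as "Unnamed: 0".
    return df[FEATURES + [TARGET]].astype(float)


# Invented rows with a leading unnamed index column, for illustration only.
sample = io.StringIO(
    ",TV,Radio,Newspaper,Sales\n"
    "1,200.0,30.0,50.0,20.0\n"
    "2,40.0,35.0,45.0,9.5\n"
)
df = load_advertising(sample)
print(df.shape)
```

With the repository layout shown earlier, the same helper would be called as `load_advertising("data/Advertising.csv")`.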

Models Trained and Compared

| Model | Features Used |
|---|---|
| Linear Regression | Base (TV, Radio, Newspaper) |
| Ridge Regression | Base |
| Lasso Regression | Base |
| Polynomial Regression (degree 2) | Base |
| Random Forest Regressor | Base + engineered features |
| XGBoost Regressor | Base + engineered features |

Engineered features include the interaction terms TV×Radio, TV×Newspaper and Radio×Newspaper, plus Total Budget, TV Share and Radio Share.
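The feature and model setup above can be sketched as follows, assuming pandas and scikit-learn. Everything not named in the tables is hypothetical: the column names (`TV_x_Radio`, `TV_Share`, ...), the helper functions, the hyperparameters (Ridge `alpha=1.0`, Lasso `alpha=0.1`, 200 trees), the 80/20 split with seed 42, and standardising the Ridge/Lasso inputs (a common choice, not confirmed by the notebook). The data is synthetic with an invented TV×Radio effect, so the printed scores say nothing about the real results. XGBoost is left out so the sketch only needs scikit-learn; if the `xgboost` package is installed, an `xgboost.XGBRegressor()` entry with `use_engineered=True` could be added to `build_models`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

BASE = ["TV", "Radio", "Newspaper"]
# Hypothetical names for the six engineered features listed above.
ENGINEERED = ["TV_x_Radio", "TV_x_Newspaper", "Radio_x_Newspaper",
              "Total_Budget", "TV_Share", "Radio_Share"]


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with interaction, total-budget and share columns added."""
    out = df.copy()
    out["TV_x_Radio"] = out["TV"] * out["Radio"]
    out["TV_x_Newspaper"] = out["TV"] * out["Newspaper"]
    out["Radio_x_Newspaper"] = out["Radio"] * out["Newspaper"]
    out["Total_Budget"] = out[BASE].sum(axis=1)
    # Shares assume every row has a non-zero total budget.
    out["TV_Share"] = out["TV"] / out["Total_Budget"]
    out["Radio_Share"] = out["Radio"] / out["Total_Budget"]
    return out


def build_models(seed: int = 42) -> dict:
    """Map model name -> (estimator, use_engineered). Hyperparameters are illustrative."""
    return {
        "Linear Regression": (LinearRegression(), False),
        "Ridge Regression": (make_pipeline(StandardScaler(), Ridge(alpha=1.0)), False),
        "Lasso Regression": (make_pipeline(StandardScaler(), Lasso(alpha=0.1)), False),
        "Polynomial Reg (d=2)": (
            make_pipeline(PolynomialFeatures(degree=2), LinearRegression()), False),
        "Random Forest": (RandomForestRegressor(n_estimators=200, random_state=seed), True),
    }


def compare_models(df: pd.DataFrame, seed: int = 42) -> dict:
    """Fit every model on one train/test split and return test-set R² per model."""
    data = add_engineered_features(df)
    train, test = train_test_split(data, test_size=0.2, random_state=seed)
    scores = {}
    for name, (model, use_engineered) in build_models(seed).items():
        cols = BASE + ENGINEERED if use_engineered else BASE
        model.fit(train[cols], train["Sales"])
        scores[name] = r2_score(test["Sales"], model.predict(test[cols]))
    return scores


# Synthetic stand-in for Advertising.csv (NOT the real data).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "TV": rng.uniform(1, 300, n),
    "Radio": rng.uniform(1, 50, n),
    "Newspaper": rng.uniform(1, 100, n),
})
demo["Sales"] = (3 + 0.045 * demo["TV"] + 0.19 * demo["Radio"]
                 + 0.001 * demo["TV"] * demo["Radio"] + rng.normal(0, 1, n))

scores = compare_models(demo)
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} R² = {r2:.4f}")
```

A Newspaper Share column would be redundant, because the three shares always sum to 1, which may be why it does not appear in the feature list. The degree-2 polynomial model can represent the TV×Radio product directly, which is one reason it can compete with tree ensembles on this kind of data.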


Results

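| Model | R² | RMSE (thousand units) | MAE (thousand units) |
|---|---|---|---|
| Random Forest | 0.9880 | 0.6148 | 0.4797 |
| Polynomial Reg (d=2) | 0.9869 | 0.6426 | 0.5262 |
| XGBoost | 0.9846 | 0.6980 | 0.5449 |
| Linear Regression | 0.8994 | 1.7816 | 1.4608 |
| Ridge Regression | 0.8988 | 1.7872 | 1.4643 |
| Lasso Regression | 0.8983 | 1.7913 | 1.4613 |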

Best model: Random Forest (R² = 0.9880)
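For reference, the three metrics in the results table can be computed as below; this is a plain-Python sketch of the standard definitions, and scikit-learn's `r2_score`, `mean_squared_error` (take the square root for RMSE) and `mean_absolute_error` give the same values. The function name and the four-point example are illustrative. RMSE and MAE share the units of Sales (thousands of units), so the Random Forest MAE of 0.4797 corresponds to an average miss of roughly 480 units.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares (must be > 0)
    return {
        "R2": 1 - ss_res / ss_tot,                   # share of variance explained
        "RMSE": math.sqrt(ss_res / n),               # weights large errors more heavily
        "MAE": sum(abs(r) for r in residuals) / n,   # average absolute error
    }


m = regression_metrics([10.0, 12.0, 15.0, 20.0], [11.0, 12.0, 14.0, 19.0])
print({k: round(v, 4) for k, v in m.items()})  # {'R2': 0.9471, 'RMSE': 0.866, 'MAE': 0.75}
```

Higher R² is better, while lower RMSE and MAE are better; because RMSE squares each error before averaging, it is always at least as large as MAE.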


Key Business Findings

  • TV advertising has the strongest correlation with sales (r ≈ 0.78).
  • Radio is the second most impactful channel; its interaction with TV is highly predictive.
  • Newspaper spending shows the weakest relationship with sales outcomes.
  • Observations in the highest TV-budget quartile achieve nearly 3× the average sales of those in the lowest quartile.
  • Recommendation: Prioritise TV and Radio spend for maximum sales lift; reconsider Newspaper allocation.
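The correlation and quartile findings above can be reproduced in outline with pandas. This is a sketch under assumptions: the notebook's exact method is not shown, so it assumes quartiles are formed with `pd.qcut` (equal-count bins by TV spend) and that the "nearly 3×" figure compares mean Sales in the top quartile with the bottom one. The helper name and the eight-row toy table are invented for illustration.

```python
import pandas as pd


def tv_quartile_summary(df: pd.DataFrame):
    """Mean Sales per TV-budget quartile, top/bottom ratio, and Pearson r(TV, Sales)."""
    labels = ["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"]
    quartile = pd.qcut(df["TV"], q=4, labels=labels)   # equal-count bins by TV spend
    mean_sales = df.groupby(quartile, observed=True)["Sales"].mean()
    ratio = mean_sales.iloc[-1] / mean_sales.iloc[0]   # highest vs lowest quartile
    r = df["TV"].corr(df["Sales"])                     # Pearson correlation by default
    return mean_sales, ratio, r


# Invented toy data, not the Advertising dataset.
toy = pd.DataFrame({
    "TV": [10, 20, 30, 40, 50, 60, 70, 80],
    "Sales": [5, 6, 8, 9, 12, 13, 15, 16],
})
mean_sales, ratio, r = tv_quartile_summary(toy)
print(mean_sales)
print(f"top/bottom ratio = {ratio:.2f}, r = {r:.2f}")
```

Run on the real data, the same function would return the per-quartile means behind the segmentation chart, the top-to-bottom ratio and the TV correlation coefficient in one call.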

Visualisations Produced

  • Advertising budget distributions per channel
  • Sales distribution and Q-Q plot
  • Scatter plots: each channel vs sales with regression lines and correlation coefficients
  • Correlation heatmaps (base and engineered features)
  • Budget allocation pie chart and bar chart
  • Box plots for all variables
  • Pairplot of all features
  • Engineered feature correlation matrix
  • Model performance comparison (R², RMSE, MAE)
  • Actual vs Predicted plots for top two models
  • Residual analysis plots
  • Random Forest feature importances
  • OLS regression summary (statsmodels)
  • Linear regression coefficients chart
  • Sales segmentation by TV budget quartile
  • Advertising channel ROI proxy

How to Run

  1. Clone this repository:

    git clone https://github.com/Tosa9/CodeAlpha_SalesPrediction.git
    cd CodeAlpha_SalesPrediction
  2. Install dependencies:

    pip install -r requirements.txt
  3. Launch the notebook:

    jupyter notebook notebooks/sales_prediction.ipynb

Dataset Source

Advertising Dataset — Kaggle


CodeAlpha Data Science Internship | Task 4
#CodeAlpha #DataScience #MachineLearning #SalesPrediction #Python

About

Sales prediction from TV, Radio, and Newspaper advertising spend using Linear, Ridge, Lasso, Polynomial, Random Forest, and XGBoost regression. Best model: Random Forest at R²=0.988. Includes feature engineering, channel ROI analysis, and business insights. CodeAlpha Data Science Internship — Task 4.
