🚗 Lead Generation Prediction for Vehicle Listings

Python LightGBM Optuna Sklearn License

🎯 Project Overview

This machine learning project predicts the number of leads generated by vehicle advertisements using advanced feature engineering and gradient boosting algorithms. The solution helps automotive businesses optimise listings, pricing, and marketing strategies based on data-driven predictions.

Main Goal: Predict the number of leads (customer inquiries) that a vehicle listing will generate, based on vehicle characteristics, pricing, location, and advertisement features.


🏆 Model Performance & Results

Key Achievements

| Metric | Value |
|---|---|
| 📊 RMSE | 6.922 leads |
| 📈 R² | 70.3% of variance explained |
| ⚡ Inference speed | < 1 ms per listing |
| 📋 Features used | 16 (down from 48) |
| 🔼 Baseline improvement | 64.6% |
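
As a rough illustration of how these headline metrics relate to held-out predictions, the sketch below computes RMSE, R², and a relative improvement over a baseline. The README does not state which baseline the 64.6% refers to or whether it is measured on RMSE, so the helper takes the baseline predictions as an argument and the RMSE-based definition is an assumption. The function names are illustrative and not part of this repository.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (leads)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Share of the target variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def improvement_over_baseline(y_true, y_pred, y_baseline):
    """Relative RMSE reduction versus the baseline (0.25 means 25% lower).

    Assumption: improvement is measured on RMSE; the README does not say.
    """
    baseline_rmse = rmse(y_true, y_baseline)
    return (baseline_rmse - rmse(y_true, y_pred)) / baseline_rmse


# Toy data: the model is off by exactly one lead on every listing.
actual = [2, 4, 6, 8]
model_pred = [3, 3, 7, 7]
mean_pred = [5, 5, 5, 5]  # one possible baseline: always predict the mean

model_rmse = rmse(actual, model_pred)
model_r2 = r_squared(actual, model_pred)
gain = improvement_over_baseline(actual, model_pred, mean_pred)
```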

Model Specifications

  • Algorithm: LightGBM Regressor with Optuna hyperparameter optimisation (150 trials)
  • Training data: 48,578 vehicle listings
  • Validation: 5-fold cross-validation with overfitting detection
  • Feature engineering: Target encoding, Jenks flag clustering, outlier removal
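
One simple way to read "5-fold cross-validation with overfitting detection" is to compare the average training error with the average validation error across folds. The sketch below is only an assumption about how such a check could look: the 10% gap threshold, the function names, and the per-fold RMSE values are hypothetical, and the repository may use a different rule.

```python
import random


def k_fold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx


def overfitting_report(train_rmse, val_rmse, max_relative_gap=0.10):
    """Flag overfitting when validation RMSE exceeds training RMSE by too much.

    max_relative_gap is a hypothetical threshold, not a value from the README.
    """
    mean_train = sum(train_rmse) / len(train_rmse)
    mean_val = sum(val_rmse) / len(val_rmse)
    gap = (mean_val - mean_train) / mean_train
    return {
        "train_rmse": mean_train,
        "val_rmse": mean_val,
        "relative_gap": gap,
        "overfitting": gap > max_relative_gap,
    }


splits = list(k_fold_indices(10))
# Hypothetical per-fold scores: validation error is 20% above training error.
report = overfitting_report([5.0] * 5, [6.0] * 5)
```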

Business Impact Categories

| Lead Range | Category | Recommended Action |
|---|---|---|
| 0–5 | 🔴 Low Performance | Review listing quality, adjust pricing |
| 6–15 | 🟡 Moderate Performance | Minor adjustments, standard monitoring |
| 16–30 | 🟢 High Performance | Replicate success factors, scale strategy |
| 31+ | 🌟 Exceptional Performance | Case study, premium placement |
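
For a single prediction, for example inside an API handler, the table above can be applied with a small lookup. This is a minimal sketch: the function name is hypothetical, and rounding a fractional prediction to the nearest whole lead before bucketing is an assumption, since the ranges in the table are given as integers.

```python
import math
from bisect import bisect_left

# Inclusive upper bounds of the first three ranges: 0-5, 6-15, 16-30.
UPPER_BOUNDS = [5, 15, 30]
CATEGORIES = [
    ("Low Performance", "Review listing quality, adjust pricing"),
    ("Moderate Performance", "Minor adjustments, standard monitoring"),
    ("High Performance", "Replicate success factors, scale strategy"),
    ("Exceptional Performance", "Case study, premium placement"),
]


def categorise(predicted_leads):
    """Return (category, recommended action) for one predicted lead count."""
    # Assumption: round half up to a whole lead and clamp negatives to zero.
    leads = max(0, math.floor(predicted_leads + 0.5))
    return CATEGORIES[bisect_left(UPPER_BOUNDS, leads)]


category, action = categorise(12.3)
```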

🔍 What Was Accomplished

1. Data Analysis & Cleaning

  • Analysed 48,578 listings across multiple Brazilian states and cities
  • Reduced 48 raw features to 16 optimised features (67% reduction, zero accuracy loss)
  • Systematic outlier removal and missing value handling
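
The README does not say which outlier rule was used. As one common possibility, the sketch below applies Tukey's interquartile-range fences to a numeric column and drops rows where the value is missing. The column name `price`, the multiplier of 1.5, and the choice to drop rather than impute missing values are all assumptions made for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column, k=1.5):
    """Keep rows whose `column` value is present and inside the fences."""
    present = [row for row in rows if row.get(column) is not None]
    low, high = iqr_bounds([row[column] for row in present], k)
    return [row for row in present if low <= row[column] <= high]


# Hypothetical listings: one extreme price and one missing price.
listings = [{"price": p} for p in [1, 2, 3, 4, 5, 6, 7, 8, 100, None]]
cleaned = remove_outliers(listings, "price")
```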

2. Advanced Feature Engineering

  • Geographic encoding: City/state target encoding with smoothing (prevents overfitting)
  • Flag clustering: Jenks Natural Breaks to group vehicle feature combinations
  • Price positioning: Market value vs. advertised price gap analysis
  • Visual impact: Photo count optimisation (sweet spot: 8 photos)
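
To make the geographic encoding step concrete, here is a minimal sketch of target encoding with additive smoothing: each city's mean lead count is pulled towards the global mean, more strongly when the city has few listings. It is written in plain Python so it runs without dependencies. The smoothing weight, the function names, and the toy data are assumptions, and the repository's own implementation may differ.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=20.0):
    """Learn a smoothed mean target per category.

    encoded = (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Encode categories; unseen ones fall back to the global mean."""
    return [encoding.get(category, global_mean) for category in categories]


# Toy training data: city "A" has two listings, city "B" has one.
encoding, global_mean = fit_target_encoding(
    ["A", "A", "B"], [10, 20, 0], smoothing=2.0
)
encoded = transform(["A", "B", "C"], encoding, global_mean)
```

Fitting the encoding on the training folds only, and then applying it to the validation fold, keeps the target of a row from leaking into its own feature.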

3. Model Development

  • Hyperparameter search via Optuna (150 trials)
  • Learning curve analysis for regularisation guidance
  • Sklearn-compatible production pipeline (.joblib)
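
The hyperparameter search could look roughly like the sketch below, which minimises 5-fold cross-validated RMSE over 150 Optuna trials. The search ranges, the seed, and the function names `suggest_params` and `tune` are assumptions chosen for illustration; the README does not list the actual search space.

```python
def suggest_params(trial):
    """Search space for one Optuna trial; all ranges are illustrative."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 200, 1500),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
    }


def tune(X, y, n_trials=150, seed=42):
    """Return (best_params, best_cv_rmse) from an Optuna study."""
    # Imported lazily so the search space can be inspected without these packages.
    import lightgbm as lgb
    import optuna
    from sklearn.model_selection import KFold, cross_val_score

    cv = KFold(n_splits=5, shuffle=True, random_state=seed)

    def objective(trial):
        model = lgb.LGBMRegressor(
            random_state=seed, verbose=-1, **suggest_params(trial)
        )
        # scikit-learn reports negated RMSE, so flip the sign to minimise it.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        )
        return -scores.mean()

    study = optuna.create_study(
        direction="minimize", sampler=optuna.samplers.TPESampler(seed=seed)
    )
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Calling `tune(X_train, y_train)` on the engineered feature matrix would return the best parameter set, which can then be used to refit the final model on all training data.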

4. Business Intelligence

  • Top drivers: phone clicks, views, location, price positioning
  • State-specific lead generation patterns identified
  • Feature importance ranking for actionable ad improvement

🚀 Quick Start

git clone https://github.com/pelabdang/leads-analysis-prediction.git
cd leads-analysis-prediction
pip install -r requirements.txt

Batch Prediction

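import pandas as pd

from src.models.model_trainer import ModelTrainer

# Load the listings to score.
listings_df = pd.read_csv('your_listings.csv')
trainer = ModelTrainer()
predictions = trainer.predict_batch(listings_df)

listings_df['predicted_leads'] = predictions
# Bucket predictions into the business impact categories.
# include_lowest=True keeps a prediction of exactly 0 in 'Low' instead of NaN;
# negative predictions still fall outside the bins and become NaN.
listings_df['performance_category'] = pd.cut(
    predictions,
    bins=[0, 5, 15, 30, float('inf')],
    labels=['Low', 'Moderate', 'High', 'Exceptional'],
    include_lowest=True,
)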

Real-time API

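from flask import Flask, jsonify, request

from src.models.model_trainer import ModelTrainer

app = Flask(__name__)
# Load the serialised pipeline once at start-up rather than on every request.
model = ModelTrainer.load_model('complete_ml_pipeline')

@app.route('/predict', methods=['POST'])
def predict_leads():
    # silent=True returns None instead of raising on a missing or malformed JSON body.
    data = request.get_json(silent=True)
    if data is None:
        return jsonify({'error': 'Request body must be valid JSON'}), 400
    prediction = model.predict_single(data)
    return jsonify({'predicted_leads': prediction})

if __name__ == '__main__':
    app.run()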
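
A client could call the `/predict` endpoint along the lines of the sketch below, which uses only the standard library. The base URL assumes Flask's default development address, and the listing fields are hypothetical placeholders, since the README does not document the expected input schema.

```python
import json
import urllib.request


def build_predict_request(listing, base_url="http://localhost:5000"):
    """Build a JSON POST request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(listing).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_leads(listing, base_url="http://localhost:5000", timeout=5.0):
    """Send one listing to the running service and return the predicted leads."""
    req = build_predict_request(listing, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read())["predicted_leads"]


# Hypothetical payload; replace the keys with the features the pipeline expects.
example_listing = {"price": 45000, "photo_count": 8, "state": "SP"}
request_preview = build_predict_request(example_listing)
```

With the Flask app running locally, `predict_leads(example_listing)` would return the value stored under `predicted_leads` in the JSON response.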

📁 Project Structure

leads-analysis-prediction/
├── config/               # config.yaml — model & data settings
├── data/
│   ├── raw/              # Original immutable dataset
│   ├── processed/        # Cleaned, feature-engineered data
│   └── external/         # Supplementary sources
├── models/               # complete_ml_pipeline.joblib + artefacts
├── notebooks/
│   ├── exploratory/
│   ├── feature_engineering/
│   └── modeling/
├── reports/              # Modeling & feature engineering reports
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── visualization/
├── requirements.txt
└── setup.py

📊 Key Lead Generation Drivers

  1. 📞 Phone Engagement: the strongest signal of buyer intent
  2. 👁️ Visual Presentation: lead generation peaks at 8 photos per listing
  3. 🌍 Geographic Location: lead volume varies significantly by state and city
  4. 💰 Price Positioning: there is a sweet spot relative to market value
  5. 🚗 Vehicle Features: safety and comfort feature clusters drive engagement

📚 Reports


📧 Contact: Angelo Pelisson · GitHub
📊 Performance: RMSE 6.922 · R² 70.3% · 64.6% over baseline
