Almas-ansari/Ev-range-prediction

Electric Vehicle (EV) Range Prediction using Classical and Neural Regression Models

Abstract

This project investigates machine-learning approaches to predicting the remaining driving range of electric vehicles from onboard telemetry and contextual features. The work, conducted during a research internship at the Department of Management Studies, IIT Roorkee, implements a full pipeline (data collection and cleaning, feature encoding, baseline linear models, tree-based ensemble models, and a feed-forward neural network) and provides reproducible notebooks for exploratory analysis, preprocessing, model training, evaluation, and visualization.

Problem statement

Predict the remaining driving range (a continuous variable, in km) of an electric vehicle given a snapshot of vehicle state and environmental context (battery level, recent driving behavior, ambient conditions, etc.). Accurate short-term range prediction helps reduce driver range anxiety, enables smarter charging strategies, and improves energy management for EV fleets.

Data

Files present: data.csv (raw), data_cleaned.csv (processed), and two encoded feature variants, data_enc_dummies.csv and data_enc_label.csv.

Collection method: the scraping script data_scrapper.py assembled the dataset programmatically as part of the internship pipeline, after the required permission and license checks.

Typical features (as used across the notebooks): telemetry-like features (battery percentage/state of charge, recent distance/speed statistics), environmental/context features (temperature, possibly terrain), and engineered variables produced in the feature-extraction notebooks. Exact column names and counts are available in data_cleaned.csv and the feature notebooks.

Data snapshots/artifacts: the cleaned dataset and the two encoded variants (one-hot and label-encoded) are included to support different model types (tree vs. linear/NN).
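The two encoded variants can be reproduced with pandas. This is a minimal sketch; the column names below are illustrative stand-ins, since the real ones live in data_cleaned.csv:

```python
import pandas as pd

# Illustrative frame; the actual column names are in data_cleaned.csv
df = pd.DataFrame({
    "battery_pct": [80, 55, 30],
    "terrain": ["flat", "hilly", "flat"],
})

# One-hot variant (suits linear models and NNs) -> data_enc_dummies.csv
df_dummies = pd.get_dummies(df, columns=["terrain"])

# Label-encoded variant (suits tree models) -> data_enc_label.csv
df_label = df.copy()
df_label["terrain"] = df_label["terrain"].astype("category").cat.codes
```

Keeping both variants on disk lets each model notebook load the representation it works best with.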

Preprocessing & feature engineering

Implemented steps:

Data cleaning (data cleaning.ipynb): missing-value handling, basic sanity checks, duplicate removal, and timestamp parsing where applicable.

Feature extraction & encoding (Features extraction.ipynb, Feature_encoding.ipynb): features derived from telemetry (e.g., rolling averages over recent speed or distance windows), categorical encoding (one-hot and label encodings saved as separate CSVs), and normalization/standardization where needed for linear/NN models.

Exploration (basic_data_exploration.ipynb, Advanced data exploration.ipynb): univariate and bivariate analyses, target-distribution checks, and correlation inspection to select candidate features.

These steps are reproducible in the notebooks and produce the data_cleaned.csv and encoded variants used by the model notebooks.
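The rolling-window derivation mentioned above can be sketched in pandas. The column names and window size are assumptions for illustration, not the notebook's exact choices:

```python
import pandas as pd

# Hypothetical telemetry log, ordered by time; column names are illustrative
telemetry = pd.DataFrame({"speed_kmh": [40, 55, 62, 48, 70, 66]})

# Rolling statistics over the last 3 samples, in the spirit of the
# feature-extraction notebooks (min_periods=1 avoids leading NaNs)
telemetry["speed_roll_mean"] = (
    telemetry["speed_kmh"].rolling(window=3, min_periods=1).mean()
)
telemetry["speed_roll_max"] = (
    telemetry["speed_kmh"].rolling(window=3, min_periods=1).max()
)
```

Rolling features like these summarize recent driving behavior into a fixed-size snapshot that a tabular model can consume.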

Models implemented

Each model has an associated notebook that trains and evaluates it on the processed dataset:

Linear Regression (linear_regression.ipynb)

Purpose: a simple, interpretable baseline that sets a performance floor.

Preprocessing: feature scaling / encoding as required.
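A minimal scikit-learn sketch of such a baseline, on synthetic data standing in for the real features (the notebook's actual features and split differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))  # stand-ins: battery %, avg speed
y = 3.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 5, 200)  # synthetic range (km)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scaling + linear model in one pipeline, as the preprocessing note suggests
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # R^2 on held-out data
```

Wrapping the scaler and regressor in a pipeline keeps the scaling fit on the training split only, avoiding leakage into the test set.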

Tree-based methods (Trees.ipynb)

Implemented algorithms: Decision Tree and ensemble methods (Random Forest, Gradient Boosting-style models).

Strengths: they handle nonlinear interactions and categorical features without heavy scaling, and serve as a useful reference for feature importance.
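The feature-importance reference mentioned above can be sketched with a random forest on synthetic data (feature names and data are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))      # 3 illustrative features
y = 3.0 * X[:, 0] + rng.normal(0, 2, 300)   # only feature 0 drives the target

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_    # normalized, sums to 1.0
```

Inspecting `importances` shows which inputs the ensemble actually relies on, which is a quick sanity check on the engineered features.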

Feed-forward Neural Network (MLP) (Feed forward neural netwrok.ipynb)

A simple multi-layer perceptron to model complex non-linear relations; notebook contains architecture, training loop, and loss curves.

Implemented with the deep-learning framework used in the notebook; see the notebook header for the exact framework and hyperparameters.

Notes: Notebooks include hyperparameter choices and training code — run them top-to-bottom to reproduce training and evaluation.
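Since the notebook header specifies the actual framework, the sketch below uses scikit-learn's MLPRegressor as a framework-neutral stand-in; the architecture and data are assumptions for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(400, 2))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 3, 400)

# Two hidden layers; scaling first, since MLPs are sensitive to input scale
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
mlp.fit(X, y)
r2 = mlp.score(X, y)  # training-set R^2
```

The notebook's version additionally records the training loss curve, which is the main diagnostic for choosing the number of epochs.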

Method

Data split: the notebooks hold out a test set via a train/test split to evaluate generalization; cross-validation and hyperparameter tuning are performed where indicated in the corresponding notebooks.

Evaluation metrics: Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are used as primary metrics for regression performance. Notebooks compute and report these metrics per model.
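Both metrics follow directly from their definitions; a self-contained sketch with made-up predictions:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors quadratically."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error: average error magnitude, in the target's units (km)."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))

# Illustrative actual vs. predicted remaining range (km)
y_true = [120.0, 80.0, 200.0]
y_pred = [110.0, 85.0, 195.0]
# errors: 10, 5, 5 -> RMSE = sqrt(150/3) ≈ 7.07 km, MAE = 20/3 ≈ 6.67 km
```

RMSE exceeding MAE, as here, indicates a few larger errors dominating the squared term; when the two are close, errors are uniform in size.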

Reproducibility: The repo contains cleaned datasets and model scripts/notebooks; saving trained artifacts and metrics to a results/ folder is recommended for future runs.

Suggested next steps

Add temporal models (LSTM/Transformer) for sequences of telemetry to capture state transitions.

Implement uncertainty quantification (e.g., quantile regression or Monte Carlo Dropout) to produce confidence bounds — valuable in driver-facing applications.

Evaluate model calibration and produce a lightweight on-device inference pipeline (pruning/distillation) for embedded deployment.

Create an evaluation suite that stresses the model on edge-case scenarios (cold start, steep slopes, heavy loads).
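The uncertainty-quantification suggestion above (quantile regression) can be sketched with scikit-learn's quantile loss; the data and quantile levels are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 1))       # e.g. battery %
y = 2.0 * X[:, 0] + rng.normal(0, 10, 500)   # synthetic range (km)

# One model per quantile yields a lower/upper bound on predicted range
lo = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9, ).fit(X, y)

X_new = np.array([[50.0]])
bounds = (lo.predict(X_new)[0], hi.predict(X_new)[0])  # ~80% interval
```

For a driver-facing display, showing the lower bound (a conservative range estimate) is typically safer than the point prediction.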

Reproducibility & artifacts

Run the notebooks in order: data cleaning → feature encoding → model training notebooks.
