
Munishx01/hr-attrition-analytics


👥 HR Attrition Analytics — People & Workforce Insights


An end-to-end People Analytics platform that analyses an IBM-style HR dataset of 14,000 employee records to identify attrition patterns, predict high-risk employees using ML, and deliver actionable workforce intelligence for HR leadership.


1. 📌 Project Title / Headline

👥 HR Attrition Intelligence: Predicting Who Leaves — and Why

A comprehensive, data-driven workforce analytics dashboard that helps HR teams understand employee attrition across 7 departments, 23 job roles, salary bands, and tenure cohorts. It compares four classifiers (Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting) and reaches 87% recall on the attrition-positive class at a tuned decision threshold of 0.35.


2. 📋 Short Description / Purpose

The HR Attrition Analytics Dashboard is a visually rich, analytically deep Python-based platform designed to help HR leaders, People Analytics teams, and business managers explore, understand, and predict voluntary employee attrition.

This tool is intended for HR Directors, People Analytics teams, CHROs, workforce planners, and data-driven managers who want to reduce voluntary turnover, identify at-risk employees before they resign, and build evidence-based retention strategies.


3. 🛠️ Tech Stack

The dashboard was built using the following tools and technologies:

  • 🐍 Python 3.10+ — Core programming language
  • 📊 Matplotlib & Seaborn — Professional dark-themed HR dashboard suite
  • 🧮 Pandas & NumPy — Data wrangling, cohort analysis, and aggregations
  • 🤖 Scikit-learn — Logistic Regression, Decision Tree, Random Forest, Gradient Boosting
  • 📉 ROC-AUC & Precision-Recall — Model evaluation with threshold optimization
  • 📁 CSV — Structured HR dataset (IBM-style, 14,000 records)
  • 🖥️ Jupyter Notebook — Exploratory analysis environment

4. 📂 Data Source

Dataset: IBM-style HR Analytics dataset — synthetically engineered with real-world distributions

The dataset captures 14,000 employee records with features covering:

  • Demographics: Age, Gender, Marital Status, Education Field
  • Job Profile: Department, Role, Salary Band, Stock Options, Overtime
  • Experience: Years at Company, Years in Role, Years Since Promotion
  • Satisfaction Scores: Job, Environment, Relationship, Work-Life Balance
  • Attrition Label: Yes/No (target variable)

💡 For the original IBM HR Analytics dataset: Kaggle — IBM HR Analytics


5. 🔍 Features / Highlights

🏢 Business Problem

Employee attrition costs organisations an estimated 6–9 months of an employee's salary in replacement costs. For a company of 14,000 people with a 14.8% attrition rate (about 2,072 leavers a year) and an average monthly income of ₹49,775, this translates to roughly ₹62–93 crore in annual hidden costs.
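The cost figure can be sanity-checked against the KPIs reported in this README. The sketch below assumes replacement cost = leavers × average monthly income × months of salary; the README does not state how its estimate was derived, so this is one plausible method, not necessarily the author's.

```python
# Replacement-cost estimate from the headline KPIs in this README.
# Assumption: cost = leavers x average monthly income x months of salary.
HEADCOUNT = 14_000
ATTRITION_RATE = 0.148
AVG_MONTHLY_INCOME_INR = 49_775
RUPEES_PER_CRORE = 10_000_000  # 1 crore = 10 million rupees


def annual_replacement_cost_crore(months_of_salary: float) -> float:
    """Estimated yearly replacement cost in ₹ crore."""
    leavers = HEADCOUNT * ATTRITION_RATE  # about 2,072 exits per year
    return leavers * AVG_MONTHLY_INCOME_INR * months_of_salary / RUPEES_PER_CRORE


low = annual_replacement_cost_crore(6)
high = annual_replacement_cost_crore(9)
print(f"₹{low:.1f} to ₹{high:.1f} crore per year")  # ₹61.9 to ₹92.8 crore per year
```

The estimate is linear in both the income figure and the months-of-salary multiplier, so it is only as reliable as those two inputs.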

Key questions that HR teams struggle to answer with raw data:

  • Which departments and roles are bleeding talent the fastest?
  • Does overtime actually drive people to quit — and by how much?
  • Which employees are 3× more likely to leave in the next 6 months?
  • Does salary alone explain attrition, or is it a multi-factor problem?
  • Can we predict attrition before an employee mentally checks out?

🎯 Goal of the Dashboard

To deliver an interactive People Analytics intelligence platform that:

  • Enables HR leaders to explore attrition patterns across all workforce segments
  • Predicts high-risk employees using ML with 87% recall, i.e. it flags 87 of every 100 employees who actually leave
  • Surfaces the Triple Risk Factor finding (Low Satisfaction + Overtime + No Promotion = 54% of voluntary exits)
  • Supports monthly HR workforce reviews with a live KPI scorecard

🖥️ Walkthrough of Key Visuals

📊 Executive Workforce KPIs

  • Total Employees: 14,000
  • Overall Attrition Rate: 14.8%
  • Overtime Workers: 28.5%
  • Average Monthly Income: ₹49,775
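The headline KPIs are simple aggregates over the employee table. A minimal pandas sketch, assuming the IBM dataset's column names (Attrition, OverTime, MonthlyIncome) with Attrition and OverTime stored as "Yes"/"No" strings; the project's actual encoding may differ.

```python
import pandas as pd


def workforce_kpis(df: pd.DataFrame) -> dict:
    """Headline workforce KPIs from an employee-level table (one row per employee)."""
    return {
        "total_employees": len(df),
        "attrition_rate_pct": 100 * df["Attrition"].eq("Yes").mean(),
        "overtime_pct": 100 * df["OverTime"].eq("Yes").mean(),
        "avg_monthly_income": df["MonthlyIncome"].mean(),
    }


# Toy rows for illustration only (not the project's dataset).
demo = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No"],
    "OverTime": ["Yes", "Yes", "No", "No"],
    "MonthlyIncome": [18_000, 52_000, 61_000, 69_000],
})
print(workforce_kpis(demo))
```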

🏢 Attrition by Department (Horizontal Bar)
Ranks all 7 departments by attrition rate. Sales shows the highest exit rate. R&D, despite being the largest department, maintains relatively low attrition, suggesting that role clarity and investment in research culture matter.

💼 Attrition by Job Role (Bar Chart)
Ranks the top 10 roles by attrition rate. Sales Representatives and Lab Technicians emerge as the highest-risk roles; both are characterised by high workload, limited career-ladder visibility, and lower pay bands.
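Both ranking charts reduce to a group-by on a Yes/No attrition flag. A minimal sketch, assuming columns named Department, JobRole and Attrition (these names are assumptions, not confirmed by the README); the toy rows are made up.

```python
from typing import Optional

import pandas as pd


def attrition_rate_by(df: pd.DataFrame, column: str, top: Optional[int] = None) -> pd.Series:
    """Attrition rate (%) per category of `column`, highest first."""
    rates = df["Attrition"].eq("Yes").groupby(df[column]).mean().mul(100)
    rates = rates.sort_values(ascending=False)
    return rates if top is None else rates.head(top)


demo = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR"],
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No"],
})
print(attrition_rate_by(demo, "Department"))
# On real data, the role chart would be attrition_rate_by(df, "JobRole", top=10),
# where "JobRole" is an assumed column name.
```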

💰 Salary Band vs Attrition (Dual-Axis Bar)
Shows a clear inverse relationship: employees earning below ₹20K/month have 3× the attrition rate of those earning ₹75K+. The overlaid employee count reveals a salary-concentration risk, as most employees sit in the mid bands.
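Salary bands are a binning step in front of the same group-by. The band edges below are hypothetical (the README only names the below-₹20K and ₹75K+ bands), the MonthlyIncome and Attrition column names are assumed, and `right=False` makes each band include its lower edge.

```python
import pandas as pd

# Hypothetical band edges in rupees per month.
BAND_EDGES = [0, 20_000, 40_000, 75_000, float("inf")]
BAND_LABELS = ["<20K", "20K-40K", "40K-75K", "75K+"]


def salary_band_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Headcount and attrition rate (%) per salary band, in band order."""
    band = pd.cut(df["MonthlyIncome"], bins=BAND_EDGES, labels=BAND_LABELS, right=False)
    summary = (
        df.assign(band=band, leaver=df["Attrition"].eq("Yes"))
          .groupby("band", observed=False)["leaver"]  # keep empty bands too
          .agg(employees="size", attrition_pct="mean")
    )
    summary["attrition_pct"] *= 100
    return summary


# Toy rows for illustration only.
demo = pd.DataFrame({
    "MonthlyIncome": [15_000, 18_000, 30_000, 55_000, 80_000, 90_000],
    "Attrition": ["Yes", "No", "No", "No", "No", "No"],
})
print(salary_band_summary(demo))
```

In a dual-axis chart, `employees` would feed the count axis and `attrition_pct` the bar axis.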

⏰ Overtime Effect by Department (Grouped Bar)
Compares attrition rates with and without overtime side by side for each department. In every department, overtime raises attrition by roughly 12–18 percentage points.
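The with/without-overtime comparison is a Department × OverTime pivot of the attrition rate; the gap between the two columns is the uplift in percentage points. A sketch assuming "Yes"/"No" encodings for OverTime and Attrition (column names assumed) and toy data.

```python
import pandas as pd


def overtime_uplift_pp(df: pd.DataFrame) -> pd.DataFrame:
    """Attrition rate (%) with vs without overtime per department, plus the gap in points."""
    rates = (
        df.assign(leaver=df["Attrition"].eq("Yes").astype(float))  # 1.0 = left, 0.0 = stayed
          .pivot_table(index="Department", columns="OverTime", values="leaver", aggfunc="mean")
          .mul(100)
    )
    rates["uplift_pp"] = rates["Yes"] - rates["No"]  # percentage-point difference
    return rates


demo = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["R&D"] * 4,
    "OverTime": ["Yes", "Yes", "No", "No"] * 2,
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "Yes", "No"],
})
print(overtime_uplift_pp(demo))
```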

📅 Tenure Cohort Analysis (Bar Chart)
Attrition is highest in the 0–1 year cohort (onboarding failure) and peaks again at 3–5 years (career plateau). Employees who stay past 10 years show dramatically lower exit rates: loyalty compounds with tenure.

🎂 Age Group Trend (Line Chart)
Younger employees (18–30) show the highest attrition, driven by career exploration and salary dissatisfaction. Attrition stabilises from age 35 onward as employees settle into roles and their financial commitments increase.
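Tenure and age cohorts are both bins over a numeric column. A sketch with hypothetical edges that follow the ranges mentioned above (0–1, 3–5 and 10+ years of tenure; 18–30 and 35+ years of age), assuming columns YearsAtCompany, Age and Attrition. With `right=False`, the "0-1" cohort means less than one full year.

```python
import pandas as pd

# Hypothetical cohort edges; each bin includes its lower edge and excludes its upper edge.
TENURE_EDGES = [0, 1, 3, 5, 10, float("inf")]
TENURE_LABELS = ["0-1", "1-3", "3-5", "5-10", "10+"]
AGE_EDGES = [18, 31, 35, 45, float("inf")]
AGE_LABELS = ["18-30", "31-34", "35-44", "45+"]


def attrition_by_bins(df, column, edges, labels):
    """Attrition rate (%) per bin of a numeric column; empty bins come back as NaN."""
    bins = pd.cut(df[column], bins=edges, labels=labels, right=False)
    return df["Attrition"].eq("Yes").groupby(bins, observed=False).mean().mul(100)


# Toy rows for illustration only.
demo = pd.DataFrame({
    "YearsAtCompany": [0, 0, 2, 4, 12],
    "Age": [24, 29, 33, 41, 52],
    "Attrition": ["Yes", "No", "No", "Yes", "No"],
})
print(attrition_by_bins(demo, "YearsAtCompany", TENURE_EDGES, TENURE_LABELS))
print(attrition_by_bins(demo, "Age", AGE_EDGES, AGE_LABELS))
```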

🚨 Triple Risk Factor: The 54% Finding
Employees with all three risk signals (Low Job Satisfaction of 1–2, Overtime, and No Promotion in 3+ years) account for 54% of all voluntary exits, despite representing only ~12% of the workforce. This cohort is the #1 retention intervention target.
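The triple-risk cohort is a boolean mask built from the three signals. A sketch, assuming column names JobSatisfaction (1–4 scale), OverTime, YearsSinceLastPromotion and Attrition, and treating every Attrition == "Yes" row as a voluntary exit (the README does not say how voluntary exits are identified).

```python
import pandas as pd


def triple_risk_summary(df: pd.DataFrame) -> dict:
    """Workforce share, exit share and attrition rate of the triple-risk cohort."""
    triple = (
        df["JobSatisfaction"].le(2)               # low satisfaction (1-2)
        & df["OverTime"].eq("Yes")                # works overtime
        & df["YearsSinceLastPromotion"].ge(3)     # no promotion in 3+ years
    )
    leaver = df["Attrition"].eq("Yes")
    return {
        "workforce_share": triple.mean(),
        "exit_share": (triple & leaver).sum() / leaver.sum(),
        "cohort_attrition_rate": leaver[triple].mean(),
    }


# Toy rows for illustration only.
demo = pd.DataFrame({
    "JobSatisfaction": [1, 2, 4, 3, 1],
    "OverTime": ["Yes", "Yes", "No", "Yes", "No"],
    "YearsSinceLastPromotion": [4, 3, 0, 5, 6],
    "Attrition": ["Yes", "No", "No", "Yes", "No"],
})
print(triple_risk_summary(demo))
```

Plugging the README's own numbers into these definitions shows how concentrated the risk is: if about 12% of staff produce 54% of exits at an overall 14.8% attrition rate, the cohort's own attrition rate is about 0.54 × 14.8 / 12 ≈ 67%, versus about 0.46 × 14.8 / 88 ≈ 7.7% for everyone else.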

📈 ROC Curves: All 4 Models
Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting are plotted head-to-head. Lowering the decision threshold from the default 0.5 to 0.35 raises recall to 87%, so more at-risk employees are flagged before they resign, at the cost of more false positives.
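Threshold tuning works on predicted probabilities rather than on `predict()`. The sketch below trains the four model types named above on synthetic, imbalanced data (not the project's dataset or hyperparameters), picks the best by validation ROC-AUC, then searches a 0.05 grid for the highest threshold that still meets a recall target.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in (about 15% positives); NOT the project's HR data.
X, y = make_classification(n_samples=2_000, n_features=12, weights=[0.85], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
val_auc, val_proba = {}, {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    val_proba[name] = clf.predict_proba(X_val)[:, 1]  # P(class 1), i.e. "leaves"
    val_auc[name] = roc_auc_score(y_val, val_proba[name])
    print(f"{name:<20} ROC-AUC = {val_auc[name]:.3f}")

best = max(val_auc, key=val_auc.get)
proba = val_proba[best]


def metrics_at(threshold):
    """Recall and precision on the validation split when flagging proba >= threshold."""
    pred = (proba >= threshold).astype(int)
    return recall_score(y_val, pred), precision_score(y_val, pred, zero_division=0)


def highest_threshold_for_recall(target):
    """Largest threshold on a 0.05 grid whose validation recall still meets `target`."""
    for t in [round(0.05 * k, 2) for k in range(19, 0, -1)]:  # 0.95, 0.90, ..., 0.05
        if metrics_at(t)[0] >= target:
            return t
    return 0.0  # flag everyone: recall is trivially 1.0


for t in (0.50, 0.35, highest_threshold_for_recall(0.87)):
    r, p = metrics_at(t)
    print(f"{best}: threshold={t:.2f} recall={r:.2f} precision={p:.2f}")
```

Lowering the threshold can only add positive predictions, so recall never falls as the threshold drops; that is why a top-down search works. In practice, choose the threshold on validation data and report the final recall on a separate held-out test set, otherwise the figure is optimistic.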

🔑 Feature Importance (Random Forest)
The top 12 drivers of attrition, ranked by importance. MonthlyIncome, OverTime, YearsAtCompany, JobSatisfaction, and YearsSinceLastPromotion consistently rank as the most predictive features.
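Random Forest importances come from the fitted model's `feature_importances_` attribute. A sketch on synthetic data with placeholder feature names (the project's real features and settings are not shown in the README).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in; feature_0 ... feature_14 are placeholders, not the project's columns.
X, y = make_classification(n_samples=1_000, n_features=15, n_informative=5, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances (they sum to 1), ranked from most to least important.
importances = pd.Series(forest.feature_importances_, index=feature_names).sort_values(ascending=False)
top12 = importances.head(12)
print(top12.round(3))
```

These are impurity-based importances, which scikit-learn's documentation notes can be biased toward features with many unique values (continuous columns such as MonthlyIncome are one example); `sklearn.inspection.permutation_importance` is a common cross-check.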

🗺️ Attrition Heatmap: Dept × Age Group
A full matrix of attrition rates across department and age-band combinations, which immediately identifies the highest-risk intersections for targeted retention programmes.
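The heatmap's underlying matrix is a pivot table of the attrition flag over department and age band. A sketch assuming Department, Age and Attrition columns and hypothetical age bands, on toy data.

```python
import pandas as pd


def dept_age_heatmap_table(df, age_edges, age_labels):
    """Department x age-band matrix of attrition rates (%); cells without employees are NaN."""
    age_band = pd.cut(df["Age"], bins=age_edges, labels=age_labels, right=False)
    return (
        df.assign(age_band=age_band, leaver=df["Attrition"].eq("Yes").astype(float))
          .pivot_table(index="Department", columns="age_band", values="leaver",
                       aggfunc="mean", observed=False)
          .mul(100)
    )


# Toy rows and hypothetical age bands, for illustration only.
demo = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D"],
    "Age": [25, 40, 28, 50],
    "Attrition": ["Yes", "No", "No", "No"],
})
table = dept_age_heatmap_table(demo, [18, 31, 45, float("inf")], ["18-30", "31-44", "45+"])
print(table)
```

To draw the heatmap, the table can be passed to `seaborn.heatmap(table, annot=True, fmt=".1f")`, which masks NaN cells automatically.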

📋 HR KPI Scorecard Table
A complete monthly review scorecard covering all key metrics, risk summaries, and ML performance indicators, formatted for HR leadership reporting.


💼 Business Impact & Insights

| Insight | Business Action |
|---|---|
| Sales dept has 3× higher attrition than R&D | Review Sales compensation & career paths |
| Overtime doubles attrition risk in 6 of 7 depts | Cap mandatory overtime, offer flex-time |
| Below-₹20K salary band has the highest exit rate | Immediate pay band review for bottom tier |
| 0–1 year cohort has the highest churn | Redesign onboarding & 90-day check-in program |
| 54% of exits come from the triple-risk group | Targeted intervention: promotion + workload relief |
| 87%-recall ML model catches most at-risk employees | Monthly HR scoring run on all employees |
| Single employees leave at 2× the rate of married employees | Engagement programmes for younger workforce |
| Promotion gap >4 years = attrition spike | Enforce annual promotion review cycles |

6. 📸 Dashboard Screenshots

👥 Executive Workforce Overview


🔍 Attrition Risk Factors Deep Dive


🤖 ML Model Performance Dashboard


🗺️ Workforce Demographics & Heatmap


📊 HR KPI Scorecard



🚀 Quick Start

git clone https://github.com/Munishx01/hr-attrition-analytics.git
cd hr-attrition-analytics
pip install -r requirements.txt

python src/generate_data.py       # Generate 14K employee dataset
python src/eda_ml_pipeline.py     # Run EDA + train ML models + generate dashboards

🗂️ Project Structure

hr-attrition-analytics/
├── 📁 data/
│   └── hr_attrition.csv             # 14,000 employee records, 27 features
├── 📁 src/
│   ├── generate_data.py             # IBM-style HR data generator
│   └── eda_ml_pipeline.py           # Full EDA + 4 ML models + 5 dashboards
├── 📁 outputs/
│   ├── 📁 figures/
│   │   ├── 01_executive_workforce_overview.png
│   │   ├── 02_attrition_risk_factors.png
│   │   ├── 03_ml_model_performance.png
│   │   ├── 04_workforce_demographics.png
│   │   └── 05_hr_kpi_scorecard.png
│   └── 📁 models/
│       ├── best_model.pkl
│       └── scaler.pkl
├── requirements.txt
└── README.md
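The Business Impact table recommends a monthly scoring run on all employees, which could reuse the saved artifacts in outputs/models/. A minimal sketch, assuming both .pkl files are plain pickles of fitted scikit-learn objects and that the scaler was applied to the same encoded feature columns at training time; neither detail is documented here. If the files were written with `joblib.dump`, load them with `joblib.load` instead, and only unpickle files you trust.

```python
import pickle
from pathlib import Path

import pandas as pd

THRESHOLD = 0.35  # decision threshold quoted in this README


def load_artifacts(model_dir="outputs/models"):
    """Load scaler.pkl and best_model.pkl, assuming both are plain pickle files."""
    model_dir = Path(model_dir)
    with open(model_dir / "scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    with open(model_dir / "best_model.pkl", "rb") as f:
        model = pickle.load(f)
    return scaler, model


def score_employees(features: pd.DataFrame, scaler, model, threshold=THRESHOLD) -> pd.DataFrame:
    """Add an attrition probability and a high-risk flag to each employee row.

    `features` must contain the same (already encoded) columns, in the same order,
    that the scaler and model were trained on.
    """
    proba = model.predict_proba(scaler.transform(features))[:, 1]
    return features.assign(attrition_risk=proba, high_risk=proba >= threshold)
```

Typical use would be `scaler, model = load_artifacts()` followed by `score_employees(current_features, scaler, model)`, where `current_features` is a hypothetical DataFrame of this month's encoded employee features.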

👤 Author

Munish Kumar — Data Analyst | Python | SQL | Machine Learning
📧 mk611453@gmail.com | 📍 Palampur, Himachal Pradesh
LinkedIn GitHub


"People are your greatest asset — data helps you protect them." 👥
