An end-to-end People Analytics platform that analyses an IBM-style HR dataset of 14,000 employee records to identify attrition patterns, predict high-risk employees using ML, and deliver actionable workforce intelligence for HR leadership.
👥 HR Attrition Intelligence: Predicting Who Leaves — and Why
A comprehensive, data-driven workforce analytics dashboard built to help HR teams understand employee attrition across 7 departments, 23 job roles, salary bands, and tenure cohorts — powered by Logistic Regression and Decision Tree classification with 87% recall on the attrition-positive class.
The HR Attrition Analytics Dashboard is a visually rich, analytically deep Python-based platform designed to help HR leaders, People Analytics teams, and business managers explore, understand, and predict voluntary employee attrition.
This tool is intended for use by HR Directors, People Analytics teams, CHROs, workforce planners, and data-driven managers who seek to reduce voluntary turnover, identify at-risk employees before they resign, and build evidence-based retention strategies.
The dashboard was built using the following tools and technologies:
- 🐍 Python 3.10+ — Core programming language
- 📊 Matplotlib & Seaborn — Professional dark-themed HR dashboard suite
- 🧮 Pandas & NumPy — Data wrangling, cohort analysis, and aggregations
- 🤖 Scikit-learn — Logistic Regression, Decision Tree, Random Forest, Gradient Boosting
- 📉 ROC-AUC & Precision-Recall — Model evaluation with threshold optimization
- 📁 CSV — Structured HR dataset (IBM-style, 14,000 records)
- 🖥️ Jupyter Notebook — Exploratory analysis environment
Dataset: IBM-style HR Analytics dataset — synthetically engineered with real-world distributions
The dataset captures 14,000 employee records with features covering:
- Demographics: Age, Gender, Marital Status, Education Field
- Job Profile: Department, Role, Salary Band, Stock Options, Overtime
- Experience: Years at Company, Years in Role, Years Since Promotion
- Satisfaction Scores: Job, Environment, Relationship, Work-Life Balance
- Attrition Label: Yes/No (target variable)
💡 For the original IBM HR Analytics dataset: Kaggle — IBM HR Analytics
Employee attrition costs organisations 6–9 months of an employee's salary in replacement costs. For a company of 14,000 people with a 14.8% attrition rate, this translates to ₹280+ crore in annual hidden costs.
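The cost estimate above can be sanity-checked with a few lines of arithmetic. The sketch below uses only figures quoted in this README (14,000 employees, 14.8% attrition, ₹49,775 average monthly income, 6 to 9 months of salary per exit) and assumes every exit costs the same multiple of average pay. Under those assumptions the result is roughly ₹62 to ₹93 crore per year, which is below the ₹280+ crore headline, so that figure likely rests on additional cost assumptions not listed here.

```python
# Back-of-the-envelope replacement cost, using only figures quoted in this README.
HEADCOUNT = 14_000
ATTRITION_RATE = 0.148
AVG_MONTHLY_INCOME = 49_775   # rupees, from the KPI section
CRORE = 10_000_000            # 1 crore = 10^7 rupees


def replacement_cost_crore(months_of_salary: float) -> float:
    """Annual replacement cost in crore.

    Assumption: every exit costs the same multiple of the average monthly income.
    """
    exits_per_year = HEADCOUNT * ATTRITION_RATE
    return exits_per_year * AVG_MONTHLY_INCOME * months_of_salary / CRORE


low = replacement_cost_crore(6)
high = replacement_cost_crore(9)
print(f"Exits per year: {HEADCOUNT * ATTRITION_RATE:.0f}")
print(f"Replacement cost: ₹{low:.1f} to ₹{high:.1f} crore")
```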
Key questions that HR teams struggle to answer with raw data:
- Which departments and roles are bleeding talent the fastest?
- Does overtime actually drive people to quit — and by how much?
- Which employees are 3× more likely to leave in the next 6 months?
- Does salary alone explain attrition, or is it a multi-factor problem?
- Can we predict attrition before an employee mentally checks out?
To deliver an interactive People Analytics intelligence platform that:
- Enables HR leaders to explore attrition patterns across all workforce segments
- Predicts high-risk employees using ML with 87% recall, i.e. the model flags roughly 87 of every 100 employees who actually leave
- Surfaces the Triple Risk Factor finding (Low Satisfaction + Overtime + No Promotion = 54% of voluntary exits)
- Supports monthly HR workforce reviews with a live KPI scorecard
📊 Executive Workforce KPIs
- Total Employees: 14,000
- Overall Attrition Rate: 14.8%
- Overtime Workers: 28.5%
- Average Monthly Income: ₹49,775
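As a rough illustration, KPIs like these can be derived from the employee table with a few pandas aggregations. This is a minimal sketch on a made-up eight-row sample, not the pipeline's actual code; the column names `Attrition`, `OverTime`, and `MonthlyIncome` follow the IBM HR dataset convention and are assumed to match `hr_attrition.csv`.

```python
import pandas as pd

# Tiny hypothetical sample; the real data lives in data/hr_attrition.csv.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "No"],
    "OverTime": ["Yes", "No", "Yes", "No", "Yes", "No", "No", "No"],
    "MonthlyIncome": [18000, 52000, 61000, 45000, 22000, 75000, 48000, 39000],
})


def workforce_kpis(frame: pd.DataFrame) -> dict:
    """Headline KPIs: headcount, attrition %, overtime %, mean monthly income."""
    return {
        "total_employees": len(frame),
        "attrition_rate_pct": round(frame["Attrition"].eq("Yes").mean() * 100, 1),
        "overtime_pct": round(frame["OverTime"].eq("Yes").mean() * 100, 1),
        "avg_monthly_income": round(frame["MonthlyIncome"].mean()),
    }


kpis = workforce_kpis(df)
print(kpis)
```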
🏢 Attrition by Department (Horizontal Bar)
Ranks all 7 departments by attrition rate. Sales shows the highest exit rate. R&D, despite being the largest department, maintains relatively lower attrition, which suggests that role clarity and investment in research culture matter.
💼 Attrition by Job Role (Bar Chart)
Top 10 roles ranked by attrition rate. Sales Representatives and Lab Technicians emerge as the highest-risk roles; both are characterised by high workload, limited career ladder visibility, and lower pay bands.
💰 Salary Band vs Attrition (Dual-Axis Bar)
Clear inverse relationship: employees earning below ₹20K/month show 3× the attrition rate of those earning ₹75K+. The overlaid employee count shows where headcount is concentrated: most employees sit in the mid-bands.
⏰ Overtime Effect by Department (Grouped Bar)
Side-by-side comparison of the attrition rate with vs without overtime for each department. Overtime increases attrition by an average of 12–18 percentage points across all departments.
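The percentage-point uplift behind a chart like this can be computed by grouping on department and overtime status and subtracting the two rates. The sketch below is one possible way to do it on a hypothetical eight-row sample; `overtime_uplift` is an illustrative helper, and the column names are assumed to follow the IBM HR dataset convention.

```python
import pandas as pd

# Hypothetical mini-sample; column names follow the IBM HR dataset convention.
df = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["R&D"] * 4,
    "OverTime":   ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "Attrition":  ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})


def overtime_uplift(frame: pd.DataFrame) -> pd.DataFrame:
    """Attrition rate (%) with and without overtime, plus the gap in percentage points."""
    left = frame["Attrition"].eq("Yes")
    rates = (
        left.groupby([frame["Department"], frame["OverTime"]])
        .mean()
        .unstack("OverTime")
        * 100
    )
    rates["uplift_pp"] = rates["Yes"] - rates["No"]
    return rates


uplift = overtime_uplift(df)
print(uplift)
```

Reporting the gap in percentage points rather than as a ratio keeps departments with very low baseline attrition from producing misleadingly large multipliers.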
📅 Tenure Cohort Analysis (Bar Chart)
Attrition is highest in the 0–1 year cohort (onboarding failure) and peaks again at 3–5 years (career plateau). Employees who stay past 10 years show dramatically lower exit rates: loyalty compounds with tenure.
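Tenure cohorts of this kind are typically built by binning `YearsAtCompany` with `pd.cut`. The sketch below shows the idea on a hypothetical sample; the exact bin edges and labels are assumptions chosen to line up with the cohorts named above, and the pipeline may cut the bands differently.

```python
import pandas as pd

# Hypothetical sample; the real pipeline may use different cohort boundaries.
df = pd.DataFrame({
    "YearsAtCompany": [0, 1, 2, 4, 5, 8, 12, 20],
    "Attrition": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

# Right-closed bins: "0-1" covers 0 to 1 years inclusive, "1-3" covers >1 to 3, etc.
BINS = [0, 1, 3, 5, 10, float("inf")]
LABELS = ["0-1", "1-3", "3-5", "5-10", "10+"]


def attrition_by_tenure(frame: pd.DataFrame) -> pd.Series:
    """Attrition rate (%) per tenure cohort."""
    cohort = pd.cut(frame["YearsAtCompany"], bins=BINS, labels=LABELS, include_lowest=True)
    left = frame["Attrition"].eq("Yes")
    return (left.groupby(cohort, observed=False).mean() * 100).round(1)


tenure_rates = attrition_by_tenure(df)
print(tenure_rates)
```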
🎂 Age Group Trend (Line Chart)
Younger employees (18–30) show the highest attrition, driven by career exploration and salary dissatisfaction. Attrition stabilises from age 35 onwards as employees settle into roles and financial commitments increase.
🚨 Triple Risk Factor: The 54% Finding
Employees with ALL THREE risk signals (Low Job Satisfaction (1–2) + Overtime + No Promotion in 3+ years) account for 54% of ALL voluntary exits, despite representing only ~12% of the workforce. This cohort is the #1 retention intervention target.
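The two numbers in this finding (share of the workforce and share of exits) come from a single boolean flag. A minimal sketch on a hypothetical ten-row sample, assuming IBM-style column names and the thresholds stated above (satisfaction of 2 or lower, overtime, 3 or more years since the last promotion):

```python
import pandas as pd

# Hypothetical ten-row sample; thresholds mirror the definition in the text above.
df = pd.DataFrame({
    "JobSatisfaction": [1, 2, 4, 3, 1, 2, 4, 3, 2, 4],
    "OverTime": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes", "Yes"],
    "YearsSinceLastPromotion": [4, 3, 1, 5, 0, 6, 2, 0, 5, 1],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No", "Yes"],
})


def triple_risk_summary(frame: pd.DataFrame) -> dict:
    """Share of the workforce in the triple-risk group and its share of all exits."""
    flag = (
        frame["JobSatisfaction"].le(2)
        & frame["OverTime"].eq("Yes")
        & frame["YearsSinceLastPromotion"].ge(3)
    )
    left = frame["Attrition"].eq("Yes")
    return {
        "workforce_share_pct": round(flag.mean() * 100, 1),
        "share_of_exits_pct": round((flag & left).sum() / left.sum() * 100, 1),
    }


summary = triple_risk_summary(df)
print(summary)
```

Comparing the two percentages is what makes the finding actionable: a group that is small in headcount but large in exits is a concentrated target for intervention.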
📈 ROC Curves (All 4 Models)
Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting plotted head-to-head. With the decision threshold lowered from the default 0.5 to 0.35, the selected model reaches 87% recall, so most at-risk employees are flagged before they resign, at the cost of more false positives.
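To illustrate what moving the threshold does, the sketch below trains a Logistic Regression on synthetic, imbalanced data (about 15% positives, generated with scikit-learn's `make_classification`) and compares recall and precision at 0.50 and 0.35. It is a stand-in for the idea only; the data, features, and resulting scores are not those of this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data: roughly 15% positive ("attrition") class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class


def metrics_at(threshold: float) -> tuple:
    """Recall and precision when flagging everyone at or above the threshold."""
    pred = (proba >= threshold).astype(int)
    return (
        recall_score(y_test, pred),
        precision_score(y_test, pred, zero_division=0),
    )


for t in (0.50, 0.35):
    recall, precision = metrics_at(t)
    print(f"threshold={t:.2f}  recall={recall:.2f}  precision={precision:.2f}")
```

Lowering the threshold can only add flagged employees, so recall never drops; the trade-off is that precision usually falls, which means more retention conversations with people who were not going to leave.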
🔑 Feature Importance (Random Forest)
Top 12 drivers of attrition ranked by importance. MonthlyIncome, OverTime, YearsAtCompany, JobSatisfaction, and YearsSinceLastPromotion consistently rank as the most predictive features.
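A ranking like this is usually read from the fitted forest's `feature_importances_` attribute. The sketch below shows the mechanics on randomly generated data with a made-up label rule, so the resulting order is illustrative only and says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Randomly generated features named after the columns mentioned above.
X = pd.DataFrame({
    "MonthlyIncome": rng.integers(15_000, 120_000, n),
    "OverTime": rng.integers(0, 2, n),
    "YearsAtCompany": rng.integers(0, 25, n),
    "JobSatisfaction": rng.integers(1, 5, n),
    "YearsSinceLastPromotion": rng.integers(0, 10, n),
})

# Hypothetical label rule so the forest has some signal to find.
risk = 0.10 + 0.25 * X["OverTime"] + 0.20 * (X["JobSatisfaction"] <= 2)
y = (rng.random(n) < risk).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, one per feature, summing to 1.
importance = pd.Series(forest.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance.round(3))
```

Impurity-based importances tend to favour high-cardinality numeric features such as income, so it can be worth cross-checking the ranking with `sklearn.inspection.permutation_importance` on held-out data.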
🗺️ Attrition Heatmap: Dept × Age Group
Full matrix of attrition rates across department and age band combinations, which makes the highest-risk intersections easy to spot for targeted retention programmes.
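The matrix behind such a heatmap can be produced with `pivot_table`. A minimal sketch on a hypothetical sample follows; the age band edges are assumptions, and the resulting frame could then be passed to `seaborn.heatmap` for plotting.

```python
import pandas as pd

# Hypothetical sample; age band edges are illustrative assumptions.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "R&D", "R&D", "R&D", "HR", "HR"],
    "Age": [24, 28, 41, 26, 38, 52, 33, 45],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
})

df["AgeGroup"] = pd.cut(
    df["Age"], bins=[17, 30, 40, 50, 65], labels=["18-30", "31-40", "41-50", "51+"]
)
df["Left"] = df["Attrition"].eq("Yes").astype(int)

# Rows: departments, columns: age bands, cells: attrition rate in percent.
matrix = (
    df.pivot_table(
        index="Department",
        columns="AgeGroup",
        values="Left",
        aggfunc="mean",
        observed=False,
    )
    * 100
)
print(matrix.round(1))
```

Cells with no employees come out as NaN, which is worth keeping visible in the plot so that an empty intersection is not mistaken for a zero attrition rate.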
📋 HR KPI Scorecard Table
A complete monthly review scorecard covering all key metrics, risk summaries, and ML performance indicators, formatted for HR leadership reporting.
| Insight | Business Action |
|---|---|
| Sales dept has 3× higher attrition than R&D | Review Sales compensation & career path |
| Overtime doubles attrition risk in 6 of 7 depts | Cap mandatory overtime, offer flex-time |
| ₹20K salary band has highest exit rate | Immediate pay band review for bottom tier |
| 0–1 year cohort: highest churn | Redesign onboarding & 90-day check-in program |
| 54% of exits = triple-risk group | Targeted intervention: promotion + workload relief |
| 87% recall ML model catches most at-risk employees | Monthly HR scoring run on all employees |
| Single employees leave 2× more than married | Engagement programmes for younger workforce |
| Promotion gap >4 years = attrition spike | Enforce annual promotion review cycles |
git clone https://github.com/Munishx01/hr-attrition-analytics.git
cd hr-attrition-analytics
pip install -r requirements.txt
python src/generate_data.py # Generate 14K employee dataset
python src/eda_ml_pipeline.py # Run EDA + train ML models + generate dashboards
hr-attrition-analytics/
├── 📁 data/
│   └── hr_attrition.csv # 14,000 employee records, 27 features
├── 📁 src/
│   ├── generate_data.py # IBM-style HR data generator
│   └── eda_ml_pipeline.py # Full EDA + 4 ML models + 5 dashboards
├── 📁 outputs/
│   ├── 📁 figures/
│   │   ├── 01_executive_workforce_overview.png
│   │   ├── 02_attrition_risk_factors.png
│   │   ├── 03_ml_model_performance.png
│   │   ├── 04_workforce_demographics.png
│   │   └── 05_hr_kpi_scorecard.png
│   └── 📁 models/
│       ├── best_model.pkl
│       └── scaler.pkl
├── requirements.txt
└── README.md
Munish Kumar — Data Analyst | Python | SQL | Machine Learning
📧 mk611453@gmail.com | 📍 Palampur, Himachal Pradesh
"People are your greatest asset — data helps you protect them." 👥




