Adult Census Income Prediction 📊

This project analyzes the "Adult" dataset from the 1994 US Census database to predict whether an individual's annual income exceeds $50,000. The analysis involves data cleaning, comprehensive exploratory data analysis (EDA), and the implementation and evaluation of several machine learning classification models.

Dataset Overview

The dataset provides a rich collection of socio-economic and demographic features for predicting income levels.

Source: 1994 US Census Bureau Database
Total Instances: 48,842 records, split into a training set (32,561) and a testing set (16,281).
Cleaned Instances: After removing records with unknown values, 45,222 instances remain.
Features: The dataset contains a mix of continuous and categorical attributes, including: age, workclass, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, and hours-per-week.

Analysis and Findings

The project followed a structured workflow from data preparation to modeling, yielding key insights into the factors that correlate with income.

Exploratory Data Analysis (EDA) 📈

Visual exploration of the data revealed several significant trends:

Education & Income: A strong positive correlation exists between higher education levels and earning over $50K. Individuals in the higher-income bracket consistently have more years of education.
Age & Income: Those earning over $50K are, on average, older than those earning less. The analysis showed that the prime earning years appear to be between the late 30s and 50s.
Demographic Factors: The data shows notable income disparities across different demographic groups. A significantly larger proportion of males earn over $50K compared to females. Similarly, individuals identified as "White" or "Asian-Pac-Islander" have a higher representation in the >50K income category.

Predictive Modeling and Results 🤖

Several classification models were trained and evaluated. The performance was measured by prediction accuracy on the test set.

Model	Accuracy
Decision Tree	84.49%
K-Nearest Neighbors	82.50%
Random Forest	85.30%

The Random Forest model was the top performer, achieving an accuracy of 85.3%. This result is highly competitive with benchmark models documented for this dataset, such as C4.5 (84.46%) and NBTree (85.90%).

Conclusion

The analysis successfully identified key predictors of income and built an effective classification model.

Best Model: The Random Forest model provided the highest accuracy and is the recommended model for this prediction task.
Key Predictors: The most influential factors in predicting income were determined to be an individual's relationship status, capital gains, education level, and age.

How to Run This Project

To replicate this analysis, follow these steps.

Prerequisites

R: Ensure you have a working installation of R.
RStudio: An IDE like RStudio is recommended.

Required R Libraries

Install the necessary R packages by running the following command in your R console:

install.packages(c("dplyr", "ggplot2", "tidyverse", "GGally", "tree", "class", "caret", "randomForest", "factoextra"))

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Index		Index
README.md		README.md
STAT362 Presentation.pdf		STAT362 Presentation.pdf
adult.data		adult.data
adult.names		adult.names
adult.test		adult.test
analysis.R		analysis.R
main.py		main.py
main.r		main.r
old.adult.names		old.adult.names

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Adult Census Income Prediction 📊

Dataset Overview

Analysis and Findings

Exploratory Data Analysis (EDA) 📈

Predictive Modeling and Results 🤖

Conclusion

How to Run This Project

Prerequisites

Required R Libraries

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Adult Census Income Prediction 📊

Dataset Overview

Analysis and Findings

Exploratory Data Analysis (EDA) 📈

Predictive Modeling and Results 🤖

Conclusion

How to Run This Project

Prerequisites

Required R Libraries

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages