Karrtik12/AutoResume-Screening

Automated Resume Screening System 📄

📌 Overview

Recruitment often involves manually filtering thousands of resumes, which is time-consuming and prone to human error. This project automates that task, using Natural Language Processing (NLP) and Machine Learning to classify resumes into their respective job domains (e.g., Data Science, Web Development, HR).

🚀 Features

  • Automated Categorization: Classifies resumes into 25 specific job categories.
  • Text Preprocessing: Cleans raw text by removing URLs, hashtags, mentions, and special characters using RegEx.
  • Feature Extraction: Converts text data into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency).
  • High Accuracy: Achieves ~97% accuracy on the test set using the KNN algorithm.

๐Ÿ› ๏ธ Tech Stack

  • Language: Python
  • Libraries: Pandas, NumPy, Matplotlib, Seaborn
  • NLP: NLTK, Regular Expressions (re)
  • Machine Learning: Scikit-Learn (KNeighborsClassifier, OneVsRestClassifier, TfidfVectorizer)

📊 Methodology

  1. Data Loading: Utilized a dataset containing 962 resumes labeled with their respective categories.
  2. Data Cleaning: Implemented a helper function to strip unnecessary data:
    • URLs (http\S+)
    • RT and cc
    • Hashtags and Mentions
    • Punctuation and Special Characters
  3. Visualization: Analyzed category distribution using Seaborn count plots and pie charts.
  4. Encoding: Encoded categorical labels using LabelEncoder.
  5. Vectorization: Transformed the cleaned text into TF-IDF features.
  6. Model Training: Trained a K-Nearest Neighbors (KNN) classifier wrapped in a One-vs-Rest strategy to handle the multi-class classification problem.
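
The cleaning step (step 2) could look roughly like the sketch below. The function name `clean_resume`, the exact patterns other than `http\S+`, and the sample string are assumptions made for illustration; the project's actual helper may differ.

```python
import re


def clean_resume(text: str) -> str:
    """Strip URLs, RT/cc markers, hashtags, mentions, and punctuation.

    Hypothetical helper: only the `http\\S+` pattern is taken from the
    README; the remaining patterns are illustrative assumptions.
    """
    text = re.sub(r"http\S+", " ", text)        # URLs
    text = re.sub(r"\b(?:RT|cc)\b", " ", text)  # standalone "RT" and "cc"
    text = re.sub(r"#\S+", " ", text)           # hashtags
    text = re.sub(r"@\S+", " ", text)           # mentions
    text = re.sub(r"[^\w\s]", " ", text)        # punctuation and special characters
    text = re.sub(r"\s+", " ", text)            # collapse repeated whitespace
    return text.strip()


# Made-up sample text, not taken from the dataset.
sample = "RT @hr_team: Python dev, 5+ yrs! See https://example.com/cv #hiring cc manager"
cleaned = clean_resume(sample)
print(cleaned)
```

Order matters here: URLs, hashtags, and mentions are removed before the generic punctuation pass, otherwise the `#` and `@` prefixes would be stripped first and the tag text would stay in the resume.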

📈 Results

The model was evaluated using a split of the dataset (80% training, 20% testing).

  • Training Accuracy: ~100%
  • Test Accuracy: ~97%
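
A minimal sketch of how the encoding, vectorization, training, and 80/20 evaluation steps might fit together is shown here. The toy corpus, the three category names, `random_state=42`, the stratified split, and the default KNN settings are all assumptions for illustration; the real project uses the 962-resume dataset, and its accuracy figures will not match this toy run.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the cleaned resume text (assumption, not the real dataset).
vocab = {
    "Data Science": "python pandas statistics regression clustering numpy".split(),
    "Web Development": "html css javascript react frontend node".split(),
    "HR": "recruitment payroll onboarding hiring employee benefits".split(),
}
texts, labels = [], []
for category, words in vocab.items():
    for i in range(len(words)):
        # Rotate through the word list so each toy resume mixes four terms.
        texts.append(" ".join(words[(i + j) % len(words)] for j in range(4)))
        labels.append(category)

# Encode category names as integers.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# 80% training, 20% testing; stratified so every class appears in both parts.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, y, test_size=0.2, stratify=y, random_state=42
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# KNN wrapped in a One-vs-Rest strategy for the multi-class problem.
model = OneVsRestClassifier(KNeighborsClassifier())
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Training accuracy: {train_acc:.2f}")
print(f"Test accuracy: {test_acc:.2f}")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary and document frequencies out of the features, so the test accuracy is a fairer estimate. Whether the original notebook does this is not stated here.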

🔮 Future Scope

  • Integration with a web application (Streamlit/Flask) for real-time file uploading.
  • Support for parsing raw PDF and DOCX files directly.
  • Expansion of the dataset to include more diverse job roles.

About

NLP-driven Automated Resume Screening System using Scikit-Learn and TF-IDF to classify candidates across 25 job domains with ~97% accuracy.
