Description
The current resume parser relies on basic regex patterns and misses about 40% of skills, experience, and education entries. NLP-based extraction with skill ontology mapping would identify competencies more accurately, enabling better job matching and skill gap analysis.
Current Impact: Extraction accuracy is about 60%; many skills are missed, resulting in poor job matches.
Expected Business Value: 70% improvement in resume understanding, 85% higher match quality, and coverage expanded from the current 3 skill categories to 15+.
Steps to Reproduce
- Upload resume with diverse skills (e.g., 'TensorFlow', 'full-stack development', 'AWS DevOps')
- Check extracted skills
- Observe: many skills missed, typos not recognized, related skills not grouped
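
The failure mode can be illustrated with a minimal sketch. The pattern below is a hypothetical stand-in, not the project's actual parser code; it only assumes that the current approach does exact, case-sensitive matching against a fixed list.

```python
import re
from typing import List

# Hypothetical stand-in for the current parser: a fixed, exact-match pattern.
SKILL_PATTERN = re.compile(r"\b(Python|React|AWS)\b")

def extract_skills_regex(text: str) -> List[str]:
    """Return exact pattern hits in order of appearance."""
    return SKILL_PATTERN.findall(text)

resume = "Built services in Pyton and reactjs, deployed with AWS DevOps and TensorFlow."
print(extract_skills_regex(resume))  # ['AWS']
```

Only 'AWS' is found: the typo 'Pyton', the variant 'reactjs', and the unlisted 'TensorFlow' are all missed.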
Environment Information
- Python 3.8+
- NLTK/spaCy available
- Node.js for UI (if applicable)
- Test data: 50 sample resumes
Expected Behavior
- Extracts 90%+ of skills from diverse resumes
- Recognizes skill variations (e.g., 'react' and 'reactjs' as same)
- Groups related skills (Python, Java -> Languages)
- Provides a confidence score for each extraction
- Handles typos and abbreviations
- Supports 50+ skill categories
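
One possible shape for variant recognition and grouping is sketched below, using only the standard library. The `ONTOLOGY` entries, the category names, and the helpers `canonicalize` and `group_by_category` are all illustrative assumptions; a real ontology would be far larger.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical ontology: canonical skill -> (category, known variants).
ONTOLOGY = {
    "React": ("Frameworks", ["react", "reactjs", "react.js"]),
    "Python": ("Languages", ["python"]),
    "Java": ("Languages", ["java"]),
    "Full-Stack Development": ("Roles", ["full-stack", "fullstack", "full stack"]),
}

def _key(term: str) -> str:
    """Lowercase and drop separators so 'React.Js' and 'reactjs' compare equal."""
    return re.sub(r"[\s.\-_]+", "", term.lower())

# Reverse index: normalized variant -> canonical skill name.
VARIANT_INDEX = {
    _key(variant): canonical
    for canonical, (_category, variants) in ONTOLOGY.items()
    for variant in variants
}

def canonicalize(term: str) -> Optional[str]:
    """Map a raw term to its canonical skill, or None if it is unknown."""
    return VARIANT_INDEX.get(_key(term))

def group_by_category(terms: List[str]) -> Dict[str, List[str]]:
    """Group recognized terms under their ontology category, deduplicated."""
    groups = defaultdict(set)
    for term in terms:
        canonical = canonicalize(term)
        if canonical is not None:
            groups[ONTOLOGY[canonical][0]].add(canonical)
    return {category: sorted(names) for category, names in groups.items()}

print(group_by_category(["reactjs", "React.Js", "Python", "Java", "full stack"]))
```

Matching on a normalized key keeps 'java' from colliding with 'javascript', which a substring-based approach would risk.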
Actual Behavior
- Only 60% extraction accuracy
- Regex-only approach misses many skills
- No synonym/variant recognition
- No skill grouping or categorization
- No confidence metrics
Screenshots or Recordings
Not applicable - parsing logic missing
Additional Context
Affected Users: Job seekers with diverse backgrounds; tech skills not properly recognized.
Root Cause: Regex-based extraction too simplistic for skill diversity.
Proposed Solution: Use NLP entity recognition (spaCy) plus skill ontology database matching.
Implementation Steps:
- Build skill ontology (skills.json) with 500+ entries, variants, categories
- Integrate spaCy NER for entity recognition
- Implement skill entity linking to ontology
- Add fuzzy matching for typos (Levenshtein distance)
- Implement skill grouping logic
- Add confidence scoring mechanism
- Create REST endpoint: POST /parse-resume returns JSON
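
The fuzzy-matching and confidence steps above could look roughly like the following sketch. It uses a hand-written Levenshtein distance rather than spaCy or a third-party library; `CANONICAL_SKILLS`, `match_skill`, `parse_terms`, and the length-normalized score are assumptions made for illustration, and `REVIEW_THRESHOLD` simply reuses the 0.8 figure from the test cases.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical canonical names; a real list would be loaded from skills.json.
CANONICAL_SKILLS = ["Python", "JavaScript", "React", "Docker", "AWS"]
REVIEW_THRESHOLD = 0.8  # assumed cut-off, taken from the confidence test case

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def match_skill(term: str) -> Tuple[Optional[str], float]:
    """Return the closest canonical skill and a similarity score in [0, 1]."""
    best_name, best_score = None, 0.0
    for name in CANONICAL_SKILLS:
        distance = levenshtein(term.lower(), name.lower())
        score = 1.0 - distance / max(len(term), len(name), 1)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, round(best_score, 2)

def parse_terms(terms: List[str]) -> List[Dict[str, object]]:
    """Attach a confidence score and a review flag to each candidate term."""
    results = []
    for term in terms:
        name, confidence = match_skill(term)
        results.append({
            "input": term,
            "skill": name,
            "confidence": confidence,
            "needs_review": confidence <= REVIEW_THRESHOLD,
        })
    return results

print(match_skill("Pyton"))  # ('Python', 0.83)
```

A list of such dictionaries is one plausible payload for the JSON response of POST /parse-resume, though the actual schema is not specified here.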
Test Cases:
- Resume 1 (tech): extracts Python, JavaScript, AWS, Docker (expect all 4)
- Resume 2 (typos): 'Pyton', 'React.Js' (expect recognized as Python, React)
- Resume 3 (synonyms): 'full-stack', 'fullstack', 'full stack' (expect grouped as same)
- Resume 4 (edge): 25 skills mentioned (expect 90%+ accuracy)
- Confidence: every extracted skill carries a score; skills above 0.8 are accepted, lower-confidence ones are flagged for review
- Performance: a resume parses in under 2 seconds
Severity: High - critical for feature accuracy
Expected Points: 500-600 GSSoC points
Suggested Labels
enhancement, nlp, parsing, skill-extraction, ml, resume-analysis, GSSoC26