JobMatch Bot is an AI-assisted job intelligence pipeline that aggregates, normalizes, deduplicates, and ranks jobs from multiple ATS and company career systems such as Workday, Greenhouse, Lever, and Eightfold.
The system focuses on recall-first ingestion followed by ranking to help candidates discover relevant roles that traditional job portals often hide.
- Multi-source ATS adapters (Amazon, Greenhouse, Lever, Workday, Apple, Microsoft)
- Recall-first job ingestion pipeline
- Job normalization and deduplication
- Resume-aware scoring signals
- Diagnostics for debugging missing sources
- CSV and Markdown outputs for review
Companies → Adapters → Normalization → Dedupe → Scoring → Ranked Jobs
CLI → Pipeline → Adapters → Data Sources → Normalization → Deduplication → Scoring Engine → Ranked Job Results
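The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the code in job_intel: the `Job` fields, the `normalize`, `dedupe`, and `run_pipeline` names, and the dedupe key are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Job:
    # Hypothetical common shape; the real engine's schema may differ.
    company: str
    title: str
    location: str
    url: str


def normalize(raw: dict) -> Job:
    """Map one raw adapter record onto the common shape."""
    return Job(
        company=raw.get("company", "").strip(),
        title=" ".join(raw.get("title", "").split()),  # collapse whitespace
        location=raw.get("location", "").strip(),
        url=raw.get("url", "").strip(),
    )


def dedupe(jobs: Iterable[Job]) -> list[Job]:
    """Keep the first job seen for each (company, title, location) key."""
    seen = set()
    unique = []
    for job in jobs:
        key = (job.company.lower(), job.title.lower(), job.location.lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def run_pipeline(
    adapters: Iterable[Callable[[], list[dict]]],
    score: Callable[[Job], float],
    min_score: float,
) -> list[tuple[float, Job]]:
    """Adapters -> normalization -> dedupe -> scoring -> ranked jobs."""
    raw = [record for adapter in adapters for record in adapter()]
    jobs = dedupe(normalize(record) for record in raw)
    scored = [(score(job), job) for job in jobs]
    kept = [(s, job) for s, job in scored if s >= min_score]
    # Sort on the score only, so Job does not need to be orderable.
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Each adapter here is just a callable returning raw dictionaries, which keeps the recall-first step (collect everything) separate from the ranking step (filter by `min_score`).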
git clone https://github.com/thalaai/jobmatch-bot.git
cd jobmatch-bot
deactivate 2>/dev/null
hash -r
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .
Verify the install before your first run:
which python
python --version
python -m pip show job-intel
python -c "import job_intel; print('job_intel import OK')"
job-intel --help
Expected result:
- which python points to .venv/bin/python
- python -m pip show job-intel prints package metadata
- the import check prints job_intel import OK
- job-intel --help shows the CLI usage
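The same checks can be run from Python if you prefer one script over several shell commands. This is a sketch using only the standard library; the `check_install` helper is hypothetical and not part of the package.

```python
import importlib.util
import shutil
import sys


def check_install(package: str = "job_intel", cli: str = "job-intel") -> dict:
    """Report whether the interpreter, package, and CLI look correctly set up."""
    return {
        # Inside a venv, sys.prefix differs from the base interpreter's prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "package_importable": importlib.util.find_spec(package) is not None,
        "cli_on_path": shutil.which(cli) is not None,
        "python": sys.executable,
    }


for name, value in check_install().items():
    print(f"{name}: {value}")
```

If `in_virtualenv` is false or `python` does not point into .venv, re-run the activation step before installing.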
The maintained engine is job_intel.
cp job_intel_sources.example.yaml job_intel_sources.yaml
job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50
Outputs:
- out/normalized_jobs.csv
- out/ranked_jobs.csv
- out/job_matches.csv
- out/job_matches.md
- out/diagnostics.csv
Which file to use:
- Open out/job_matches.md first. This is the main human-readable shortlist to review and apply from.
- Use out/job_matches.csv if you want the same shortlist in spreadsheet form for sorting or filtering.
- Use out/ranked_jobs.csv if you want a larger ranked pool beyond the shortlist.
- Use out/normalized_jobs.csv and out/diagnostics.csv for analysis and debugging, not for direct job applications.
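If you would rather peek at the CSV outputs from Python than from a spreadsheet, a small preview helper is enough. The column names are not documented here, so this sketch reads whatever header the file has; `preview_csv` is a hypothetical helper, not part of the CLI.

```python
import csv
from pathlib import Path


def preview_csv(path, limit=5):
    """Return (column names, first `limit` rows) from a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = []
        for row in reader:
            if len(rows) >= limit:
                break
            rows.append(row)
        return list(reader.fieldnames or []), rows


# Only runs when a previous crawl has produced the shortlist.
shortlist = Path("out/job_matches.csv")
if shortlist.exists():
    columns, rows = preview_csv(shortlist)
    print(columns)
    for row in rows:
        print(row)
```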
After copying the repo, do this in order:
- Create your working source registry from the example:
cp job_intel_sources.example.yaml job_intel_sources.yaml
- Add your resume as resume.txt in the repo root. You can start from the example format:
cp examples/resume.txt resume.txt
Example:
jobmatch-bot/
resume.txt
job_intel_sources.yaml
pyproject.toml
- Edit job_intel_sources.yaml and add or remove companies/sources you want to search.
Example:
- keep the provided Amazon, Waymo, and Zoox sources if you want a known-good starting point
- disable any source by setting enabled: false
- add more companies by adding entries under companies: and sources:
- Run the engine:
job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50
Optional first run with source discovery enabled:
job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50 --discover
- Inspect the results:
sed -n '1,25p' out/diagnostics.csv
sed -n '1,40p' out/job_matches.md
Apply from out/job_matches.md first. If you need more options, expand to out/ranked_jobs.csv.
Before uploading the repo to GitHub, run one clean local test from the repo root:
cd /xxxxxx/jobmatch-bot
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .
cp job_intel_sources.example.yaml job_intel_sources.yaml
cp examples/resume.txt resume.txt
which python
which job-intel
python -c "import job_intel; print('job_intel import OK')"
job-intel --help
job-intel --sources ./job_intel_sources.yaml --resume ./resume.txt --out-dir ./out --min-score 0.50
ls -la out
sed -n '1,20p' out/diagnostics.csv
sed -n '1,40p' out/job_matches.md
This validates:
- the package installs cleanly
- the CLI resolves correctly
- the app runs end-to-end with example inputs
- the expected output files are generated
Optional test:
python -m pytest job_intel/tests
After testing, remove local-only files before uploading to GitHub:
rm -rf .venv cache out resume.txt job_intel_sources.yaml
find . -name "__pycache__" -type d -exec rm -rf {} +
find . -name "*.pyc" -delete
Do not remove:
- job_intel_sources.example.yaml
- examples/resume.txt
All company configuration lives in job_intel_sources.yaml.
You do not need to edit Python code just to:
- add a new company
- remove a company
- disable a failing source
- change a source endpoint
- change search queries in metadata
Example source entry:
- company: Amazon
  source_name: amazon_search_json
  source_type: amazon_json
  entrypoint: https://www.amazon.jobs/en/search.json
  metadata:
    page_size: 100
    queries:
      - technical program manager
      - program manager
Place your resume in the repo root as resume.txt.
The intended default workflow is:
- clone the repo
- add resume.txt
- edit job_intel_sources.yaml
- run the CLI
- review out/job_matches.md first, then use out/diagnostics.csv only for troubleshooting
Current scoring uses:
- title family
- seniority signals
- leadership / delivery signals
- domain signals
- location fit
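One plausible way to combine signals like these is a weighted average. The sketch below is illustrative only: the weights, the signal keys, and the `combine_signals` helper are assumptions, and the engine's actual formula may differ.

```python
# Hypothetical weights (they sum to 1.0); the engine's real weighting
# is not documented in this README.
WEIGHTS = {
    "title_family": 0.35,
    "seniority": 0.20,
    "leadership_delivery": 0.15,
    "domain": 0.15,
    "location_fit": 0.15,
}


def combine_signals(signals: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-signal scores, each clamped to [0, 1]."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[name] * min(1.0, max(0.0, signals.get(name, 0.0)))
        for name in weights
    )
    return weighted / total_weight


# A job that matches on title family and seniority only.
score = combine_signals({"title_family": 1.0, "seniority": 1.0})
print(f"{score:.2f}")
```

With these assumed weights, a job matching only on title family and seniority would just clear a --min-score of 0.50, which shows how the threshold interacts with the individual signals.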
Resume-aware scoring is the next intended extension point in job_intel/engine/.
The file examples/resume.txt is a format example only.
- job_intel/: The publishable multi-source engine.
- job_intel_sources.example.yaml: Safe example source registry.
- examples/: Example input files only.
The engine reads a registry YAML with:
- companies
- sources
Each source entry defines:
- company
- source_name
- source_type
- entrypoint
- optional metadata
Supported source families currently include:
- amazon_json
- greenhouse
- lever
- workday
- apple_jsonld
- microsoft_search
- eightfold
- meta_graphql
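A quick pre-flight check on registry entries can catch typos before a crawl. The sketch below assumes each source entry has already been parsed from YAML into a plain dict (for example with PyYAML's `yaml.safe_load`); the `validate_source` helper is hypothetical and not part of job_intel.

```python
REQUIRED_FIELDS = ("company", "source_name", "source_type", "entrypoint")

# Source families listed in this README.
SUPPORTED_TYPES = {
    "amazon_json", "greenhouse", "lever", "workday",
    "apple_jsonld", "microsoft_search", "eightfold", "meta_graphql",
}


def validate_source(entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing required field: {field}")
    source_type = entry.get("source_type")
    if source_type and source_type not in SUPPORTED_TYPES:
        problems.append(f"unsupported source_type: {source_type}")
    metadata = entry.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("metadata must be a mapping when present")
    return problems


amazon = {
    "company": "Amazon",
    "source_name": "amazon_search_json",
    "source_type": "amazon_json",
    "entrypoint": "https://www.amazon.jobs/en/search.json",
    "metadata": {"page_size": 100, "queries": ["program manager"]},
}
print(validate_source(amazon))
```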
Every run emits a diagnostics CSV with source health and per-stage counts:
- raw_found
- normalized_found
- deduped_out
- scored_count
- output_count
- error
This is the primary debugging surface when a company returns no jobs.
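To find where a source loses its jobs, it helps to look for the first stage whose count drops to zero. The sketch below uses the column names listed above; the `source_name` label column and integer-valued counts are assumptions about the CSV layout, and both helpers are hypothetical.

```python
import csv
import io

STAGES = ["raw_found", "normalized_found", "deduped_out", "scored_count", "output_count"]


def first_empty_stage(row: dict):
    """Return 'error', the first stage with a zero count, or None if all are non-zero."""
    if (row.get("error") or "").strip():
        return "error"
    for stage in STAGES:
        value = (row.get(stage) or "0").strip()
        if int(value or 0) == 0:
            return stage
    return None


def triage(handle) -> dict:
    """Map each source label to the stage where its jobs first disappear."""
    report = {}
    for row in csv.DictReader(handle):
        # Assumed label column; adjust to the actual diagnostics header.
        label = row.get("source_name") or "<unknown source>"
        report[label] = first_empty_stage(row)
    return report


# Made-up rows purely to illustrate the shape of the input.
sample = io.StringIO(
    "source_name,raw_found,normalized_found,deduped_out,scored_count,output_count,error\n"
    "example_a,10,10,8,8,3,\n"
    "example_b,0,0,0,0,0,HTTP 403\n"
)
print(triage(sample))
```

A result of `raw_found` points at the adapter or endpoint, while a later stage such as `output_count` suggests the jobs were found but fell below the score threshold.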
- More ATS adapters
- Better Workday and Eightfold coverage
- Stronger normalization and location handling
- Resume-aware scoring
- Historical crawl tracking
- Hosted version later
See CONTRIBUTING.md for how to add adapters, sources, and tests.
Before publishing a release:
- verify python -m pip install -e . succeeds in a clean virtualenv
- copy job_intel_sources.example.yaml to job_intel_sources.yaml and run one sample crawl
- confirm out/diagnostics.csv contains at least one working source
- make sure no local outputs, resumes, caches, or temporary directories are present
- replace the placeholder clone URL in README.md
We welcome contributions.
Some good starter issues:
- SmartRecruiters adapter
- Ashby ATS adapter
- Location normalization improvements
- Diagnostics enhancements
Check the Issues tab for tasks labeled good-first-issue.
Traditional job portals often hide relevant roles due to strict keyword filtering, ranking bias, and incomplete indexing of company career sites.
JobMatch Bot takes a different approach:
- Recall-first ingestion to collect as many relevant roles as possible
- Normalization and deduplication across multiple ATS systems
- Candidate-aware ranking to surface the best matches
The goal is to help candidates discover opportunities that conventional job portals frequently miss.
MIT (see LICENSE).