JobMatch Bot

JobMatch Bot is an AI-assisted job intelligence pipeline that aggregates, normalizes, deduplicates, and ranks jobs from multiple ATS and company career systems such as Workday, Greenhouse, Lever, and Eightfold.

The system focuses on recall-first ingestion followed by ranking to help candidates discover relevant roles that traditional job portals often hide.

Features

Multi-source adapters for ATS platforms and company career sites (Amazon, Greenhouse, Lever, Workday, Apple, Microsoft, Eightfold, Meta)
Recall-first job ingestion pipeline
Job normalization and deduplication
Candidate-fit scoring signals (title family, seniority, domain, location), with resume-aware scoring planned
Diagnostics for debugging missing sources
CSV and Markdown outputs for review

System Architecture

Companies → Adapters → Normalization → Dedupe → Scoring → Ranked Jobs

CLI → Pipeline → Adapters → Data Sources → Normalization → Deduplication → Scoring Engine → Ranked Job Results
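
The flow above can be sketched in a few lines of Python. This is an illustrative outline only, not the job_intel implementation: the Job fields, the dedupe key, and the function names (normalize, dedupe, run_pipeline) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Job:
    # Hypothetical minimal job record; the real schema may differ.
    company: str
    title: str
    location: str
    url: str


def normalize(job: Job) -> Job:
    """Collapse repeated whitespace so equivalent postings compare equal."""
    def clean(text: str) -> str:
        return " ".join(text.split())
    return Job(clean(job.company), clean(job.title), clean(job.location), job.url.strip())


def dedupe(jobs: Iterable[Job]) -> list[Job]:
    """Keep the first posting seen for each (company, title, location) key."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[Job] = []
    for job in jobs:
        key = (job.company.lower(), job.title.lower(), job.location.lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique


def run_pipeline(
    adapters: list[Callable[[], list[Job]]],
    score: Callable[[Job], float],
    min_score: float = 0.5,
) -> list[tuple[float, Job]]:
    """Adapters -> normalization -> dedupe -> scoring -> ranked jobs."""
    raw = [job for adapter in adapters for job in adapter()]
    unique = dedupe(normalize(job) for job in raw)
    scored = [(score(job), job) for job in unique]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Two stand-in adapters that return overlapping postings.
def adapter_a() -> list[Job]:
    return [Job("Acme", "Program  Manager", "Seattle", "https://example.com/1")]


def adapter_b() -> list[Job]:
    return [
        Job("acme", "Program Manager", "Seattle ", "https://example.com/2"),
        Job("Acme", "Technical Program Manager", "Remote", "https://example.com/3"),
    ]


def toy_score(job: Job) -> float:
    return 0.9 if "technical" in job.title.lower() else 0.6


ranked = run_pipeline([adapter_a, adapter_b], toy_score)
for value, job in ranked:
    print(f"{value:.2f}  {job.title}  ({job.location})")
```

Collecting everything first and filtering only at the end mirrors the recall-first idea: nothing is dropped before normalization and deduplication have run.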

Setup

git clone https://github.com/thalaai/jobmatch-bot.git
cd jobmatch-bot
deactivate 2>/dev/null
hash -r
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .

Verify the install before your first run:

which python
python --version
python -m pip show job-intel
python -c "import job_intel; print('job_intel import OK')"
job-intel --help

Expected result:

which python points to .venv/bin/python
python -m pip show job-intel prints package metadata
the import check prints job_intel import OK
job-intel --help shows the CLI usage

Quick Start

The maintained engine is the job_intel package, run through the job-intel CLI.

cp job_intel_sources.example.yaml job_intel_sources.yaml
job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50

Outputs:

out/normalized_jobs.csv
out/ranked_jobs.csv
out/job_matches.csv
out/job_matches.md
out/diagnostics.csv

Which file to use:

Open out/job_matches.md first. This is the main human-readable shortlist to review and apply from.
Use out/job_matches.csv if you want the same shortlist in spreadsheet form for sorting or filtering.
Use out/ranked_jobs.csv if you want a larger ranked pool beyond the shortlist.
Use out/normalized_jobs.csv and out/diagnostics.csv for analysis and debugging, not for direct job applications.
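
If you prefer to inspect the CSV shortlist from Python rather than a spreadsheet, a small helper like the one below may be enough. It is a sketch: the helper name preview_csv is made up for this example, and because the actual column names of out/job_matches.csv are not documented here, it prints whatever columns the file contains. The sample data uses hypothetical columns.

```python
import csv
import io


def preview_csv(handle, limit=10):
    """Return (column names, first `limit` rows) from an open CSV file."""
    reader = csv.DictReader(handle)
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append(row)
    return list(reader.fieldnames or []), rows


# Stand-in data so the sketch runs without a real output file.
# These column names are hypothetical.
sample = io.StringIO(
    "company,title,score\n"
    "Acme,Technical Program Manager,0.82\n"
    "Acme,Program Manager,0.61\n"
)
columns, rows = preview_csv(sample, limit=5)
for row in rows:
    print(" | ".join(row[name] for name in columns))

# With a real run, open the file instead:
# with open("out/job_matches.csv", newline="", encoding="utf-8") as fh:
#     columns, rows = preview_csv(fh)
```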

First Run

After cloning the repo, follow these steps in order:

Create your working source registry from the example:

cp job_intel_sources.example.yaml job_intel_sources.yaml

Add your resume as resume.txt in the repo root. You can start from the example format:

cp examples/resume.txt resume.txt

Example:

jobmatch-bot/
  resume.txt
  job_intel_sources.yaml
  pyproject.toml

Edit job_intel_sources.yaml and add or remove companies/sources you want to search.

Example:

keep the provided Amazon, Waymo, and Zoox sources if you want a known-good starting point
disable any source by setting enabled: false
add more companies by adding entries under companies: and sources:
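
To illustrate the effect of enabled: false, the sketch below filters an already-parsed registry down to its active sources. It assumes the registry has been loaded into a plain dict (for instance by a YAML loader) and that a source without an enabled key counts as enabled; that default is an assumption, not documented behavior. The helper name enabled_sources and the Waymo and Zoox source names are placeholders.

```python
def enabled_sources(registry: dict) -> list[dict]:
    """Return the source entries that are not explicitly disabled."""
    return [
        source
        for source in registry.get("sources", [])
        if source.get("enabled", True)  # assumption: missing key means enabled
    ]


# In-memory stand-in for a parsed job_intel_sources.yaml.
registry = {
    "companies": ["Amazon", "Waymo", "Zoox"],
    "sources": [
        {"company": "Amazon", "source_name": "amazon_search_json"},
        {"company": "Waymo", "source_name": "waymo_example", "enabled": False},
        {"company": "Zoox", "source_name": "zoox_example", "enabled": True},
    ],
}

active = [source["source_name"] for source in enabled_sources(registry)]
print(active)
```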

Run the engine:

job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50

Optional first run with source discovery enabled:

job-intel --sources job_intel_sources.yaml --resume resume.txt --out-dir ./out --min-score 0.50 --discover

Inspect the results:

sed -n '1,25p' out/diagnostics.csv
sed -n '1,40p' out/job_matches.md

Apply from out/job_matches.md first. If you need more options, expand to out/ranked_jobs.csv.

Test Before Upload

Before uploading the repo to GitHub, run one clean local test from the repo root:

cd /xxxxxx/jobmatch-bot
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .
cp job_intel_sources.example.yaml job_intel_sources.yaml
cp examples/resume.txt resume.txt
which python
which job-intel
python -c "import job_intel; print('job_intel import OK')"
job-intel --help
job-intel --sources ./job_intel_sources.yaml --resume ./resume.txt --out-dir ./out --min-score 0.50
ls -la out
sed -n '1,20p' out/diagnostics.csv
sed -n '1,40p' out/job_matches.md

This validates:

the package installs cleanly
the CLI resolves correctly
the app runs end-to-end with example inputs
the expected output files are generated
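
The last check can also be scripted. The sketch below reports which of the five documented output files are missing from an output directory; the helper name missing_outputs is made up for this example, and the demo runs against a temporary directory rather than a real ./out.

```python
import tempfile
from pathlib import Path

# File names taken from the Outputs list in Quick Start.
EXPECTED_OUTPUTS = (
    "normalized_jobs.csv",
    "ranked_jobs.csv",
    "job_matches.csv",
    "job_matches.md",
    "diagnostics.csv",
)


def missing_outputs(out_dir) -> list[str]:
    """Return the expected output files that are absent from out_dir."""
    out = Path(out_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]


# Demo: create four of the five files in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_OUTPUTS[:-1]:
        (Path(tmp) / name).write_text("", encoding="utf-8")
    missing = missing_outputs(tmp)

print(missing)
```

After a real run, missing_outputs("./out") should return an empty list.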

Optional test:

python -m pytest job_intel/tests

After testing, remove local-only files before uploading to GitHub:

rm -rf .venv cache out resume.txt job_intel_sources.yaml
find . -name "__pycache__" -type d -exec rm -rf {} +
find . -name "*.pyc" -delete

Do not remove:

job_intel_sources.example.yaml
examples/resume.txt

Editing Companies

All company configuration lives in job_intel_sources.yaml.

You do not need to edit Python code just to:

add a new company
remove a company
disable a failing source
change a source endpoint
change search queries in metadata

Example source entry:

- company: Amazon
  source_name: amazon_search_json
  source_type: amazon_json
  entrypoint: https://www.amazon.jobs/en/search.json
  metadata:
    page_size: 100
    queries:
      - technical program manager
      - program manager

Resume and Candidate Fit

Place your resume in the repo root as resume.txt.

The intended default workflow is:

clone the repo
add resume.txt
edit job_intel_sources.yaml
run the CLI
review out/job_matches.md first, then use out/diagnostics.csv only for troubleshooting

Current scoring uses:

title family
seniority signals
leadership / delivery signals
domain signals
location fit
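
One simple way to combine signals like these is a weighted sum compared against the --min-score threshold. The sketch below is an assumption about the general shape only: the weights, the signal names, and the function combined_score are invented for illustration and are not taken from job_intel.

```python
# Hypothetical weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "title_family": 0.35,
    "seniority": 0.20,
    "leadership_delivery": 0.15,
    "domain": 0.20,
    "location": 0.10,
}


def combined_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-signal values, each clamped to [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, signals.get(name, 0.0)))
        total += weight * value
    return total


example = {
    "title_family": 1.0,         # title matches the target family
    "seniority": 0.5,            # partial seniority match
    "leadership_delivery": 0.0,  # no leadership keywords found
    "domain": 0.5,
    "location": 1.0,
}
score = combined_score(example)
print(f"{score:.2f}", "kept" if score >= 0.50 else "dropped")
```

Under these made-up weights, a strong title and location match can carry a job past a 0.50 cutoff even when other signals are weak, which fits a recall-first design.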

Resume-aware scoring is the next intended extension point in job_intel/engine/.

The file examples/resume.txt is a format example only.

Project Layout

job_intel/: the publishable multi-source engine
job_intel_sources.example.yaml: a safe example source registry
examples/: example input files only

Source Registry

The engine reads a registry YAML with:

companies
sources

Each source entry defines:

company
source_name
source_type
entrypoint
optional metadata

Supported source families currently include:

amazon_json
greenhouse
lever
workday
apple_jsonld
microsoft_search
eightfold
meta_graphql
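
Before a crawl, it can help to sanity-check entries against the fields and source families listed above. The following is a minimal sketch that works on an already-parsed entry (a plain dict); the function validate_source and its messages are hypothetical and are not part of the job_intel CLI.

```python
REQUIRED_FIELDS = ("company", "source_name", "source_type", "entrypoint")
SUPPORTED_TYPES = {
    "amazon_json",
    "greenhouse",
    "lever",
    "workday",
    "apple_jsonld",
    "microsoft_search",
    "eightfold",
    "meta_graphql",
}


def validate_source(entry: dict) -> list[str]:
    """Return a list of problems found in one source entry (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    source_type = entry.get("source_type")
    if source_type and source_type not in SUPPORTED_TYPES:
        problems.append(f"unsupported source_type: {source_type}")
    # metadata is optional, but should be a mapping when present.
    if not isinstance(entry.get("metadata", {}), dict):
        problems.append("metadata must be a mapping")
    return problems


good = {
    "company": "Amazon",
    "source_name": "amazon_search_json",
    "source_type": "amazon_json",
    "entrypoint": "https://www.amazon.jobs/en/search.json",
    "metadata": {"page_size": 100, "queries": ["technical program manager"]},
}
bad = {"company": "Example", "source_type": "smartrecruiters"}

print(validate_source(good))
print(validate_source(bad))
```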

Diagnostics

Every run emits a diagnostics CSV with source health and per-stage counts:

raw_found
normalized_found
deduped_out
scored_count
output_count
error

This is the primary debugging surface when a company returns no jobs.
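
A quick triage pass over the diagnostics file might look like the sketch below. It relies on the count and error columns listed above, and additionally assumes that each row carries a source_name column and that the counts are integers; neither assumption is confirmed here. The helper name failing_sources and the sample rows are invented for illustration.

```python
import csv
import io


def failing_sources(handle) -> list[tuple[str, str]]:
    """Return (source, reason) pairs for rows that look unhealthy."""
    flagged = []
    for row in csv.DictReader(handle):
        error = (row.get("error") or "").strip()
        raw = int(row.get("raw_found") or 0)
        output = int(row.get("output_count") or 0)
        if error:
            reason = f"error: {error}"
        elif raw == 0:
            reason = "adapter returned no raw jobs"
        elif output == 0:
            reason = "jobs found but none reached the output"
        else:
            continue
        # source_name is an assumed identifier column.
        flagged.append((row.get("source_name", "?"), reason))
    return flagged


# Made-up rows standing in for out/diagnostics.csv.
sample = io.StringIO("\n".join([
    "source_name,raw_found,normalized_found,deduped_out,scored_count,output_count,error",
    "amazon_search_json,120,118,110,110,12,",
    "example_workday,0,0,0,0,0,",
    "example_lever,0,0,0,0,0,HTTP 403",
    "example_greenhouse,40,40,38,38,0,",
]))
flagged = failing_sources(sample)
for name, reason in flagged:
    print(f"{name}: {reason}")
```

Checking the error column first separates a source that failed outright from one that ran cleanly but returned nothing, which usually points to different fixes (endpoint or access versus search queries).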

Roadmap (short)

More ATS adapters
Better Workday and Eightfold coverage
Stronger normalization and location handling
Resume-aware scoring
Historical crawl tracking
Hosted version later

Contributing

See CONTRIBUTING.md for how to add adapters, sources, and tests.

Release Checklist

Before publishing a release:

verify python -m pip install -e . succeeds in a clean virtualenv
copy job_intel_sources.example.yaml to job_intel_sources.yaml and run one sample crawl
confirm out/diagnostics.csv contains at least one working source
make sure no local outputs, resumes, caches, or temporary directories are present
replace the placeholder clone URL in README.md

Help Wanted

We welcome contributions.

Some good starter issues:

SmartRecruiters adapter
Ashby ATS adapter
Location normalization improvements
Diagnostics enhancements

Check the Issues tab for tasks labeled good-first-issue.

Why JobMatch Bot

Traditional job portals often hide relevant roles due to strict keyword filtering, ranking bias, and incomplete indexing of company career sites.

JobMatch Bot takes a different approach:

Recall-first ingestion to collect as many relevant roles as possible
Normalization and deduplication across multiple ATS systems
Candidate-aware ranking to surface the best matches

The goal is to help candidates discover opportunities that conventional job portals frequently miss.

License

MIT (see LICENSE).
