Team Code Paradox — Data & AI Challenge: Intelligent Candidate Discovery
Built by Soham Lodh and Dhrupad Paitandy to answer one question most systems ignore:
Not "does this candidate match the skills?" — but "what is the probability this candidate gets hired?"
- What We Built
- Performance & Constraints
- How It Works — End-to-End Pipeline
- Ranking Methodology — The Scoring Engine
- Explainability & Integrity Validation
- Architecture
- App Tabs
- Tech Stack
- Local Setup — Step-by-Step
- Docker Setup
- Stage 3 CLI Reproduction (Offline Ranking)
- Running Tests
- Configuration Reference
- Known Limitations
- Challenges We Faced
- Submission Assets
Recruiters face an impossible task: hundreds of thousands of resumes, dozens of active roles, and no scalable way to find who actually fits. This platform automates the hardest part of the screening process.
Given:
- A job description (PDF, DOCX, TXT, or Markdown)
- A candidate dataset in JSONL format (up to 100,000+ records)
- A JSON schema describing the candidate data structure
The platform outputs: A ranked shortlist of the top 100 best-fit candidates, complete with composite scores, per-dimension breakdowns, skill gap analysis, integrity flags, AI-generated recruiter rationale, and a one-click exportable CSV.
The entire pipeline — from unstructured JD to ranked shortlist — requires no manual configuration, no field name assumptions, and no per-candidate LLM calls.
This solution was built and validated against the challenge's hardware and runtime constraints.
| Metric | Value |
|---|---|
| End-to-End Pipeline Runtime (100K Candidates) | 4 minutes 57.91 seconds |
| Job Description Intelligence Extraction (LLM Processing) | 24.30 seconds |
| Offline Candidate Ranking Runtime (Post-JD Analysis) | 3 minutes 53.18 seconds |
| Hardware | CPU only — no GPU required |
| Memory | 16 GB RAM |
| Dataset size | 100,000+ candidate records |
| Max file size accepted | 500 MB (both locally and on deployed Streamlit) |
| Deployed platform runtime | 5–10 minutes (Streamlit Cloud free tier) |
A demo video of the app running the full 100K dataset inside a Docker container with 16 GB RAM and 1 CPU is available here:
📹 Video Demo (Google Drive)
- No LLM in the hot path. The AI layer runs exactly once — to extract structured JD intelligence. All candidate scoring is local and deterministic.
- Streaming JSONL pipeline. Candidates are processed one-by-one from disk. The full dataset is never loaded into memory.
- Top-K heap. Only the top 500 candidates are kept in memory at any time during scoring.
- Semantic retrieval narrows the field. A FAISS-backed embedding funnel reduces 100K candidates to 5,000 before the expensive feature scoring runs.
- Graceful fallback. If FAISS or the embedding model is unavailable, the system degrades to a deterministic hash-based embedding — without user impact.
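The streaming and top-K points above can be illustrated with a minimal sketch. It assumes each candidate is one JSON object per line and that a scoring callable is supplied by the caller; `stream_jsonl` and `top_k` are hypothetical names, not the project's actual functions.

```python
import heapq
import json
from typing import Callable, Iterable, Iterator


def stream_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, never holding the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_k(records: Iterable[dict], score: Callable[[dict], float], k: int) -> list[tuple[float, dict]]:
    """Keep only the k best-scoring records using a bounded min-heap."""
    heap: list[tuple[float, int, dict]] = []  # (score, arrival index, record)
    for index, record in enumerate(records):
        item = (score(record), index, record)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            heapq.heapreplace(heap, item)  # evict the current lowest score
    # Best first; ties resolved by arrival order.
    return [(s, r) for s, _, r in sorted(heap, key=lambda t: (-t[0], t[1]))]


# Any iterable of lines works, including an open file handle.
sample = ['{"id": "a", "s": 0.2}', '{"id": "b", "s": 0.9}', "", '{"id": "c", "s": 0.5}']
best = top_k(stream_jsonl(sample), score=lambda r: r["s"], k=2)
print([r["id"] for _, r in best])  # ['b', 'c']
```

Because the heap never grows beyond `k` entries, memory use depends on `k` rather than on the size of the dataset.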
The recruiter uploads three things through the Streamlit UI:
| Input | Accepted Formats | Purpose |
|---|---|---|
| Job Description | PDF, DOCX, TXT, Markdown | Describes the open role |
| Candidate Dataset | .jsonl or .jsonl.gz (up to 500 MB) | The full candidate pool |
| Candidate Schema | JSON Schema Draft 7 (.json) | Defines the structure of candidate records |
The job description is sent once to an OpenRouter LLM (temperature 0.0 for deterministic output). The model extracts a structured JDIntelligence object containing:
- Required and preferred skills
- Seniority level and experience minimums
- Responsibilities and production signals
- Behavioral expectations
- Education keywords and location preferences
This structured profile drives all downstream ranking logic. It is cached to disk (jd_cache.json) and reused for offline reproduction without any further LLM calls.
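A rough sketch of the cache-then-reuse idea follows. It uses a plain dataclass as a stand-in for the project's Pydantic model, and the field names, `JDProfile`, and `load_or_extract` are all assumptions made for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class JDProfile:
    """Hypothetical stand-in for JDIntelligence; the real field names may differ."""
    required_skills: list[str] = field(default_factory=list)
    preferred_skills: list[str] = field(default_factory=list)
    seniority: str = ""
    min_years_experience: float = 0.0


def load_or_extract(cache_path: Path, extract: Callable[[], JDProfile]) -> JDProfile:
    """Return the cached profile if present; otherwise call extract() once and cache it."""
    if cache_path.exists():
        return JDProfile(**json.loads(cache_path.read_text(encoding="utf-8")))
    profile = extract()
    cache_path.write_text(json.dumps(asdict(profile), indent=2), encoding="utf-8")
    return profile


calls = []


def fake_extract() -> JDProfile:
    """Stands in for the single LLM call; records how often it runs."""
    calls.append(1)
    return JDProfile(required_skills=["python"], seniority="senior", min_years_experience=5)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "jd_cache.json"
    first = load_or_extract(path, fake_extract)   # runs the extractor
    second = load_or_extract(path, fake_extract)  # served from disk

print(len(calls))  # 1
```

The second call never reaches the extractor, which is the property that lets the ranking phase run without network access.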
The uploaded JSON schema is analyzed and mapped into a SchemaMap through semantic keyword matching. The ranker resolves candidate fields (skills, experience, location, seniority, etc.) dynamically, regardless of how the schema is structured.
There are zero hardcoded field name assumptions. A schema using capabilities or toolkit is handled identically to one using skills.
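One way to picture keyword-based field resolution is the sketch below. The synonym lists, `walk_paths`, and `build_schema_map` are hypothetical; the actual SchemaMap logic is not shown in this document and is likely more elaborate.

```python
from typing import Iterator

# Hypothetical synonym fragments per concept (assumed for illustration).
CONCEPT_KEYWORDS = {
    "skills": ("skill", "capabilit", "toolkit", "technolog"),
    "experience": ("experience", "tenure", "years"),
    "location": ("location", "city", "country"),
}


def walk_paths(schema: dict, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths for every property, descending into objects and arrays."""
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        if sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            yield from walk_paths(sub["items"], path + "[]")
        else:
            yield from walk_paths(sub, path)


def build_schema_map(schema: dict) -> dict[str, str]:
    """For each concept, pick the path whose last segment matches the most keywords."""
    paths = list(walk_paths(schema))
    resolved: dict[str, str] = {}
    for concept, keywords in CONCEPT_KEYWORDS.items():
        def hits(path: str) -> int:
            leaf = path.rsplit(".", 1)[-1].lower()
            return sum(keyword in leaf for keyword in keywords)

        best = max(paths, key=hits, default=None)
        if best is not None and hits(best) > 0:
            resolved[concept] = best
    return resolved


schema = {
    "type": "object",
    "properties": {
        "profile": {
            "type": "object",
            "properties": {
                "toolkit": {"type": "array", "items": {"type": "string"}},
                "years_of_experience": {"type": "number"},
            },
        },
        "home_city": {"type": "string"},
    },
}
schema_map = build_schema_map(schema)
print(schema_map)
```

Here a field named `toolkit` resolves to the skills concept without any hardcoded path, and concepts with no matching field are simply left out of the map.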
A SemanticRetriever (FAISS + sentence-transformers/all-MiniLM-L6-v2) encodes the job description and all candidate profiles as dense vectors and retrieves the top 5,000 most semantically similar candidates for scoring.
The funnel: 100K → 5,000 (retrieval) → 500 (feature scoring) → 100 (final output)
Every retrieved candidate is scored against 21 independent signals across four categories:
| Category | Signals |
|---|---|
| Technical Fit | Required skill coverage, preferred skill coverage, skill depth, production signals, lexical relevance |
| Role Relevance | Role fit, semantic match, relevant experience, seniority alignment, career consistency |
| Behavioral Signals | Recruiter response rate, interview completion, GitHub activity, availability, notice period |
| Trust & Integrity | Profile completeness, consistency analysis, fraud detection heuristics, evidence confidence |
The final composite score formula:
Final Score = f(Technical Fit, Role Relevance, Behavioral Signals, Trust Signals)
× Must-Have Gate × Integrity Multiplier
Scores are normalized to a 0–1 range using min-max scaling across the exported top 100, giving recruiters a clean relative comparison.
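As a minimal sketch of the min-max step, assuming raw composite scores arrive as a plain list (the function name and the handling of the all-equal case are assumptions):

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale scores linearly so the lowest maps to 0.0 and the highest to 1.0."""
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        # Assumed convention: identical scores all map to the top of the range.
        return [1.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


normalized = min_max_normalize([40.0, 55.0, 70.0])
print(normalized)  # [0.0, 0.5, 1.0]
```

Since the scaling uses only the exported shortlist, a normalized score describes a candidate's position within that shortlist and is not comparable across different runs or job descriptions.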
| Tab | Content |
|---|---|
| Results | Ranked top-100 table with composite scores |
| Candidate Detail | Per-candidate breakdown of all 21 scoring dimensions |
| Analytics | Score distribution histogram, feature importance bar chart |
| Export | One-click CSV download: candidate_id, rank, score, reasoning |
Technical Fit (weighted 55% of base score)
- required_skill_coverage: token + substring match of JD-required skills against candidate profile
- preferred_skill_coverage: same for preferred skills
- skill_depth: proficiency level, endorsement count, months of evidence per skill record
- production_experience_signals: keywords and quantitative metrics indicating production delivery
- lexical_relevance: density of JD terms across full candidate text
Role Relevance (weighted 32% of base score)
- role_fit: overlap of JD title and responsibilities against candidate's current role
- semantic_match: FAISS cosine similarity score (rescaled)
- relevant_experience: years of experience vs. JD minimum, with surplus bonus
- seniority_match: title-level signal, seniority term detection, year-adjusted score
- career_consistency: average tenure, job progression, title alignment
Behavioral Signals and Hireability (weighted 13% of base score)
- behavioral_score: recruiter response rate, offer acceptance rate, profile views, interview completion
- availability: notice period, open-to-work flag, relocation willingness
- location_match: candidate location vs. JD location preferences
- education_relevance: degree keywords, institution tier signal, education field match
- evidence_confidence: richness of profile data (history entries, skill records, signal fields)
Must-Have Gate — a gate score computed from required skill coverage, role fit, experience, and integrity. Candidates who fail required skills are down-weighted even with high behavioral scores.
Integrity Multiplier — a penalty applied for profiles with sparse data, promotional language patterns, or inconsistent experience claims.
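The pieces above can be combined as in the sketch below. The 55/32/13 weights come from this section, but treating the base score as a weighted sum, and the gate and multiplier as factors in [0, 1], are assumptions; the real combining function f is not spelled out here.

```python
from dataclasses import dataclass

# Group weights as stated above; they sum to 1.0.
WEIGHTS = {"technical_fit": 0.55, "role_relevance": 0.32, "behavioral": 0.13}


@dataclass(frozen=True)
class GroupScores:
    """Hypothetical container for per-group scores, each assumed to lie in [0, 1]."""
    technical_fit: float
    role_relevance: float
    behavioral: float


def composite_score(groups: GroupScores, must_have_gate: float, integrity_multiplier: float) -> float:
    """Weighted base score, scaled down by the gate and the integrity penalty."""
    base = sum(weight * getattr(groups, name) for name, weight in WEIGHTS.items())
    return base * must_have_gate * integrity_multiplier


# Strong behavioral signals cannot rescue a candidate with a weak must-have gate.
candidate = GroupScores(technical_fit=0.8, role_relevance=0.5, behavioral=1.0)
final = composite_score(candidate, must_have_gate=0.5, integrity_multiplier=1.0)
print(round(final, 3))  # 0.365
```

Applying the gate and the integrity penalty as multipliers rather than as extra weighted terms means a low value in either one caps the final score no matter how high the other groups are.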
Every ranking decision is auditable.
Each candidate surfaces:
- A composite score and rank
- An individual score for every one of the 21 dimensions
- Matched skills (with evidence: proficiency, duration, endorsement count)
- Missing required skills
- Integrity flags (sparse data, inflated language, date inconsistencies)
- Recruiter-ready reasoning built from scoring evidence — not generated from raw profile text
- LLM usage is limited to two purposes: JD intelligence extraction and, optionally, explanation generation.
- All AI responses are validated through strict Pydantic schemas before use.
- Candidate explanations are generated from structured scoring outputs. The model is instructed never to invent facts.
- All ranking decisions are produced by deterministic local scoring — the LLM has zero influence on rank order.
- Sparse profile detection (thin text, low token diversity)
- Promotional language pattern detection ("rockstar", "guru", repeated superlatives)
- Open-ended experience date validation
- Profile completeness signal from platform data
- Inconsistency signals reduce composite score through the Integrity Multiplier
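A minimal sketch of how such heuristics might turn into a penalty is shown below. Only "rockstar" and "guru" come from the list above; the other terms, all thresholds, and both function names are assumptions chosen for illustration.

```python
import re

# "rockstar" and "guru" are named above; the remaining terms are assumed.
PROMO_TERMS = {"rockstar", "guru", "ninja", "best", "greatest"}


def integrity_flags(text: str, min_tokens: int = 20, min_diversity: float = 0.5) -> list[str]:
    """Return heuristic flags for thin, repetitive, or promotional profile text."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    flags = []
    if len(tokens) < min_tokens:
        flags.append("sparse_profile")
    if tokens and len(set(tokens)) / len(tokens) < min_diversity:
        flags.append("low_token_diversity")
    if sum(token in PROMO_TERMS for token in tokens) >= 2:
        flags.append("promotional_language")
    return flags


def integrity_multiplier(flags: list[str], penalty_per_flag: float = 0.15, floor: float = 0.4) -> float:
    """Reduce the multiplier per flag, never dropping below the floor."""
    return max(floor, 1.0 - penalty_per_flag * len(flags))


flags = integrity_flags("Python Python Python Python Python rockstar guru")
print(flags)  # ['sparse_profile', 'low_token_diversity', 'promotional_language']
print(round(integrity_multiplier(flags), 2))  # 0.55
```

Keeping the flags as named strings, separate from the numeric penalty, lets the same list be shown to the recruiter as the reason a score was reduced.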
app.py
└── candidate_ranker/services.py      # Orchestrates the full pipeline
    ├── ingestion.py                  # Reads JD (PDF/DOCX/TXT/MD), schema, JSONL candidates
    ├── schema_mapping.py             # Builds SchemaMap via semantic keyword ranking
    ├── ai_service.py                 # LLM calls: JD extraction + candidate explanations
    │   └── ai/openrouter_client.py   # Single point of LLM contact; retries + failover
    ├── retrieval.py                  # FAISS semantic retrieval + NumPy fallback
    ├── ranking.py                    # 21-signal feature engine, top-K heap, integrity checks
    ├── export.py                     # CSV generation, min-max normalization, scoring reasoning
    ├── upload_server.py              # Sidecar chunked upload server for large JSONL files
    └── models.py                     # Pydantic models: JDIntelligence, CandidateScore, SchemaMap
- ai/openrouter_client.py is the single point of contact with any LLM API. No other module calls OpenRouter.
- All ranking, retrieval, scoring, and fraud detection runs locally, with zero LLM calls in the hot path.
- Pydantic v2 enforces strict data contracts at every boundary: JD intelligence, schema maps, candidate scores, and AI explanations all have typed models.
- FAISS degrades gracefully to a deterministic SHA-256 hash embedding if the embedding model is unavailable — the app always runs.
- LLM failover chain: DeepSeek V3 → Qwen 3 235B → Llama 3.3 70B. If the primary model rate-limits, the client retries with exponential backoff and then promotes to the next model automatically.
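The hash-based fallback mentioned in this list could look roughly like the sketch below. The vector dimension, the signed-bucket scheme, and the function names are assumptions; only the use of SHA-256 and the deterministic behavior come from the text above.

```python
import hashlib
import math
import re


def hash_embedding(text: str, dim: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing each token into a signed bucket."""
    vector = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


jd_vec = hash_embedding("Senior Python backend engineer")
same_vec = hash_embedding("Senior Python backend engineer")
print(jd_vec == same_vec)  # True
```

Vectors built this way reflect token overlap rather than meaning, so retrieval quality is lower than with a learned embedding model, but the output is identical on every machine and needs no model download.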
| Tab | What It Shows |
|---|---|
| 📤 Upload | Job description, candidate dataset, schema upload; pipeline trigger |
| 📄 JD Analysis | Structured JSON extracted from the job description by the LLM |
| 🗂️ Schema | Resolved SchemaMap — dynamic field paths derived from the candidate schema |
| ⚙️ Progress | Live pipeline status, funnel summary, stage-by-stage progress |
| 📊 Results | Ranked top-100 table with composite scores and progress bars |
| 👤 Candidate Detail | Per-candidate 21-dimension breakdown, matched/missing skills, AI explanation |
| 📈 Analytics | Score distribution histogram, feature importance bar chart for top 20 |
| 💾 Export | One-click CSV download of top-100 candidates |
| Layer | Technology |
|---|---|
| Frontend & App | Streamlit 1.35 |
| LLM API | OpenRouter (DeepSeek V3, Qwen 3 235B, Llama 3.3 70B) |
| Semantic Search | FAISS (faiss-cpu) + sentence-transformers |
| Data Validation | Pydantic v2 |
| Data Processing | Pandas, NumPy |
| Visualization | Plotly, Altair |
| Document Parsing | pypdf (PDF), python-docx (DOCX) |
| Containerization | Docker |
| Testing | pytest, pytest-asyncio |
- Python 3.11 or higher
- pip
- An OpenRouter API key (free tier works; rate limits trigger automatic model failover)
git clone https://github.com/Soham-Lodh/Candidate-Scanner-Hackathon.git
cd Candidate-Scanner-Hackathon
macOS / Linux:
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell):
python -m venv .venv
.venv\Scripts\activate
Windows (Git Bash):
python -m venv .venv
source .venv/Scripts/activate
⚠️ Make sure the virtual environment is active before running any further commands. Your terminal prompt should show (.venv).
pip install -r requirements.txt
This installs all packages including PyTorch (CPU-only build), FAISS, sentence-transformers, Streamlit, Pydantic, Pandas, and document parsing libraries. Expect the first install to take a few minutes, since PyTorch and sentence-transformers are large.
cp .env.example .env
Open .env in a text editor and fill in your OpenRouter API key:
OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxx
# Optional overrides (defaults shown):
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
OPENROUTER_PRIMARY_MODEL=deepseek/deepseek-chat-v3-0324:free
OPENROUTER_TIMEOUT_SECONDS=45
APP_TOP_K_RETRIEVAL=5000
APP_TOP_K_FEATURES=500
APP_TOP_K_EXPLAIN=100
APP_ENABLE_AI_EXPLANATIONS=0
Getting an API key: Sign up at openrouter.ai, go to Keys, and create a free key. The free tier supports the default DeepSeek V3 model used by this app.
streamlit run app.py
The app opens at http://localhost:8501.
From there, use the Upload tab to load your job description, candidate JSONL, and schema — then click Run Ranking Analysis.
Docker is the recommended path for reproducing the exact hackathon environment (16 GB RAM, CPU-only).
- Docker Desktop installed and running
Edit docker-compose.yml and fill in OPENROUTER_API_KEY:
environment:
  OPENROUTER_API_KEY: "sk-or-xxxxxxxxxxxxxxxxxxxxxxxx"
docker-compose up --build
The first build takes several minutes (downloads PyTorch CPU, FAISS, sentence-transformers). Subsequent starts are fast.
Open http://localhost:8501 in your browser.
The docker-compose.yml is pre-configured to match the hackathon constraints:
mem_limit: 16g # 16 GB RAM limit
memswap_limit: 16g # No swap
cpus: "1" # 1 CPU core
ports:
- "8501:8501" # Streamlit UI
- "8765:8765" # Chunked upload sidecar
The CLI uses a two-phase approach so the final ranking step is fully offline: no network access, no LLM calls, fully reproducible.
Complete Local Setup steps 1–4 before running these commands.
This step calls OpenRouter once to extract structured JD intelligence and writes a reusable cache file to disk.
Git Bash / macOS / Linux:
python prepare.py \
--job-description ./job_description.docx \
--schema ./candidate_schema.json \
--out jd_cache.json
Windows PowerShell:
python prepare.py `
--job-description ./job_description.docx `
--schema ./candidate_schema.json `
--out jd_cache.json
Arguments:
| Argument | Description |
|---|---|
| --job-description | Path to job description file (.pdf, .docx, .txt, .md) |
| --schema | Path to JSON Schema file describing candidate structure |
| --out | Output path for the JD intelligence cache (.json) |
| --model | Optional: override the default OpenRouter model |
What this does:
- Parses the job description (supports PDF, DOCX, TXT, Markdown)
- Calls OpenRouter once with the JD text and schema context
- Extracts structured JDIntelligence (skills, seniority, responsibilities, etc.)
- Writes the result as jd_cache.json, which is used by Phase 2
This is the command used to generate the submission CSV. It performs no LLM calls and no network requests.
Git Bash / macOS / Linux:
python rank.py \
--candidates ./candidates.jsonl \
--jd-cache jd_cache.json \
--schema ./candidate_schema.json \
--out submission.csv
Windows PowerShell:
python rank.py `
--candidates ./candidates.jsonl `
--jd-cache jd_cache.json `
--schema ./candidate_schema.json `
--out submission.csv
Arguments:
| Argument | Description |
|---|---|
| --candidates | Path to candidate dataset (.jsonl or .jsonl.gz) |
| --jd-cache | Path to JD intelligence cache from Phase 1 |
| --schema | Path to JSON Schema file |
| --out | Output path for the submission CSV |
What this does:
- Loads JDIntelligence from cache (no LLM calls)
- Streams JSONL candidates from disk (no full-file memory load)
- Runs semantic retrieval funnel (FAISS → top 5,000)
- Runs the 21-signal feature scoring engine (keeps top 500 in a heap)
- Exports the top 100 as a CSV
candidate_id,rank,score,reasoning
CAND_0000042,1,1.00000000,"Backend Engineer with 6.9 yrs Toronto, Canada; 3 matched JD skills: Python (advanced, 26 mo), FastAPI (intermediate), PostgreSQL; ..."
CAND_0001337,2,0.83200000,...
| Column | Description |
|---|---|
| candidate_id | Unique identifier from the candidate record |
| rank | Integer rank (1 = best fit) |
| score | Min-max normalized float in [0.1, 1.0] |
| reasoning | Recruiter-facing explanation grounded in scoring evidence |
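A sketch of writing a file with these four columns, using only the standard library, might look like this. The function name, the input row shape, and the sample IDs are hypothetical; the eight-decimal score format mirrors the sample rows above.

```python
import csv
import io
from typing import TextIO


def write_submission(rows: list[dict], out: TextIO) -> None:
    """Write rows best-first with the candidate_id, rank, score, reasoning columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["candidate_id", "rank", "score", "reasoning"])
    ordered = sorted(rows, key=lambda row: row["score"], reverse=True)
    for rank, row in enumerate(ordered, start=1):
        writer.writerow([row["candidate_id"], rank, f"{row['score']:.8f}", row["reasoning"]])


buffer = io.StringIO()
write_submission(
    [
        {"candidate_id": "CAND_B", "score": 0.832, "reasoning": "3 matched skills"},
        {"candidate_id": "CAND_A", "score": 1.0, "reasoning": "Python (advanced), FastAPI"},
    ],
    buffer,
)
print(buffer.getvalue())
```

The `csv` module quotes the reasoning field automatically whenever it contains a comma, so free-text explanations do not break the column layout. When writing to a real file, open it with `newline=""` as the `csv` documentation recommends.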
pytest
The test suite covers:
| Test Module | Coverage |
|---|---|
| test_export.py | CSV generation, score normalization, reasoning output |
| test_ingestion.py | JSONL and gzipped JSONL parsing |
| test_openrouter_client.py | JSON validation, failover model ordering |
| test_ranking.py | Skill matching, stream scoring, top-K heap |
| test_retrieval.py | Streaming semantic retrieval, batch scoring |
| test_schema_mapping.py | Dynamic field resolution, path ranking |
| test_services.py | End-to-end pipeline with cached JD intelligence |
Run with verbose output:
pytest -v
Run a specific test file:
pytest tests/test_ranking.py -v
All settings are loaded from environment variables (or .env):
| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | (required) | Your OpenRouter API key |
| OPENROUTER_BASE_URL | https://openrouter.ai/api/v1/chat/completions | OpenRouter endpoint |
| OPENROUTER_PRIMARY_MODEL | deepseek/deepseek-chat-v3-0324 | Primary LLM model |
| OPENROUTER_TIMEOUT_SECONDS | 45 | HTTP timeout per LLM request |
| APP_TOP_K_RETRIEVAL | 5000 | Candidates kept after semantic retrieval |
| APP_TOP_K_FEATURES | 500 | Candidates kept after feature scoring |
| APP_TOP_K_EXPLAIN | 100 | Candidates passed to AI explanation (if enabled) |
| APP_ENABLE_AI_EXPLANATIONS | 0 | Set to 1 to generate AI explanations per candidate |
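As an illustration of how these variables could be read with their documented defaults, here is a small sketch. The `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the values are already present in the process environment rather than parsing a .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view over a subset of the variables listed above."""
    api_key: str
    timeout_seconds: int
    top_k_retrieval: int
    top_k_features: int
    top_k_explain: int
    enable_ai_explanations: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        api_key=api_key,
        timeout_seconds=int(env.get("OPENROUTER_TIMEOUT_SECONDS", "45")),
        top_k_retrieval=int(env.get("APP_TOP_K_RETRIEVAL", "5000")),
        top_k_features=int(env.get("APP_TOP_K_FEATURES", "500")),
        top_k_explain=int(env.get("APP_TOP_K_EXPLAIN", "100")),
        enable_ai_explanations=env.get("APP_ENABLE_AI_EXPLANATIONS", "0") == "1",
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings({"OPENROUTER_API_KEY": "sk-or-example", "APP_TOP_K_FEATURES": "250"})
print(settings.top_k_retrieval, settings.top_k_features, settings.enable_ai_explanations)
```

Accepting the environment as a parameter makes the loader easy to exercise in tests without touching the real process environment.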
LLM Failover Chain (automatic, no configuration needed):
- deepseek/deepseek-chat-v3-0324 (primary)
- qwen/qwen3-235b-a22b (first fallback)
- meta-llama/llama-3.3-70b-instruct (second fallback)
Each model retries with exponential backoff (2s → 4s → 8s) before failing over.
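The retry-then-promote behavior can be sketched as follows. The model IDs and the 2s, 4s, 8s delays come from this section; the `RateLimited` exception, the injected `send` callable, and the choice of three attempts per model are assumptions made so the example runs without network access.

```python
import time
from typing import Callable

MODELS = [
    "deepseek/deepseek-chat-v3-0324",
    "qwen/qwen3-235b-a22b",
    "meta-llama/llama-3.3-70b-instruct",
]
BACKOFF_SECONDS = (2, 4, 8)


class RateLimited(Exception):
    """Hypothetical error raised by the caller-supplied request function."""


def call_with_failover(
    send: Callable[[str, str], str],
    prompt: str,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; back off after each rate limit, then move on."""
    for model in MODELS:
        for delay in BACKOFF_SECONDS:
            try:
                return send(model, prompt)
            except RateLimited:
                sleep(delay)  # wait before the next attempt or the next model
    raise RuntimeError("all models in the failover chain were rate-limited")


slept = []


def fake_send(model: str, prompt: str) -> str:
    """Simulates a primary model that is always rate-limited."""
    if model.startswith("deepseek/"):
        raise RateLimited(model)
    return f"{model}: ok"


result = call_with_failover(fake_send, "extract JD", sleep=slept.append)
print(result)  # qwen/qwen3-235b-a22b: ok
print(slept)   # [2, 4, 8]
```

Injecting `sleep` keeps the example fast to run and makes the backoff schedule directly observable.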
- PDF/DOCX parsing uses text extraction optimized for ranking workflows. Complex layouts with tables, multi-column text, or scanned pages may not extract perfectly.
- Embedding model cold start: sentence-transformers/all-MiniLM-L6-v2 downloads (~90 MB) on first use. If unavailable, retrieval automatically switches to a deterministic hash-based fallback, so ranking still works.
- OpenRouter rate limits: Free-tier accounts have per-minute limits. The automatic retry + failover chain handles this in most cases, but very high request volume may require a paid key.
- Streamlit file size limit: The deployed Streamlit Cloud app accepts files up to 500 MB. For datasets significantly larger than this, run locally or via Docker.
- Deployed platform speed: Streamlit Cloud's free tier provides limited CPU and memory, so the 100K ranking run takes 5–10 minutes. Locally or in Docker, the same run completes in under 6 minutes.
FAISS is fast but memory-hungry at 100K scale. We had to tune batch sizes for the streaming retrieval path and ensure vectors were kept as float32 (not float64) to cut memory in half. When FAISS failed in cold-start environments (Streamlit Cloud), we built a NumPy dot-product fallback so the app never broke — it just got slightly slower.
The candidate dataset uses a schema we don't control. Instead of hardcoding field names like candidate.skills or profile.years_of_experience, we built a semantic SchemaMap layer that scans the uploaded JSON schema and ranks field paths by keyword relevance. This made the ranker truly dataset-agnostic — and it surfaced edge cases (arrays of arrays, nested objects with no name field) that required careful path resolution logic.
Early designs considered sending each candidate to an LLM for scoring. At 100K candidates this would cost hundreds of dollars and take hours. We redesigned around the principle that the LLM touches only the JD (once), not the candidates. All scoring is local, deterministic, and reproducible — which also means the final rank order is identical every run.
A candidate who writes "Python, Python, Python" in their summary would score high on a naive keyword counter. Our _integrity checks and the must_have_gate formula discount profiles with inflated promotional language, low evidence diversity, and missing corroborating signals (skill duration, endorsements, career history). This pushes authentic profiles up and gaming attempts down.
Streamlit's native uploader assembles the entire file in memory before the app receives it. For 400–500 MB JSONL files, this was initially impractical, so we built a sidecar chunked upload server (upload_server.py) — a lightweight Python HTTP server that accepted browser-sliced chunks and wrote them directly to disk, bypassing Streamlit's in-memory path entirely. We later discovered that Streamlit's uploader can be configured to accept files up to 500 MB by setting maxUploadSize = 500 and maxMessageSize = 500 in .streamlit/config.toml. With that in place, the native uploader handles the full 100K candidate dataset without the chunked server, so we removed the port-based sidecar from the active upload path entirely. The upload_server.py remains in the codebase but the default flow now uses Streamlit's built-in uploader — simpler, no extra port, and fully compatible with both local and deployed environments.
Early export CSV reasoning outputs were AI-generated and generic: "Strong candidate with relevant experience." We replaced this with evidence-grounded reasoning built directly from scored fields: skill proficiency levels, endorsement counts, GitHub activity scores, notice periods — real signal, not boilerplate. The model, when used for explanations, is explicitly instructed not to invent facts not present in the scoring output.
| Asset | Link |
|---|---|
| 🌐 Live App | ai-candidate-ranker-hackathon.streamlit.app |
| 💻 GitHub Repository | github.com/Soham-Lodh/Candidate-Scanner-Hackathon |
| 📹 Demo Video + Submission CSV | Google Drive |
The demo video shows the full pipeline running inside Docker with 16 GB RAM and a single CPU — including the ranked candidate CSV output generated from the 100K dataset.
MIT