Recruit

A tool for generating competency-based technical assessments grounded in Bloom's Revised Taxonomy and Behaviorally Anchored Rating Scales (BARS). It uses LLMs to extract competencies from job descriptions, generate scenario-based interview questions, conduct conversational assessments, and evaluate candidates on reasoning quality.

Why

Different question formats test different cognitive levels. Multiple-choice questions are efficient for assessing recall and comprehension (Bloom Levels 1–2), while scenario-based questions can target application, analysis, and evaluation (Levels 3–5). Structured interviews with behavioral anchors are among the most validated selection methods for predicting job performance (Schmidt & Hunter, 1998; AAMC, 2024).

This tool explores what happens when you apply Bloom's Revised Taxonomy and BARS to technical hiring: extract competencies from a job description, generate questions at specific cognitive levels, conduct a structured interview, and score responses against behavioral rubrics.

Methodology

This system is grounded in Bloom's Revised Taxonomy (Anderson & Krathwohl, 2001), the standard framework for cognitive assessment design.

Level  Name        What it tests                    Hiring relevance
1      Remember    Recall facts                     Not recommended
2      Understand  Explain concepts                 Minimal value
3      Apply       Use knowledge in new situations  Minimum for technical roles
4      Analyze     Break down and examine           Recommended for mid-senior roles
5      Evaluate    Make judgments with criteria     Recommended for senior roles
6      Create      Design novel solutions           Best assessed through project work

Structured interviews with behavioral anchors enable consistent scoring across candidates. Structured interview best practices recommend assessing 3–5 competencies per role.
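For illustration only, the levels above can be encoded as a small enum together with the table's hiring guidance. This is a hypothetical sketch; the repository's actual models live in models/schemas.py and may differ.

from enum import IntEnum

class BloomLevel(IntEnum):
    """Bloom's Revised Taxonomy levels, as listed in the table above."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6

# Per the table, Apply (Level 3) is the minimum for technical roles.
MIN_TECHNICAL_LEVEL = BloomLevel.APPLY

def suitable_for_technical_screen(level: BloomLevel) -> bool:
    """Hypothetical helper: keep generated questions at Apply (3) or above."""
    return level >= MIN_TECHNICAL_LEVEL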

Scoring: Behaviorally Anchored Rating Scales (BARS)

Each generated question includes a scoring rubric with behavioral anchors at three levels (3, 4, 5). Instead of checking whether a candidate mentioned the right keywords, the evaluator assesses the reasoning process — how the candidate approaches the problem, whether they anticipate edge cases, and whether they connect decisions to real-world impact.

Score  What it means
3      Meets expectations: gives a reasonable answer but treats it as a fixed recipe. Doesn't explore trade-offs or edge cases without prompting.
4      Exceeds expectations: proposes solutions with clear rationale, anticipates at least one complication, connects choices to downstream impact.
5      Exceptional: frames the problem as a design decision with multiple valid approaches, articulates trade-offs, and proactively addresses scale, failure modes, or evaluation.

Bloom's Taxonomy determines what to ask (cognitive level of the question). BARS determines how to score the answer (what reasoning quality looks like at each level). Together they produce assessments that are both cognitively rigorous and consistently scorable.
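As a rough sketch of how a single generated question might bundle the two frameworks, here is a hypothetical Pydantic model; the field and class names are assumptions, not the actual schemas in models/schemas.py.

from pydantic import BaseModel

class ScoredQuestion(BaseModel):
    """Hypothetical shape of one generated question: a Bloom-level target
    plus BARS anchors keyed by score. Not the repository's actual schema."""
    competency: str
    bloom_level: int              # generated scenario questions target 3-5
    scenario: str                 # the scenario prompt shown to the candidate
    bars_anchors: dict[int, str]  # behavioral anchors for scores 3, 4, and 5

example = ScoredQuestion(
    competency="Model evaluation",
    bloom_level=5,
    scenario="Your offline metrics improved but production quality dropped. What do you investigate first, and why?",
    bars_anchors={
        3: "Gives a reasonable answer but treats it as a fixed recipe.",
        4: "Clear rationale, anticipates at least one complication.",
        5: "Frames it as a design decision, articulates trade-offs, addresses failure modes.",
    },
)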

Architecture

flowchart TB
  subgraph inputs [Inputs]
    JD["Job Description (text/URL)"]
    Quiz["Existing Quiz (optional)"]
  end

  subgraph agent1 [Module 1: Quiz Architect]
    direction TB
    A1["Extract Competencies from JD"]
    A2["Assign Bloom Levels per Competency"]
    A3["Generate Scenario Questions"]
    A4["Transform Existing MC Questions"]
    A1 --> A2 --> A3
    Quiz --> A4
    A4 --> A3
  end

  subgraph store1 [Quiz Store]
    QDoc["Generated Quiz (JSON)"]
  end

  subgraph agent2 [Module 2: Conversational Interviewer]
    direction TB
    B1["Load Quiz as Hidden Context"]
    B2["Conduct Adaptive Conversation"]
    B3["Record Transcript"]
    B1 --> B2 --> B3
  end

  subgraph store2 [Transcript Store]
    TDoc["Conversation Transcript (JSON)"]
  end

  subgraph agent3 [Module 3: Evaluation Engine]
    direction TB
    C1["Score per Competency (1-5)"]
    C2["Generate Candidate Summary"]
    C3["Batch Compare Candidates"]
    C4["Visualize Score Distributions"]
    C1 --> C2 --> C3 --> C4
  end

  JD --> A1
  A3 --> QDoc
  QDoc --> B1
  B3 --> TDoc
  TDoc --> C1
  QDoc --> C1
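In code terms, the diagram boils down to a three-step pipeline: the Architect writes a quiz JSON, the Interviewer turns it into a transcript JSON, and the Evaluator scores the transcript against the quiz. The orchestration sketch below is illustrative only; the function names and signatures are assumptions, and the real entry points live in agents/architect.py, agents/interviewer.py, and agents/evaluator.py.

import json
from pathlib import Path
from typing import Callable

def run_pipeline(
    job_description: str,
    extract_competencies: Callable[[str], list[dict]],
    generate_questions: Callable[[list[dict]], dict],
    conduct_interview: Callable[[dict], dict],
    score_transcript: Callable[[dict, dict], dict],
) -> dict:
    """Hypothetical end-to-end flow mirroring the flowchart above."""
    competencies = extract_competencies(job_description)   # Module 1: JD -> competencies + Bloom levels
    quiz = generate_questions(competencies)                 # Module 1: scenario questions at Levels 3-5
    Path("data/generated").mkdir(parents=True, exist_ok=True)
    Path("data/generated/quiz.json").write_text(json.dumps(quiz))

    transcript = conduct_interview(quiz)                    # Module 2: quiz is hidden context for the chat
    Path("data/transcripts").mkdir(parents=True, exist_ok=True)
    Path("data/transcripts/transcript.json").write_text(json.dumps(transcript))

    return score_transcript(transcript, quiz)               # Module 3: per-competency scores (1-5) + summary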

Quick Start

git clone https://github.com/ulari/recruit.git
cd recruit
pip install -r requirements.txt

Configure an OpenAI API key using one of the two options below (both files are gitignored):

  1. .env (loaded automatically via python-dotenv):

    cp .env.example .env
    # Edit .env: set OPENAI_API_KEY=sk-... (your real key)
  2. Streamlit secrets (handy for local Streamlit and for Streamlit Community Cloud):

    cp .streamlit/secrets.toml.example .streamlit/secrets.toml
    # Edit secrets.toml: set OPENAI_API_KEY = "sk-..."

If both are set, the environment variable from .env takes precedence.
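For reference, that lookup order could be implemented roughly as follows. This is a sketch of the documented precedence, not the actual logic in utils/llm.py.

import os
import streamlit as st
from dotenv import load_dotenv  # python-dotenv

def resolve_openai_key() -> str | None:
    """Illustrative key lookup: .env / environment first, Streamlit secrets second."""
    load_dotenv()  # no-op if there is no .env file
    key = os.getenv("OPENAI_API_KEY")
    if key:
        return key
    try:
        return st.secrets.get("OPENAI_API_KEY")  # .streamlit/secrets.toml
    except Exception:
        return None  # no secrets file configured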

Run the app:

streamlit run app.py

Modules

1. Quiz Architect (pages/1_Quiz_Architect.py)

  • Paste a job description to extract 4–6 competencies with Bloom level assignments
  • Generate scenario-based questions targeting Level 3–5
  • Upload an existing quiz to see the side-by-side transformation (sketched below): original MC question on the left, scenario version on the right, Bloom level badges on both
  • Save the generated quiz to session state and disk
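For the transformation step, a call might look roughly like the following. The prompt, model name, and function are placeholders, not the actual agents/architect.py implementation.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transform_to_scenario(mc_question: str, target_bloom_level: int = 4) -> str:
    """Illustrative only: rewrite a multiple-choice question as a scenario question."""
    prompt = (
        f"Rewrite this multiple-choice question as an open-ended scenario question "
        f"targeting Bloom level {target_bloom_level} (Apply/Analyze/Evaluate). "
        f"Include a realistic context and ask the candidate to reason about trade-offs.\n\n"
        f"{mc_question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the app exposes a model selector in the sidebar
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content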

2. Interviewer (pages/2_Interviewer.py)

  • Load a quiz from saved files or carry one over from Module 1
  • Semi-structured interview: the AI interviewer covers all competency areas conversationally
  • Interviewer neutrality protocol: no feedback, no teaching, no answer evaluation during the interview (consistent with structured interview methodology)
  • Post-interview review: stats, scrollable transcript replay, Markdown export
  • Transcripts saved to data/transcripts/ as JSON
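A saved transcript might look roughly like this; the field names are assumptions, and the real schema is the Transcript model in models/schemas.py.

# Illustrative transcript structure (saved to data/transcripts/ as JSON).
transcript = {
    "candidate": "Jane Doe",
    "quiz_id": "example_nlp_engineer",
    "turns": [
        {"role": "interviewer", "content": "Walk me through how you'd evaluate a new tokenizer..."},
        {"role": "candidate", "content": "I'd start by..."},
    ],
}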

3. Evaluator (pages/3_Evaluator.py)

  • Load transcript(s) from session or saved files and evaluate against the competency framework
  • Scoring grounded in specific evaluation criteria from each question, not just abstract competency descriptions
  • Per-competency scores (1–5) with Bloom level demonstrated, justifications, and notable quotes
  • Radar chart (or bar chart for fewer than 3 competencies) for the single-candidate view, as sketched after this list
  • Batch comparison table, score distribution chart, and comparative ranking for multiple candidates
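The chart selection for the single-candidate view could be done with Plotly along these lines. This is a sketch of the documented behavior, not the actual code in utils/plotting.py.

import plotly.graph_objects as go

def competency_chart(scores: dict[str, int]) -> go.Figure:
    """Radar chart for 3+ competencies, bar chart otherwise (illustrative)."""
    names, values = list(scores), list(scores.values())
    if len(scores) >= 3:
        fig = go.Figure(go.Scatterpolar(r=values, theta=names, fill="toself"))
        fig.update_layout(polar=dict(radialaxis=dict(range=[0, 5])))
    else:
        fig = go.Figure(go.Bar(x=names, y=values))
        fig.update_yaxes(range=[0, 5])
    return fig

# Example: competency_chart({"NLP fundamentals": 4, "Model evaluation": 5, "MLOps": 3}).show()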

Project Structure

recruit/
├── app.py                          # Landing page
├── pages/
│   ├── 1_Quiz_Architect.py         # Quiz generation & transformation
│   ├── 2_Interviewer.py            # Conversational interview
│   └── 3_Evaluator.py              # Candidate scoring & comparison
├── agents/
│   ├── architect.py                # LLM calls: competency extraction, question generation, transformation
│   ├── interviewer.py              # System prompt builder + chat completion
│   └── evaluator.py                # LLM calls: candidate scoring + batch comparison
├── models/
│   └── schemas.py                  # Pydantic models (Quiz, Question, Competency, Transcript, etc.)
├── utils/
│   ├── llm.py                      # OpenAI client wrapper, retry logic, model selection
│   ├── export.py                   # Markdown export builders (comparison, transformed quiz)
│   ├── sidebar.py                  # Shared sidebar model selector
│   └── plotting.py                 # Plotly charts (radar, bar, comparison table)
├── tests/
│   ├── conftest.py                 # Shared fixtures (sample quiz, transcript, evaluation)
│   ├── test_schemas.py             # Pydantic model validation tests
│   ├── test_utils.py               # Utility function + plotting tests
│   └── test_agents.py              # Agent logic tests (mocked LLM)
├── data/
│   ├── example_jd.md               # Sample job description
│   ├── quizzes/                    # Source quiz files (MC questions)
│   ├── generated/                  # Saved quizzes from the Architect (gitignored)
│   ├── transcripts/                # Interview transcripts (gitignored)
│   └── evaluations/                # Candidate evaluation results (gitignored)
├── pyproject.toml                  # Ruff + pytest configuration
├── requirements.txt
└── README.md

Example

The data/ directory includes:

  • example_jd.md — job description
  • data/quizzes/example_nlp_engineer.json — 20 multiple-choice NLP questions (questions and correct answers only; no pre-annotated Bloom levels)

Bloom levels in the example quiz: the quiz ships without pre-annotated Bloom levels; the Quiz Architect assesses the level of each original question automatically when you click "Extract Competencies." After transformation, the generated questions target Level 3+.

To try it: open the Quiz Architect page, select example_nlp_engineer from the "Load Interview Questions" picker, paste or load the example job description, then click "Extract Competencies." Once competencies are extracted, click "Transform Quiz" to see the side-by-side comparison and download it as Markdown.

References

  • Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
  • Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
  • Association of American Medical Colleges. (2024). Structured Interview Guidelines for Residency Programs.
  • Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149–155.
