A tool for generating competency-based technical assessments grounded in Bloom's Revised Taxonomy and Behaviorally Anchored Rating Scales (BARS). It uses LLMs to extract competencies from job descriptions, generate scenario-based interview questions, conduct conversational assessments, and evaluate candidates on reasoning quality.
Different question formats test different cognitive levels. Multiple-choice questions are efficient for assessing recall and comprehension (Bloom Levels 1–2), while scenario-based questions can target application, analysis, and evaluation (Levels 3–5). Structured interviews with behavioral anchors are among the most validated selection methods for predicting job performance (Schmidt & Hunter, 1998; AAMC, 2024).
This tool explores what happens when you apply Bloom's Revised Taxonomy and BARS to technical hiring: extract competencies from a job description, generate questions at specific cognitive levels, conduct a structured interview, and score responses against behavioral rubrics.
This system is grounded in Bloom's Revised Taxonomy (Anderson & Krathwohl, 2001), the standard framework for cognitive assessment design.
| Level | Name | What it tests | Hiring relevance |
|---|---|---|---|
| 1 | Remember | Recall facts | Not recommended |
| 2 | Understand | Explain concepts | Minimal value |
| 3 | Apply | Use knowledge in new situations | Minimum for technical roles |
| 4 | Analyze | Break down and examine | Recommended for mid-senior roles |
| 5 | Evaluate | Make judgments with criteria | Recommended for senior roles |
| 6 | Create | Design novel solutions | Best assessed through project work |
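The level-to-question mapping above can be made concrete. As a rough illustration only (not the tool's actual prompt logic), a mapping from Bloom levels to question-stem verbs and a suggested minimum level per seniority might look like:

```python
# Illustrative mapping of Bloom levels to question-stem verbs.
# Levels 1-2 and 6 are included for completeness; this tool targets 3-5.
BLOOM_LEVELS = {
    1: ("Remember", ["define", "list", "recall"]),
    2: ("Understand", ["explain", "summarize", "classify"]),
    3: ("Apply", ["implement", "adapt", "solve"]),
    4: ("Analyze", ["compare", "debug", "decompose"]),
    5: ("Evaluate", ["justify", "critique", "choose between"]),
    6: ("Create", ["design", "compose", "architect"]),
}

def min_level_for_role(seniority: str) -> int:
    """Suggested minimum Bloom level per the table above.

    Unknown seniorities fall back to Level 3, the stated minimum
    for technical roles.
    """
    return {"junior": 3, "mid": 4, "senior": 5}.get(seniority, 3)
```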
Structured interviews with behavioral anchors enable consistent scoring across candidates. Best practice for structured interviews is to assess 3–5 competencies per role.
Each generated question includes a scoring rubric with behavioral anchors at three levels (3, 4, 5). Instead of checking whether a candidate mentioned the right keywords, the evaluator assesses the reasoning process — how the candidate approaches the problem, whether they anticipate edge cases, and whether they connect decisions to real-world impact.
| Score | What it means |
|---|---|
| 3 | Meets expectations: gives a reasonable answer but treats it as a fixed recipe. Doesn't explore trade-offs or edge cases without prompting. |
| 4 | Exceeds expectations: proposes solutions with clear rationale, anticipates at least one complication, connects choices to downstream impact. |
| 5 | Exceptional: frames the problem as a design decision with multiple valid approaches, articulates trade-offs, and proactively addresses scale, failure modes, or evaluation. |
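The project's actual models live in `models/schemas.py`; as an illustration of how behavioral anchors might be represented, here is a minimal stdlib-only sketch (the names `BARSAnchor` and `Rubric` are hypothetical, not the project's real classes):

```python
from dataclasses import dataclass, field

@dataclass
class BARSAnchor:
    score: int     # 3, 4, or 5 in this tool's rubrics
    behavior: str  # observable reasoning behavior at this score

@dataclass
class Rubric:
    competency: str
    bloom_level: int  # target cognitive level of the question (3-5)
    anchors: list[BARSAnchor] = field(default_factory=list)

    def anchor_for(self, score: int) -> str:
        """Return the behavioral description attached to a given score."""
        for anchor in self.anchors:
            if anchor.score == score:
                return anchor.behavior
        raise ValueError(f"no anchor defined for score {score}")
```

The point of the structure is that the evaluator scores against observable behaviors ("anticipates at least one complication"), not keyword checklists.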
Bloom's Taxonomy determines what to ask (cognitive level of the question). BARS determines how to score the answer (what reasoning quality looks like at each level). Together they produce assessments that are both cognitively rigorous and consistently scorable.
```mermaid
flowchart TB
    subgraph inputs [Inputs]
        JD["Job Description (text/URL)"]
        Quiz["Existing Quiz (optional)"]
    end
    subgraph agent1 ["Module 1: Quiz Architect"]
        direction TB
        A1["Extract Competencies from JD"]
        A2["Assign Bloom Levels per Competency"]
        A3["Generate Scenario Questions"]
        A4["Transform Existing MC Questions"]
        A1 --> A2 --> A3
        Quiz --> A4
        A4 --> A3
    end
    subgraph store1 [Quiz Store]
        QDoc["Generated Quiz (JSON)"]
    end
    subgraph agent2 ["Module 2: Conversational Interviewer"]
        direction TB
        B1["Load Quiz as Hidden Context"]
        B2["Conduct Adaptive Conversation"]
        B3["Record Transcript"]
        B1 --> B2 --> B3
    end
    subgraph store2 [Transcript Store]
        TDoc["Conversation Transcript (JSON)"]
    end
    subgraph agent3 ["Module 3: Evaluation Engine"]
        direction TB
        C1["Score per Competency (1-5)"]
        C2["Generate Candidate Summary"]
        C3["Batch Compare Candidates"]
        C4["Visualize Score Distributions"]
        C1 --> C2 --> C3 --> C4
    end
    JD --> A1
    A3 --> QDoc
    QDoc --> B1
    B3 --> TDoc
    TDoc --> C1
    QDoc --> C1
```
```shell
git clone https://github.com/ulari/recruit.git
cd recruit
pip install -r requirements.txt
```

Provide an OpenAI API key (pick one; both locations are gitignored):
- `.env` (loaded automatically via `python-dotenv`):

  ```shell
  cp .env.example .env
  # Edit .env: set OPENAI_API_KEY=sk-... (your real key)
  ```

- Streamlit secrets (handy for local Streamlit and for Streamlit Community Cloud):

  ```shell
  cp .streamlit/secrets.toml.example .streamlit/secrets.toml
  # Edit secrets.toml: set OPENAI_API_KEY = "sk-..."
  ```
If both are set, the environment variable from `.env` takes precedence.
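The precedence rule can be sketched as follows (illustrative only; the project's real key handling lives in `utils/llm.py` and may differ):

```python
import os
from typing import Optional

def resolve_api_key() -> Optional[str]:
    """Resolve the OpenAI key: environment/.env first, Streamlit secrets second.

    python-dotenv loads .env into the process environment at startup,
    so os.environ covers both a real env var and a .env entry.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    try:
        import streamlit as st  # only meaningful when running under Streamlit
        return st.secrets.get("OPENAI_API_KEY")
    except Exception:
        return None  # no Streamlit runtime or no secrets.toml present
```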
```shell
streamlit run app.py
```

- Paste a job description to extract 4–6 competencies with Bloom level assignments
- Generate scenario-based questions targeting Levels 3–5
- Upload an existing quiz to see the side-by-side transformation: original MC question on the left, scenario version on the right, Bloom level badges on both
- Save the generated quiz to session state and disk
- Load a quiz from saved files or carry one over from Module 1
- Semi-structured interview: the AI interviewer covers all competency areas conversationally
- Interviewer neutrality protocol: no feedback, no teaching, no answer evaluation during the interview (consistent with structured interview methodology)
- Post-interview review: stats, scrollable transcript replay, Markdown export
- Transcripts saved to `data/transcripts/` as JSON
- Load transcript(s) from session or saved files and evaluate against the competency framework
- Scoring grounded in specific evaluation criteria from each question, not just abstract competency descriptions
- Per-competency scores (1–5) with Bloom level demonstrated, justifications, and notable quotes
- Radar chart (or bar chart for < 3 competencies) for single-candidate view
- Batch comparison table, score distribution chart, and comparative ranking for multiple candidates
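The radar-vs-bar fallback can be illustrated with a hypothetical helper (not the project's actual `utils/plotting.py` code) that prepares scores for a closed radar polygon; Plotly's `Scatterpolar` expects the first point repeated at the end so the outline closes:

```python
def radar_points(scores: dict[str, float]) -> tuple[list[str], list[float]]:
    """Prepare competency scores for a closed radar polygon.

    With fewer than 3 competencies a radar shape degenerates into a
    line or point, which is why the UI falls back to a bar chart.
    """
    if len(scores) < 3:
        raise ValueError("fewer than 3 competencies: use a bar chart instead")
    labels = list(scores)
    values = [scores[k] for k in labels]
    # Repeat the first point so the polygon outline closes.
    return labels + labels[:1], values + values[:1]
```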
```
recruit/
├── app.py                      # Landing page
├── pages/
│   ├── 1_Quiz_Architect.py     # Quiz generation & transformation
│   ├── 2_Interviewer.py        # Conversational interview
│   └── 3_Evaluator.py          # Candidate scoring & comparison
├── agents/
│   ├── architect.py            # LLM calls: competency extraction, question generation, transformation
│   ├── interviewer.py          # System prompt builder + chat completion
│   └── evaluator.py            # LLM calls: candidate scoring + batch comparison
├── models/
│   └── schemas.py              # Pydantic models (Quiz, Question, Competency, Transcript, etc.)
├── utils/
│   ├── llm.py                  # OpenAI client wrapper, retry logic, model selection
│   ├── export.py               # Markdown export builders (comparison, transformed quiz)
│   ├── sidebar.py              # Shared sidebar model selector
│   └── plotting.py             # Plotly charts (radar, bar, comparison table)
├── tests/
│   ├── conftest.py             # Shared fixtures (sample quiz, transcript, evaluation)
│   ├── test_schemas.py         # Pydantic model validation tests
│   ├── test_utils.py           # Utility function + plotting tests
│   └── test_agents.py          # Agent logic tests (mocked LLM)
├── data/
│   ├── example_jd.md           # Sample job description
│   ├── quizzes/                # Source quiz files (MC questions)
│   ├── generated/              # Saved quizzes from the Architect (gitignored)
│   ├── transcripts/            # Interview transcripts (gitignored)
│   └── evaluations/            # Candidate evaluation results (gitignored)
├── pyproject.toml              # Ruff + pytest configuration
├── requirements.txt
└── README.md
```
The data/ directory includes:
- `example_jd.md` — sample job description
- `data/quizzes/example_nlp_engineer.json` — 20 multiple-choice NLP questions (questions and correct answers only; no pre-annotated Bloom levels)
Bloom level distribution in the example quiz: since the source questions carry no annotations, the Quiz Architect assesses the Bloom level of each original question automatically when you click "Extract Competencies." After transformation, the generated questions target Level 3+.
To try it: open the Quiz Architect page, select `example_nlp_engineer` from the "Load Interview Questions" picker, paste or load the example job description, then click "Extract Competencies." Once competencies are extracted, click "Transform Quiz" to see the side-by-side comparison and download it as Markdown.
- Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
- Association of American Medical Colleges. (2024). Structured Interview Guidelines for Residency Programs.
- Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149–155.