A tool for generating competency-based technical assessments grounded in Bloom's Revised Taxonomy and Behaviorally Anchored Rating Scales (BARS). It uses LLMs to extract competencies from job descriptions, generate scenario-based interview questions, conduct conversational assessments, and evaluate candidates on reasoning quality.
Different question formats test different cognitive levels. Multiple-choice questions are efficient for assessing recall and comprehension (Bloom Levels 1–2), while scenario-based questions can target application, analysis, and evaluation (Levels 3–5). Structured interviews with behavioral anchors are among the most validated selection methods for predicting job performance (Schmidt & Hunter, 1998; AAMC, 2024).
This tool explores what happens when you apply Bloom's Revised Taxonomy and BARS to technical hiring: extract competencies from a job description, generate questions at specific cognitive levels, conduct a structured interview, and score responses against behavioral rubrics.
This system is grounded in Bloom's Revised Taxonomy (Anderson & Krathwohl, 2001), the standard framework for cognitive assessment design.
| Level | Name | What it tests | Hiring relevance |
|---|---|---|---|
| 1 | Remember | Recall facts | Not recommended |
| 2 | Understand | Explain concepts | Minimal value |
| 3 | Apply | Use knowledge in new situations | Minimum for technical roles |
| 4 | Analyze | Break down and examine | Recommended for mid-senior roles |
| 5 | Evaluate | Make judgments with criteria | Recommended for senior roles |
| 6 | Create | Design novel solutions | Best assessed through project work |
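The level-to-question mapping above can be made concrete. As a rough illustration only (not the tool's actual prompt logic), a mapping from Bloom levels to question-stem verbs and a suggested minimum level per seniority might look like:

```python
# Illustrative mapping of Bloom levels to question-stem verbs.
# Levels 1-2 and 6 are included for completeness; this tool targets 3-5.
BLOOM_LEVELS = {
    1: ("Remember", ["define", "list", "recall"]),
    2: ("Understand", ["explain", "summarize", "classify"]),
    3: ("Apply", ["implement", "adapt", "solve"]),
    4: ("Analyze", ["compare", "debug", "decompose"]),
    5: ("Evaluate", ["justify", "critique", "choose between"]),
    6: ("Create", ["design", "compose", "architect"]),
}

def min_level_for_role(seniority: str) -> int:
    """Suggested minimum Bloom level per the table above.

    Unknown seniorities fall back to Level 3, the stated minimum
    for technical roles.
    """
    return {"junior": 3, "mid": 4, "senior": 5}.get(seniority, 3)
```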
Structured interviews with behavioral anchors enable consistent scoring across candidates. Best practice for structured interviews is to assess 3–5 competencies per role.
Each generated question includes a scoring rubric with behavioral anchors at three levels (3, 4, 5). Instead of checking whether a candidate mentioned the right keywords, the evaluator assesses the reasoning process — how the candidate approaches the problem, whether they anticipate edge cases, and whether they connect decisions to real-world impact.
| Score | What it means |
|---|---|
| 3 | Meets expectations: gives a reasonable answer but treats it as a fixed recipe. Doesn't explore trade-offs or edge cases without prompting. |
| 4 | Exceeds expectations: proposes solutions with clear rationale, anticipates at least one complication, connects choices to downstream impact. |
| 5 | Exceptional: frames the problem as a design decision with multiple valid approaches, articulates trade-offs, and proactively addresses scale, failure modes, or evaluation. |
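The project's actual models live in `models/schemas.py`; as an illustration of how behavioral anchors might be represented, here is a minimal stdlib-only sketch (the names `BARSAnchor` and `Rubric` are hypothetical, not the project's real classes):

```python
from dataclasses import dataclass, field

@dataclass
class BARSAnchor:
    score: int     # 3, 4, or 5 in this tool's rubrics
    behavior: str  # observable reasoning behavior at this score

@dataclass
class Rubric:
    competency: str
    bloom_level: int  # target cognitive level of the question (3-5)
    anchors: list[BARSAnchor] = field(default_factory=list)

    def anchor_for(self, score: int) -> str:
        """Return the behavioral description attached to a given score."""
        for anchor in self.anchors:
            if anchor.score == score:
                return anchor.behavior
        raise ValueError(f"no anchor defined for score {score}")
```

The point of the structure is that the evaluator scores against observable behaviors ("anticipates at least one complication"), not keyword checklists.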
Bloom's Taxonomy determines what to ask (cognitive level of the question). BARS determines how to score the answer (what reasoning quality looks like at each level). Together they produce assessments that are both cognitively rigorous and consistently scorable.
```mermaid
flowchart TB
    subgraph inputs [Inputs]
        JD["Job Description (text/URL)"]
        Quiz["Existing Quiz (optional)"]
    end
    subgraph agent1 ["Module 1: Quiz Architect"]
        direction TB
        A1["Extract Competencies from JD"]
        A2["Assign Bloom Levels per Competency"]
        A3["Generate Scenario Questions"]
        A4["Transform Existing MC Questions"]
        A1 --> A2 --> A3
        Quiz --> A4
        A4 --> A3
    end
    subgraph store1 [Quiz Store]
        QDoc["Generated Quiz (JSON)"]
    end
    subgraph agent2 ["Module 2: Conversational Interviewer"]
        direction TB
        B1["Load Quiz as Hidden Context"]
        B2["Conduct Adaptive Conversation"]
        B3["Record Transcript"]
        B1 --> B2 --> B3
    end
    subgraph store2 [Transcript Store]
        TDoc["Conversation Transcript (JSON)"]
    end
    subgraph agent3 ["Module 3: Evaluation Engine"]
        direction TB
        C1["Score per Competency (1-5)"]
        C2["Generate Candidate Summary"]
        C3["Batch Compare Candidates"]
        C4["Visualize Score Distributions"]
        C1 --> C2 --> C3 --> C4
    end
    JD --> A1
    A3 --> QDoc
    QDoc --> B1
    B3 --> TDoc
    TDoc --> C1
    QDoc --> C1
```
```shell
git clone https://github.com/ulari/recruit.git
cd recruit
pip install -r requirements.txt
```

Provide an OpenAI API key (pick one; both locations are gitignored):
- `.env` (loaded automatically via `python-dotenv`):

  ```shell
  cp .env.example .env
  # Edit .env: set OPENAI_API_KEY=sk-... (your real key)
  ```

- Streamlit secrets (handy for local Streamlit and for Streamlit Community Cloud):

  ```shell
  cp .streamlit/secrets.toml.example .streamlit/secrets.toml
  # Edit secrets.toml: set OPENAI_API_KEY = "sk-..."
  ```
If both are set, the environment variable from `.env` takes precedence.
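The precedence rule can be sketched as follows (illustrative only; the project's real key handling lives in `utils/llm.py` and may differ):

```python
import os
from typing import Optional

def resolve_api_key() -> Optional[str]:
    """Resolve the OpenAI key: environment/.env first, Streamlit secrets second.

    python-dotenv loads .env into the process environment at startup,
    so os.environ covers both a real env var and a .env entry.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    try:
        import streamlit as st  # only meaningful when running under Streamlit
        return st.secrets.get("OPENAI_API_KEY")
    except Exception:
        return None  # no Streamlit runtime or no secrets.toml present
```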
```shell
streamlit run app.py
```

- Paste a job description to extract 4–6 competencies with Bloom level assignments
- Generate scenario-based questions targeting Levels 3–5
- Upload an existing quiz to see the side-by-side transformation: original MC question on the left, scenario version on the right, Bloom level badges on both
- Save the generated quiz to session state and disk
- Load a quiz from saved files or carry one over from Module 1
- Semi-structured interview: the AI interviewer covers all competency areas conversationally
- Interviewer neutrality protocol: no feedback, no teaching, no answer evaluation during the interview (consistent with structured interview methodology)
- Post-interview review: stats, scrollable transcript replay, Markdown export
- Transcripts saved to `data/transcripts/` as JSON
- Load transcript(s) from session or saved files and evaluate against the competency framework
- Scoring grounded in specific evaluation criteria from each question, not just abstract competency descriptions
- Per-competency scores (1–5) with Bloom level demonstrated, justifications, and notable quotes
- Radar chart (or bar chart for < 3 competencies) for single-candidate view
- Batch comparison table, score distribution chart, and comparative ranking for multiple candidates
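The radar-vs-bar fallback can be illustrated with a hypothetical helper (not the project's actual `utils/plotting.py` code) that prepares scores for a closed radar polygon; Plotly's `Scatterpolar` expects the first point repeated at the end so the outline closes:

```python
def radar_points(scores: dict[str, float]) -> tuple[list[str], list[float]]:
    """Prepare competency scores for a closed radar polygon.

    With fewer than 3 competencies a radar shape degenerates into a
    line or point, which is why the UI falls back to a bar chart.
    """
    if len(scores) < 3:
        raise ValueError("fewer than 3 competencies: use a bar chart instead")
    labels = list(scores)
    values = [scores[k] for k in labels]
    # Repeat the first point so the polygon outline closes.
    return labels + labels[:1], values + values[:1]
```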
```
recruit/
├── app.py                      # Landing page
├── pages/
│   ├── 1_Quiz_Architect.py     # Quiz generation & transformation
│   ├── 2_Interviewer.py        # Conversational interview
│   └── 3_Evaluator.py          # Candidate scoring & comparison
├── agents/
│   ├── architect.py            # LLM calls: competency extraction, question generation, transformation
│   ├── interviewer.py          # System prompt builder + chat completion
│   └── evaluator.py            # LLM calls: candidate scoring + batch comparison
├── models/
│   └── schemas.py              # Pydantic models (Quiz, Question, Competency, Transcript, etc.)
├── utils/
│   ├── llm.py                  # OpenAI client wrapper, retry logic, model selection
│   ├── export.py               # Markdown export builders (comparison, transformed quiz)
│   ├── sidebar.py              # Shared sidebar model selector
│   └── plotting.py             # Plotly charts (radar, bar, comparison table)
├── tests/
│   ├── conftest.py             # Shared fixtures (sample quiz, transcript, evaluation)
│   ├── test_schemas.py         # Pydantic model validation tests
│   ├── test_utils.py           # Utility function + plotting tests
│   └── test_agents.py          # Agent logic tests (mocked LLM)
├── data/
│   ├── example_jd.md           # Sample job description
│   ├── quizzes/                # Source quiz files (MC questions)
│   ├── generated/              # Saved quizzes from the Architect (gitignored)
│   ├── transcripts/            # Interview transcripts (gitignored)
│   └── evaluations/            # Candidate evaluation results (gitignored)
├── pyproject.toml              # Ruff + pytest configuration
├── requirements.txt
└── README.md
```
The data/ directory includes:
- `example_jd.md` — sample job description
- `data/quizzes/example_nlp_engineer.json` — 20 multiple-choice NLP questions (questions and correct answers only; no pre-annotated Bloom levels)
Bloom level distribution in the example quiz: since the source questions carry no annotations, the Quiz Architect assesses the Bloom level of each original question automatically when you click "Extract Competencies." After transformation, the generated questions target Level 3+.
To try it: open the Quiz Architect page, select `example_nlp_engineer` from the "Load Interview Questions" picker, paste or load the example job description, then click "Extract Competencies." Once competencies are extracted, click "Transform Quiz" to see the side-by-side comparison and download it as Markdown.
- Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
- Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
- Association of American Medical Colleges. (2024). Structured Interview Guidelines for Residency Programs.
- Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47(2), 149–155.