feat(interview): Candidate AI Code Interview tool (all 5 slices) by asabaylus · Pull Request #10 · asabaylus/teamhero.cli

asabaylus · 2026-05-11T00:42:41Z

Summary

teamhero interview is a candidate-facing AI-coding take-home experience, not a grading tool. The feature treats the interview as a project the candidate works on with whatever AI tooling they prefer, in a time-box, on their own machine — and only then, as a separate step, produces an advisory audit of how they collaborated with the AI.

The framing matters. The product is the candidate experience: a fair, modern, native generative-AI coding situation. The audit is a side effect that helps the hiring manager remember and reason about the session — it is never a score, never a ranking, and never determinative.

How it works end-to-end

Hiring manager bootstraps a role. teamhero interview bootstrap (interactive wizard, or headless flags for CI) generates a starter project tailored to the role's stack, domain, and a one-sentence feature spec. The output is a real working project with a failing test, a CLAUDE.md, a GLOSSARY.md, and 2+ deep modules — sized to fit a 60/90/120-minute window. See docs/interview-classification-rationale.md for the full pedagogical and ethical framing.
Candidate receives the kit and the project. Bundled with the project is teamhero-interview-kit/ containing:
- INTERVIEW_RULES.md — what the candidate can and can't do (short version: use any AI tool you like).
- RUBRIC_OVERVIEW.md — a plain-language explanation of the 9 observation dimensions, given to the candidate up-front. Nothing is hidden from them.
- PRIVACY_RELEASE.md — explicit consent form: what is recorded, how it may and may not be used, and the appeal / deletion mechanism. The start.sh recording script will refuse to run until this is signed.
Candidate works the project. They run start.sh, which starts an asciinema recording, hands them a shell, and lets them work as they normally would. Any AI tool is fair game — Claude Code, Cursor, Copilot, Cody, plain ChatGPT-in-a-browser. The cost of any AI usage they accumulate on their personal accounts during the interview is reimbursed via a $50 gift card so that tool choice stays an honest signal of preference and not a wallet-driven compromise.
Candidate finishes. They run end.sh, which packages the recording + their commits into the repo for handoff.
Hiring manager runs the audit. teamhero interview grade --candidate "Jane" --repo <url> produces a two-tier markdown audit:
- summary.md — narrative observations across 9 dimensions, plus a mandatory manager sign-off section the manager must fill in with a categorical recommendation (Hire / Hire with notes / No hire) and reasoning ≥ 20 chars.
- audit.md — the full reasoning chain, every raw measurement, the exact bias-guard prompt used.
Every audit ships with an ADVISORY banner pinned above the content — "the audit is advisory; the candidate is a person, not a score; the audit is one factor among many." The audit is never returned to the candidate by default (GDPR Art. 15 caveat is documented for the EU/UK case).
Hiring manager reviews the cohort. teamhero interview cohort --role <slug> rolls up all candidates for a role into a rankless listing with sign-off status — no Score, no Total, no Rank columns by design.

What ships in this PR

Layer	Components
Candidate-facing	`teamhero-interview-kit/` — POSIX privacy gate, `start.sh` / `end.sh`, `INTERVIEW_RULES.md`, `RUBRIC_OVERVIEW.md`, `PRIVACY_RELEASE.md`, `.claude/` templates
Hiring-manager-facing (TUI)	Interactive `huh` wizard for `bootstrap`; phased progress display for `grade` (clone → collect-evidence → extract-measurements → observe → audit-write); glamour-rendered audit preview with pinned ADVISORY banner; shared `interviewStyles` palette
Engine (TS)	Rubric module (9 dims, 2 groups, 3 evidence modes); project validator + generator (Mode A and B); role config; 4 evidence collectors + 4 deterministic extractors; AI observer with bias guard + strict scoreless schema; audit writer with sign-off validator; rankless cohort summary
Documentation	`docs/interview-rubric.md`, `docs/interview-classification-rationale.md`, `skills/teamhero-interview/SKILL.md` (the conversational wrapper), README section 4

Ethical commitments — preserved end-to-end in code, schema, and docs

Observations, not scores. Strict OpenAI json_schema with additionalProperties: false, plus a second-layer rejectIfScored validator that throws if the parsed response contains any forbidden field.
Bias diversification, not elimination. The docs and skill never claim the AI is unbiased — only that its biases are different from any individual reviewer's.
Mandatory human-in-the-loop. Every audit requires a manager sign-off with a categorical recommendation and ≥ 20-char reasoning summary.
Interviewer-bias guard verbatim in every observation prompt (unit-tested).
Session recording URL never reaches the LLM — stored in frontmatter only.
Mandatory ADVISORY banner on summary, audit, cohort report, and at the start of every grade run.

Manual QA Instructions

A bundled smoke script walks through the candidate-experience and hiring-manager flows without spending money on OpenAI:

just build-all                                          # builds the Go TUI + TS engine
./scripts/manual-test-interview.sh                      # walks five interactive steps

The script requires a TTY (it bails on piped stdin) and uses --mode-analysis human-only for the grade step so no API key is needed.

What to verify at each step

teamhero interview bootstrap with no flags drops into the wizard. Try each rubric mode: default (skips both follow-ups), custom (reveals a text area for the prompt), default+jd (reveals a file picker; validates the path exists). Press Ctrl+C from the first screen — it must exit cleanly with no partial output.
Headless bootstrap is unchanged. The script then runs teamhero interview bootstrap --headless --role manual-test-role ... --output-dir /tmp/.... No TUI should render. Inspect the output directory:
- CLAUDE.md and GLOSSARY.md present
- 2+ deep modules in the generated source
- A failing test (the candidate's starting point)
- LOC of starter scaffold roughly 400–700
- teamhero-interview-kit/ contents copied in
teamhero interview grade with --mode-analysis human-only should show:
- The ADVISORY banner at the start
- A phased progress display: clone → collect-evidence → extract-measurements → observe → audit-write. (In human-only mode the observe phase is a near-instant no-op.)
- A glamour-rendered preview of summary.md with the ADVISORY banner pinned at the top, before any candidate name
Sign-off gating. Open the generated summary.md / audit.md. Confirm:
- ADVISORY banner is at the top of both files
- Sign-off section is present and requires a categorical recommendation
- The audit.json contains no score / weighted_total / band fields
- Session recording URL (if you passed one) appears in YAML frontmatter only, never in the body
teamhero interview cohort --role <slug> should render a rankless listing — no Score, Total, or Rank columns.

Optional deeper verification

just test-all                  # 1660 TS pass / 0 fail; full Go suite green
just tui-test -cover           # confirm Go coverage stays ≥ 85% (currently 87.7%)

Candidate-facing kit (independent of the audit flow)

To verify the candidate experience without running a full grade:

cd /tmp/manual-test-role/teamhero-interview-kit/
./start.sh                     # should refuse — PRIVACY_RELEASE.md is unsigned
# Edit PRIVACY_RELEASE.md to add a name + date under "## Signed" / "## Date"
SKIP_RECORD=1 ./start.sh       # privacy gate now passes; asciinema is skipped in test mode

References

Planning doc — docs/2026-05-09-candidate-interview-grader-plan.md (now includes a full Implementation Progress table)
Rubric — docs/interview-rubric.md
Defensibility / ethics framing — docs/interview-classification-rationale.md
Candidate rules — teamhero-interview-kit/INTERVIEW_RULES.md
Rubric (plain-language for the candidate) — teamhero-interview-kit/RUBRIC_OVERVIEW.md
Privacy release — teamhero-interview-kit/PRIVACY_RELEASE.md
Conversational wrapper — skills/teamhero-interview/SKILL.md

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Full "teamhero interview" flow: interactive bootstrap wizard or equivalent headless flags, headless review (phased progress display, glamour-rendered audit preview, mandatory ADVISORY banner), and cohort summary generation (COHORT.md).
Documentation
- Expanded Quick Start, rubric, rationale, interview kit, privacy release, and skill guidance emphasizing observation-only outputs (no numeric scores) and required human sign-off.
Scripts
- Add start/end session scripts and a manual test script to validate interview workflows and artifacts (summary/audit/cohort).

Captures the design for a candidate screening kit + `teamhero interview` namespace with absolute scoring and mandatory human-in-the-loop review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Establishes the rubric foundation for the candidate AI-collaboration interview grader (slice 1). Ships a deep module exposing 9 dimensions across two groups and three evidence modes: - 6 process dimensions: upfront-design, context-engineering, critical-evaluation, verification, course-correction, risk-awareness - 3 outcome dimensions: architectural-quality, test-pass, throughput - 4 deterministic, 2 hybrid, 3 llm-judge evidence modes Public API: RUBRIC_VERSION, getRubricVersion(), getDimensions(), getDimension(id), getEvidenceMode(id). Tests cover version stamping, dimension lookup, evidence-mode classification, group split, and required-field population. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the `interview` verb to the TS CLI, mirroring the report/setup/doctor pattern: forwards all args (including --help and verb names) to the Go TUI binary. Verb-level routing (bootstrap, grade, cohort) lives in the Go TUI. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the Go TUI side of the interview command. Routes `interview <verb>` to per-verb stubs (bootstrap, grade, cohort) that each print "not yet implemented" and exit non-zero. Top-level usage and per-command help routing extended to cover the new subcommand. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Establishes the wire format between the future TS interview service and the Go TUI. Defines InterviewEvent as a discriminated union (starting with `progress`), plus parse/serialize functions that round-trip cleanly, emit one event per line, and reject malformed or unknown event types. Future slices add `observation`, `measurement`, `audit`, `result`, and `error` events to the union — additive by design, discriminated on type. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Publishes the two foundational docs for the interview grader: - `interview-rubric.md` — formal rubric in observation framework. Lists all 9 dimensions across process/outcome groups, evidence modes (4 deterministic, 2 hybrid, 3 llm-judge), and per-dim observation vs. measurement output shape. No scoring levels anywhere. - `interview-classification-rationale.md` — methodology and ethics doc with binding top-section preamble covering: (1) observations- not-scores, (2) bias diversification (NOT elimination — never claim the AI is "non-biased"), (3) mandatory human-in-the-loop with manager sign-off, (4) GDPR Article 15 caveat for MVP candidate-no-audit-access default. Per-dimension methodology, interviewer-bias guard, and schema-level scoring-drift guard follow the preamble. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…le config, kit scaffold Implements the hiring manager's bootstrap user journey end-to-end (headless mode). Deep modules: - `project-validator` (Mode A: CLAUDE.md / GLOSSARY.md / 2+ deep modules / failing test / LOC 400-700; Mode B: BRIEF.md with time-box, acceptance, deliverables sections) - `role-config` (typed schema, validator with rubric-mode-conditional requirements: custom -> non-empty customPrompt, default+jd -> existing jdPath; round-tripping persistence) - `project-generator` (injectable GeneratorClient interface; retries up to 3x on validation failure; embeds the interview-kit template files into bootstrap output) - `orchestrator.runBootstrap` glues the three: validates config, runs generation, writes role-config.json Production wiring: - `openai-generator-client.ts` calls the Responses API with strict json_schema requiring `files[].{path,content}`; prompt encodes role metadata, rubric, mode-A vs mode-B contract, and retry diagnostics - `scripts/run-interview-bootstrap.ts` CLI entry the Go TUI spawns - `tui/interview_bootstrap.go` parses headless flags (--role / --stack / --domain / --feature / --time-box / --mode-{project,analysis,rubric} / --output-dir / --jd-path / --custom-prompt / --kit-dir), validates them with the same rules as TS, and shells out to bun running the script Embedded interview-kit placeholder templates (Slice 3 enriches): - start.sh / end.sh (placeholder; Slice 3 implements privacy gate + asciinema) - INTERVIEW_RULES.md, RUBRIC_OVERVIEW.md, PRIVACY_RELEASE.md (placeholders) - .claude/settings.json (empty hooks; Slice 3 wires) - .claude/CLAUDE.md (minimal candidate-agent framing) Test coverage: - Project validator: 11 cases covering Mode A positive + each negative failure (missing files, sprawl-only, no failing tests, LOC bounds) and Mode B positive + missing/empty/incomplete brief - Role config: 12 cases covering field requirements, rubric-mode conditionals, time-box bounds, persistence round-trip, refusal to write invalid configs - Project generator: 5 cases covering Mode A success, retry-then-success, exhaustion after 3 failures, kit template copy, Mode B success - Orchestrator: 3 cases covering happy path, validation rejection without generator invocation, kit-dir merging - Go: 10 cases covering flag parsing, validation rejection of incomplete / malformed flags, runner delegation, exit-code forwarding Notes: --headless is the only Slice 2 path. Interactive `huh` wizard deferred. `assess_config.go` referenced in acceptance criteria does not exist on this branch (assess maturity work has not merged); follows the existing `doctor.go` style instead. Refs: teamhero-scripts-7qe Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ripts + templates) Completes the candidate's user journey: clone, sign, code, push. The kit lives at `teamhero-interview-kit/` in this repo and is copied verbatim into every bootstrap output by Slice 2. Privacy gate (POSIX-compatible deep module): - `lib/privacy-gate.sh` exposes `check_privacy_release_signed PATH` (function) and runs the same check when invoked as a script. Detects missing file, empty file, missing/whitespace/placeholder values under the "## Signed" and "## Date" sections. Shell entry points: - `start.sh` — calls the privacy gate, refuses on unsigned with clear instructions, verifies UserPromptSubmit + PreToolUse hooks present in .claude/settings.json, writes .interview-state/started-at, and execs asciinema rec terminal.cast. Honors SKIP_RECORD=1 + ASCIINEMA_BIN env for testability. - `end.sh` — verifies start was run, stages PRIVACY_RELEASE.md + terminal.cast + interview.log + INTERVIEW_RULES.md, and creates a single git commit. Honors SKIP_COMMIT=1 for testability. Templates filled in with real content: - `PRIVACY_RELEASE.md` — full placeholder release with REVIEW WITH LEGAL warning, no-training clause, deletion/access/appeal rights (30-day appeal window after hiring decision), bias acknowledgement, what-the-AI-sees and what-it-doesn't-see sections. - `INTERVIEW_RULES.md` — candidate-facing rules with `{{TIME_BOX}}` placeholder, before/during/finishing sections, and WSL setup instructions for Windows candidates (apt install + clone under ~/ not /mnt/c/). - `RUBRIC_OVERVIEW.md` — plain-language summary of all 9 dimensions (process × 6, outcome × 3) including the "why no scoring" framing. - `.claude/settings.json` — UserPromptSubmit + PreToolUse hooks each pipe a structured JSONL line to `interview.log` via jq. - `.claude/CLAUDE.md` — candidate-agent framing covering role, what to do with knowledge of being observed (be normal, do not coach toward rubric), and what not to do (no silent destructive ops, no external exfil). Tests (17 new): - Privacy gate (9 cases): signed, missing file, empty file, placeholder- only, signature-no-date, date-no-signature, no path argument, missing sections, whitespace-padded real values. - Kit smoke (8 cases): start refuses unsigned; start passes when signed (SKIP_RECORD); settings.json declares both hooks emitting to interview.log; end refuses without prior start; full start→end round-trip (SKIP modes); INTERVIEW_RULES mentions WSL; RUBRIC_OVERVIEW mentions all 9 dims; PRIVACY_RELEASE has REVIEW WITH LEGAL + no-training + appeal + 30-day language. Refs: teamhero-scripts-h13 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…er, audit writer) Largest slice. Produces a complete per-candidate audit end-to-end: inputs → events → measurements + observations → summary.md / audit.md / audit.json / evidence/. Headless mode is wired through the Go TUI. Evidence collectors (4 input adapters): - `collectors/asciinema.ts` — parses v2 cast files; reconstructs typed commands from input events with pause-before-Enter timing; handles backspace. - `collectors/jsonl-log.ts` — parses interview.log (kit-emitted JSONL); separates prompts from tool-use; silently drops malformed lines. - `collectors/transcript.ts` — supports [HH:MM:SS] Speaker: text format (Granola/Otter/Fireflies markdown export shape) and bare Speaker: lines; skips VTT timing lines and comment lines. - `collectors/git-history.ts` — parses `git log` via an injectable runner; tolerates non-git directories by returning empty list. Deterministic extractors (4 pure functions): - `extractors/verification.ts` — counts test/typecheck runs across bun/npm/pnpm/yarn/go/pytest/jest/vitest/cargo/just, plus diff/grep/cat reads, plus test-runs-immediately-after-prompt. - `extractors/risk-awareness.ts` — pattern-matches 13 destructive command families; reports each with pause-before-Enter timing. - `extractors/test-pass.ts` — injectable TestRunner; default runner picks bun vs go from project layout. - `extractors/throughput.ts` — session start/end, elapsed, commits, time-to-first-test, avg gap between commits. AI observer (deep module with explicit ethical guards): - `INTERVIEWER_BIAS_GUARD` constant — verbatim instruction told to every observation prompt to treat interviewer commentary as situational context, not evidence about the candidate. - `OBSERVATION_RESPONSE_SCHEMA` — strict json_schema with additionalProperties: false on every level; declares ONLY dimension_id / observation / reasoning / evidence_excerpts / caveats. No score, weighted_total, raw_total, band, signal_count. - `rejectIfScored` — second-layer validator; throws if the parsed response contains any forbidden field (defense in depth: schema + post-parse check + writer renders nothing numerical). - `buildObserverPrompt` — pure function; supports default / custom / default+jd rubric modes; reads JD from disk; embeds rubric for the 5 observable dims only (deterministic dims are extracted, not observed); never includes the session_recording_url. - `humanOnlyObservations` — returns blank fillable templates for human-only analysis mode. - `OpenAIObserverClient` — production client; Responses API; gpt-5-mini default; the rejectIfScored guard runs on every response. Audit writer (pure transformation): - `renderSummary` / `renderAudit` — markdown rendering; same warning banner at top of both ("THIS AUDIT IS ADVISORY ... not a score"); YAML frontmatter with Obsidian-friendly tags + candidate + role + date + rubric_version + rubric_mode + signed_off + session_recording_url + session_platform + session_date; reasoning chain preserved in BOTH tiers (transparency). - Per-dimension layout: measurements first (for deterministic / hybrid dims), then observation + reasoning + evidence (for LLM-judge / hybrid dims). - `validateSignOff` — categorical recommendation (Hire / Hire with notes / No hire) plus manager-written reasoning >=20 chars; the TUI uses this to reject empty submissions. Grade orchestrator: - `gradeCandidate` — clones repo (injectable cloner) or uses localRepoPath; reads role-config.json from the repo; runs all 4 collectors + 4 extractors; calls AI observer (or human-only path); writes audit; copies PRIVACY_RELEASE.md, terminal.cast, interview.log, transcript, interviewer-notes into evidence/. Cleans up the temp clone on exit. - `scripts/run-interview-grade.ts` — CLI entry the Go TUI spawns. Go TUI: - `tui/interview_grade.go` — headless flag parser (positional repo or --repo, plus --candidate, --transcript, --interviewer-notes, --session-recording-url, --session-platform, --session-date, --output-dir, --local-repo-path) with validation; mandatory ADVISORY warning banner printed before spawning the TS subprocess. - `runInterview` dispatcher wires grade to runInterviewGrade. Tests (50 new): - Collectors: 10 (asciinema header+commands+backspace, interview.log split + malformed-line tolerance, transcript timestamped + bare + VTT, git history with stub runner, git failure tolerance). - Extractors: 9 (verification counts + interleaving; risk-awareness zero-detection / rm -rf with pause / force-push and reset; test-pass with injected runner including n/a case; throughput elapsed + commits + time-to-first-test + empty). - AI observer: 12 (bias guard verbatim; role metadata in input; rubric for 5 observable dims only; custom prompt mode; JD content mode; session URL NOT in prompt; interviewer notes routed to user input not instructions; schema has no score/total/band fields; rejectIfScored on score / weighted_total / clean; human-only template count and content). - Audit writer: 15 (summary frontmatter, banner, reasoning preserved, measurements rendered, sign-off section with Hire|notes|No hire, session_recording_url in frontmatter; audit banner; audit reasoning preserved; audit evidence not truncated; writeAudit produces all 4 outputs; validateSignOff rejects non-categorical; rejects empty reasoning; rejects too-short reasoning; accepts substantive Hire/notes/No-hire). - Grade orchestrator: 4 (full happy path produces all outputs + PRIVACY_RELEASE copied; missing role-config errors; human-only skips observer; session_recording_url in frontmatter NOT in prose). - Go: 8 (grade flag parsing positional / all-flags; validation rejects missing candidate / missing repo / bad platform; warning banner printed; exit code forwarded). `just test`: 1632 pass. `go test`: all green. Refs: teamhero-scripts-bt2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Completes the candidate-interview-grader MVP. Hiring manager can now roll up the cohort and orchestrate the whole flow conversationally. Cohort summary module (deep, pure): - `cohort-summary.ts`: - `loadCohort(roleDir)` reads every `<candidate>/audit.json`, returns typed records; skips malformed JSON silently. - `renderCohortSummary` produces COHORT.md as a markdown table with columns: Candidate | Interviewed | Sign-off | Recommendation | Audit. NO score, total, or rank column. Default order is alphabetical; chronological is opt-in. - Mandatory ADVISORY warning banner at the top. - Pending sign-offs marked ⏳; reviewed marked ✅. - Header line reports pending vs reviewed counts. - Friendly "no candidates yet" copy for empty roles. - `writeCohortSummary` writes the result to `<roleDir>/COHORT.md`. Audit frontmatter extension: - `AuditFrontmatter.recommendation` (optional, categorical) added so the cohort renderer can show the manager's sign-off choice without re-parsing the markdown. Production wiring: - `scripts/run-interview-cohort.ts` — CLI entry the Go TUI spawns. - `tui/interview_cohort.go` — headless flag parser (--role / --role-dir / --order) + validation + bun subprocess runner. - `runInterview` dispatcher routes cohort → runInterviewCohort. Claude skill — bounded-context wrapper: - `skills/teamhero-interview/SKILL.md` ships in-repo. Users install by copying to `~/.claude/skills/teamhero-interview/`. - Frontmatter `name: teamhero-interview` + descriptive trigger description covering bootstrap / grade / cohort / hiring queries. - Ethical framing reproduced as binding rules: observations not scores, bias diversification (not elimination), mandatory human-in-the-loop. - All 3 MVP verbs documented with example invocations. - v1.5 verb stubs (list-roles, list-candidates) acknowledged with the current workaround (ls docs/interviews/). - Cohort orchestration workflow described: read role config → ask user for transcript per candidate → grade each → run cohort verb → remind manager that each audit needs its own sign-off. - Explicit "what NOT to do" section: do not produce scores, do not bypass the privacy gate, do not feed session_recording_url to the observer, do not re-introduce score fields in code. - Skill repeats explicitly: contains no business logic; all logic lives in src/services/interview/. Tests (24 new): - Cohort summary: 11 (loadCohort missing-dir; reads each candidate; skips folders without audit.json; banner present; required columns + no Score/Total/Rank; pending vs reviewed indicators; alphabetical default; chronological opt-in; empty-cohort message; counts in header; writeCohortSummary produces file). - Skill: 9 (skill exists; frontmatter has name + description; all 3 MVP verbs documented; v1.5 stubs mentioned; ethical framing — observations-not-scores + bias-diversification + human-in-the-loop; refusal-of-numerical-scores instruction; cohort orchestration documented; no-business-logic statement present; session_recording_url feeding warning present). - Go: 4 (cohort flag parsing; validation rejects missing --role; validation rejects bad --order; runner delegation + exit code forwarding). `just build-all` ✓, `just test-all` ✓ (1652 pass, 0 fail; Go all green). Refs: teamhero-scripts-9v2 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two CI failures on PR #10's test-typescript job: 1. scripts/run-tests.sh line 64 had a bash arithmetic syntax error when bun emitted multi-line "N pass" / "N fail" / "N skip" output. The grep captured all matches (e.g., "3\n0"), and \$((PASS + file_pass)) errored with "syntax error in expression". Counts silently desynced (CI 1398 pass vs local 1652). Fixed by piping to tail -1 so only the summary line's match is used. 2. Function coverage was 84.49% (under 85% threshold) in CI. Added targeted tests for the uncovered seams in the new interview module: - OpenAIObserverClient (constructor, observe() happy path, missing output_text guard, rejectIfScored defense-in-depth) - summarizeEvents non-prompt branches (tool-use, command, commit, transcript-line) via buildObserverPrompt - validateGenerated re-runs validation on an existing output dir - extractTestPass falls through to realRunner on dirs without go.mod / package.json Local coverage: 88.39% → 88.79% functions (over the 85% threshold with ~4pp margin). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements the three v1.5 slices for `teamhero interview`: - **Slice 1 (teamhero-scripts-92a)**: Interactive huh.Form wizard for `teamhero interview bootstrap`. Invoking with no flags on a TTY drops into a wizard that walks role identity, time-box, project mode, analysis mode, and rubric mode — with conditional follow-ups for the custom-prompt and default+jd branches. Form-state-building is isolated in pure helpers (newBootstrapWizardModel, bootstrapWizardNextState, bootstrapWizardOptionsFromModel) so the rubric-mode branching, defaults, and option round-trip are all unit-testable without rendering a TTY. The wizard hands its BootstrapOptions to the same ValidateBootstrapOptions gate the headless path uses, so there is no parallel validation logic. Headless mode (any flags supplied) is unchanged. - **Slice 2 (teamhero-scripts-upo)**: Three TUI polish concerns: (1) shared interviewStyles module centralizing the palette so future screens stay visually aligned with the report wizard; (2) glamour- rendered audit preview (`renderInterviewAuditPreview`) that pins the ADVISORY warning banner above the source content — managers read the preview before the sign-off file; (3) phased progress display (`interviewProgressModel`) consuming the existing InterviewProgressEvent JSON-lines protocol from the bun subprocess, with five canonical phases (clone → collect-evidence → extract-measurements → observe → audit-write) and graceful failure surfacing. - **Slice 3 (teamhero-scripts-rjp)**: Manual end-to-end smoke script at `scripts/manual-test-interview.sh` walking a human operator through the wizard, headless bootstrap, grade with --mode-analysis human-only (no OpenAI spend), sign-off gating, and cohort rendering. README documents the interactive wizard as the primary path; the Claude skill at `skills/teamhero-interview/SKILL.md` now recommends the wizard on a TTY and falls back to the headless flag list for agents and scripting. Test coverage: ~35 new Go tests covering wizard state-machine branching, field validators (URL-safe slug, time-box range, JD-path existence), options round-trip, dispatcher wizard-vs-headless decision, preview pinning, progress phase transitions, manual-test- script structure. Overall Go coverage: 87.7% (above 85% threshold). TS: 1660 pass, 0 fail. No behavior change to the headless path. Closes teamhero-scripts-92a, teamhero-scripts-upo, teamhero-scripts-rjp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Updates the candidate-interview-grader plan to reflect what has actually landed on PR #10. Replaces the "deferred until go-ahead" Next Steps section with an Implementation Progress table that tracks the 6 MVP slices (beads `teamhero-scripts-{3sh,7qe,h13,bt2,9v2}`) and the 3 v1.5 slices (beads `teamhero-scripts-{92a,upo,rjp}`) — all closed. Adds a note that the `tui/assess_*.go` reference files don't exist on `main` yet, so v1.5 aligned with `tui/wizard.go` + `tui/setup.go` instead. The "Still deferred" list keeps the pilot, `list-roles`/`list-candidates`, multi-model bias diversification, and periodic anonymized bias audit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T02:11:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a complete interview-review feature: CLI/TUI verbs (bootstrap, review, cohort), headless Bun scripts, TypeScript services for bootstrap and review, evidence collectors/extractors, OpenAI observer with strict schema and bias guard, audit/cohort writers with ADVISORY banner and sign-off, kit scripts/docs, and extensive tests.

Changes

Interview Reviewer

Layer / File(s)	Summary
CLI & TUI entrypoints `src/cli/index.ts`, `tui/interview.go`, `tui/_bootstrap.go`, `tui/interview_review.go`, `tui/interview_cohort.go`, `tui/main.go`	Registers `interview` subcommand, forwards help/flags, adds headless verbs and TUI dispatching with preview/progress rendering.
Role config & rubric `src/services/interview/bootstrap/role-config.ts`, `src/services/interview/shared/rubric.ts`	Adds RoleConfig types/validation, Project/Analysis/Rubric modes, and a fixed rubric catalog with evidence modes.
Bootstrap generation `src/services/interview/bootstrap/project-generator.ts`, `src/services/interview/bootstrap/openai-generator-client.ts`, `src/services/interview/bootstrap/orchestrator.ts`, `scripts/run-interview-bootstrap.ts`	Generator client (OpenAI), safe file path handling, retries, validators, runBootstrap orchestrator, and headless bootstrap script.
Evidence collectors `src/services/interview/review/collectors/asciinema.ts`, `jsonl-log.ts`, `transcript.ts`, `git-history.ts`	Parsers that normalize terminal, prompt/tool, transcript, and git history into EvidenceEvent streams.
Deterministic extractors `src/services/interview/review/extractors/*`	extractVerification, extractTestPass, extractThroughput, extractRiskAwareness compute Measurements from evidence.
AI observer & orchestration `src/services/interview/review/ai-observer.ts`, `src/services/interview/review/review-orchestrator.ts`, `scripts/run-interview-review.ts`	Builds observation-only prompts (bias guard), strict response schema, OpenAI client, and reviewCandidate orchestration: clone/read, collect, measure, observe or human-only, write audits.
Audit & cohort writers `src/services/interview/review/audit-writer.ts`, `src/services/interview/cohort/cohort-summary.ts`, `scripts/run-interview-cohort.ts`	Render/write `summary.md`/`audit.md`/`audit.json` with ADVISORY banner and sign-off, produce `COHORT.md` roll-ups.
TUI wizard & preview `tui/interview_bootstrap_form.go`, `tui/interview_bootstrap_wizard.go`, `tui/interview_preview.go`, `tui/interview_progress.go`, `tui/interview_styles.go`	Interactive `huh` bootstrap wizard, validators, glamour preview (pinned advisory), progress model, and styles.
Interview kit & scripts `teamhero-interview-kit/*`, `teamhero-interview-kit/lib/privacy-gate.sh`, `scripts/manual-test-interview.sh`, `teamhero-interview-kit/start.sh`, `end.sh`	Kit assets, Claude hooks/settings, privacy release template, start/end scripts, privacy gate, and manual smoke-test script.
Docs & skill `docs/*`, `README.md`, `skills/teamhero-interview/SKILL.md`	Adds classification rationale, rubric, reviewer plan, README quick start, and Claude skill docs enforcing observations-not-scores and human sign-off.
Tests `tests/unit/*`, `tui/_test.go`	Extensive unit and edge tests for generators, validators, collectors, extractors, observer, audit writer, cohort, TUI, and kit scripts.
Utilities / CI `scripts/run-tests.sh`, `.coderabbit.yaml`	Improved test summary parsing and CodeRabbit config for automated reviews.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as teamhero CLI
  participant TUI as Go TUI
  participant Bun as Bun scripts
  participant Svc as TS Services
  participant OpenAI as OpenAI Responses API
  participant FS as Filesystem

  CLI->>TUI: run `teamhero interview review ...`
  TUI->>Bun: invoke `scripts/run-interview-review.ts` (flags)
  Bun->>Svc: reviewCandidate(input, deps)
  Svc->>FS: clone/read repo, read evidence files
  Svc->>Svc: compute measurements (extractors)
  Svc->>OpenAI: observe(prompt with strict json_schema)
  OpenAI-->>Svc: structured observations
  Svc->>FS: write summary.md, audit.md, audit.json, evidence/
  Bun-->>TUI: return artifact paths
  TUI-->>CLI: render preview + ADVISORY banner

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"A rabbit scrawls a checklist, bright and fleet,
Through prompts and logs and tests it nibbles neat.
ADVISORY pinned, observations not score,
Papers and audits across the floor.
🐇✨"

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/slice-1-foundation

coderabbitai

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (18)

teamhero-interview-kit/lib/privacy-gate.sh-16-16 (1)

16-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move set -u inside the conditional execution block to avoid mutating the caller's environment.

The global set -u at line 16 is applied whenever this file is sourced as a library, which unexpectedly enables the nounset option in any caller that doesn't already have it enabled. This violates the library pattern and can break scripts that rely on unset variable behavior.

Move the set -u declaration inside the conditional check at lines 72–75 so it only applies during direct script execution:
Suggested fix
-set -u
-
# extract_section <file> <heading> — prints the body that follows a given
 # When invoked directly (not sourced), run the check on $1 and exit with the
 # function's return code.
 if [ "${BASH_SOURCE[0]:-$0}" = "$0" ]; then
+    set -u
     check_privacy_release_signed "${1:-}"
     exit $?
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/lib/privacy-gate.sh` at line 16, Remove the top-level
"set -u" so the library doesn't change the caller's shell options, and instead
enable nounset only inside the script's main execution guard (the conditional
that checks whether the file is run directly vs sourced); place "set -u" at the
start of that guard block (and leave it out of any exported functions or
top-level code) so nounset applies only during direct execution and not when the
file is sourced.

src/services/interview/bootstrap/project-validator.ts-21-29 (1)

21-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent recursive walk from following symlink cycles

walk uses statSync, which follows symlinks. A cyclic symlink inside generated project contents can cause unbounded recursion.

Proposed fix

-import { existsSync, readFileSync, readdirSync, statSync } from "node:fs";
+import { existsSync, lstatSync, readFileSync, readdirSync } from "node:fs";
@@
 	for (const entry of readdirSync(dir)) {
 		if (entry === "node_modules" || entry === ".git") continue;
 		const full = join(dir, entry);
-		const s = statSync(full);
+		const s = lstatSync(full);
+		if (s.isSymbolicLink()) continue;
 		if (s.isDirectory()) walk(full, out);
 		else out.push(full);
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/project-validator.ts` around lines 21 - 29,
The walk function can follow symlinks because it uses statSync, causing
potential cycles; change walk(dir: string, out: string[] = []) to skip symbolic
links (use lstatSync and if entry is a symlink, skip or optionally resolve
safely) or add a visited set of realpaths to prevent revisiting; specifically
update the logic around readdirSync + statSync to use lstatSync and either
ignore entries where lstatSync(...).isSymbolicLink() or track realpath
(fs.realpathSync) in a visited Set inside walk (or an additional parameter) and
return early when a path is already visited to prevent unbounded recursion.

src/services/interview/shared/events.ts-30-41 (1)

30-41: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add validation for step and status fields in parseInterviewEvent

The function accepts {"type":"progress"} without validating required fields step and status, allowing malformed events to pass type checking at runtime. The message and progress optional fields are also unvalidated if present.

Proposed fix

 export function parseInterviewEvent(line: string): InterviewEvent | null {
 	let value: unknown;
 	try {
 		value = JSON.parse(line);
 	} catch {
 		return null;
 	}
 	if (!value || typeof value !== "object") return null;
-	const type = (value as { type?: unknown }).type;
+	const obj = value as Record<string, unknown>;
+	const type = obj.type;
 	if (typeof type !== "string") return null;
 	if (!KNOWN_EVENT_TYPES.includes(type as InterviewEvent["type"])) return null;
-	return value as InterviewEvent;
+	if (type === "progress") {
+		const step = obj.step;
+		const status = obj.status;
+		if (typeof step !== "string") return null;
+		if (
+			status !== "start" &&
+			status !== "update" &&
+			status !== "done" &&
+			status !== "error"
+		) {
+			return null;
+		}
+		if (obj.message !== undefined && typeof obj.message !== "string") return null;
+		if (obj.progress !== undefined && typeof obj.progress !== "number") return null;
+		return obj as InterviewProgressEvent;
+	}
+	return null;
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/shared/events.ts` around lines 30 - 41,
parseInterviewEvent currently accepts {"type":"progress"} without validating
required fields; update parseInterviewEvent to, after confirming type is
"progress", check that (value as any).step exists and is a string and (value as
any).status exists and is a string and matches the allowed progress statuses
(use your existing InterviewEvent status union or a small array of allowed
status strings), and also validate optional fields: if message is present it
must be a string, and if progress is present it must be a number; reject (return
null) for any type-mismatches so that only well-formed InterviewEvent objects
(for all types, but especially "progress") are returned while still using
KNOWN_EVENT_TYPES and casting to InterviewEvent when validation passes.

src/services/interview/assess/extractors/verification.ts-58-71 (1)

58-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Interleaving metric ignores the advertised 30-second window.

The code only checks prompt→test adjacency, but never validates the 30s condition described in the comment and fact label context.

Proposed fix

 	let interleavedAfterPrompt = 0;
 	for (let i = 1; i < merged.length; i++) {
 		const cur = merged[i];
 		const prev = merged[i - 1];
 		if (cur.type === "command" && prev.type === "prompt") {
-			interleavedAfterPrompt += 1;
+			const deltaMs =
+				new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime();
+			if (Number.isFinite(deltaMs) && deltaMs >= 0 && deltaMs <= 30_000) {
+				interleavedAfterPrompt += 1;
+			}
 		}
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/verification.ts` around lines 58 -
71, The interleaving count currently only checks adjacency between merged
entries but ignores the intended 30-second window; update the loop that computes
interleavedAfterPrompt (which iterates over merged built from prompts and
testRuns) to parse timestamps to numeric time and only increment
interleavedAfterPrompt when cur.type === "command" && prev.type === "prompt" &&
(new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime()) <=
30000; ensure merged remains sorted by timestamp before this check.

src/services/interview/bootstrap/role-config.ts-121-126 (1)

121-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and harden readRoleConfig instead of unchecked cast.

JSON.parse(... as RoleConfig) allows malformed shape through (or throws unwrapped parse errors), which makes downstream failures harder to diagnose.

Proposed fix

 export function readRoleConfig(dir: string): RoleConfig | null {
 	const path = join(dir, ROLE_CONFIG_FILENAME);
 	if (!existsSync(path)) return null;
-	const body = readFileSync(path, "utf8");
-	return JSON.parse(body) as RoleConfig;
+	const body = readFileSync(path, "utf8");
+	let parsed: unknown;
+	try {
+		parsed = JSON.parse(body);
+	} catch (error) {
+		throw new Error(`Invalid ${ROLE_CONFIG_FILENAME}: malformed JSON`);
+	}
+	const result = validateRoleConfig(parsed as RoleConfig);
+	if (!result.ok) {
+		throw new Error(
+			`Invalid ${ROLE_CONFIG_FILENAME}: ${result.failures.join("; ")}`,
+		);
+	}
+	return parsed as RoleConfig;
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/role-config.ts` around lines 121 - 126,
readRoleConfig currently does an unchecked JSON.parse and casts to RoleConfig;
wrap the read+parse in a try/catch to surface parse errors and then validate the
parsed object’s shape (check required properties and their types expected by
RoleConfig) before returning it — if validation fails return null or throw a
descriptive Error with context including ROLE_CONFIG_FILENAME and the path, so
callers don’t receive a malformed RoleConfig instance; implement this inside
readRoleConfig.

src/services/interview/assess/collectors/transcript.ts-24-37 (1)

24-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Transcript parser is too brittle for common speaker labels and invalid anchor timestamps.

The bare-speaker regex can skip valid lines (e.g., Speaker 1: ...), and an invalid sessionStartIso can throw on toISOString().

Proposed fix

-const BARE = /^([A-Za-z][A-Za-z .'-]+?):\s+(.+)$/;
+const BARE = /^([^:\[\]#][^:]{0,79}?):\s+(.+)$/;

 function toIsoFromHMS(hms: string, sessionStartIso: string): string {
 	const parts = hms.split(":").map(Number);
@@
-	const base = new Date(sessionStartIso).getTime();
-	return new Date(base + totalSeconds * 1000).toISOString();
+	const parsedBase = new Date(sessionStartIso).getTime();
+	const base = Number.isFinite(parsedBase)
+		? parsedBase
+		: new Date("1970-01-01T00:00:00.000Z").getTime();
+	return new Date(base + totalSeconds * 1000).toISOString();
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/transcript.ts` around lines 24 - 37,
The bare-speaker regex (BARE) is too strict (misses labels like "Speaker 1") and
toIsoFromHMS can throw if sessionStartIso or parsed HMS are invalid; update BARE
to allow digits and optional numeric suffixes (e.g., change /^([A-Za-z][A-Za-z
.'-]+?):\s+(.+)$/ to accept names with digits and leading words such as "Speaker
1"), and harden toIsoFromHMS: validate sessionStartIso by parsing new
Date(sessionStartIso) and checking for NaN, validate each HMS part with
Number.isFinite before computing totalSeconds, and if either the base date or
HMS parts are invalid return a safe fallback (e.g., the original sessionStartIso
or new Date().toISOString()) instead of calling toISOString on an invalid date;
reference TIMESTAMPED and BARE for regex fixes and toIsoFromHMS for the
parsing/validation changes.

src/services/interview/assess/extractors/test-pass.ts-20-42 (1)

20-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout and error handling to test runner processes to prevent grader hangs.

Both spawnSync() calls lack timeout configuration and don't capture r.error, which can cause the grader to block indefinitely if a test process hangs or fails to spawn.

Proposed fix

+const TEST_RUN_TIMEOUT_MS = 120_000;
+
 const realRunner: TestRunner = (repoDir) => {
 	if (existsSync(join(repoDir, "go.mod"))) {
 		const r = spawnSync("go", ["test", "./..."], {
 			cwd: repoDir,
 			encoding: "utf8",
+			timeout: TEST_RUN_TIMEOUT_MS,
+			killSignal: "SIGKILL",
 		});
 		return {
 			passed: countMatches(r.stdout ?? "", /^ok\s+/gm),
 			failed: countMatches(r.stdout ?? "", /^FAIL\s+/gm),
-			output: `${r.stdout ?? ""}\n${r.stderr ?? ""}`,
+			output: `${r.stdout ?? ""}\n${r.stderr ?? ""}${
+				r.error ? `\n${r.error.message}` : ""
+			}`,
 		};
 	}
 	if (existsSync(join(repoDir, "package.json"))) {
 		const r = spawnSync("bun", ["test"], {
 			cwd: repoDir,
 			encoding: "utf8",
+			timeout: TEST_RUN_TIMEOUT_MS,
+			killSignal: "SIGKILL",
 		});
-		const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}`;
+		const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}${
+			r.error ? `\n${r.error.message}` : ""
+		}`;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/test-pass.ts` around lines 20 - 42,
The spawnSync calls in src/services/interview/assess/extractors/test-pass.ts
(the blocks that run "go test ./..." and "bun test") need timeout and error
handling to avoid hangs: add a timeout option (e.g. 30000 ms) and a larger
maxBuffer to both spawnSync invocations, check r.error and r.status (or
r.signal) and include r.error/message/signal in the returned output string, and
treat a timeout/spawn failure as test failures (set failed>0 or a specific flag)
by returning a non-zero failed count when r.error or r.signal is present; update
the returned objects for the functions that use the "r" result so both
passed/failed and output reflect these error/timeout conditions.

src/services/interview/assess/collectors/git-history.ts-14-18 (1)

14-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace execSync with execFileSync to avoid shell string construction.

Building a shell command string for execSync is unnecessary and fragile. Pass arguments directly to the git executable via execFileSync, which avoids shell interpretation entirely and is the standard Node.js best practice for spawning processes.

Proposed fix

-import { execSync } from "node:child_process";
+import { execFileSync } from "node:child_process";
@@
 const realRunner: GitRunner = (args, cwd) =>
-	execSync(`git ${args.map((a) => `'${a.replace(/'/g, "'\\''")}'`).join(" ")}`, {
+	execFileSync("git", args, {
 		cwd,
 		encoding: "utf8",
 	});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/git-history.ts` around lines 14 -
18, The current realRunner builds a shell command string and calls execSync,
which is fragile; change realRunner to call execFileSync directly with the git
executable and the args array (e.g. execFileSync('git', args, { cwd, encoding:
'utf8' })), remove the manual quoting/escaping logic, and ensure execFileSync is
imported from child_process; keep the GitRunner signature (args, cwd) and
preserve options like cwd and encoding.

scripts/run-interview-grade.ts-7-7 (1)

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove override: true to prevent CI/shell-provided environment variables from being overwritten.

The override: true option causes dotenv to replace any existing process.env values with those from the .env file. This will clobber runtime-provided secrets and configuration (e.g., from CI systems), breaking expected behavior in non-local environments. Use the default behavior instead.
Proposed fix
-loadDotenv({ override: true });
+loadDotenv();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run-interview-grade.ts` at line 7, Call to loadDotenv({ override:
true }) in scripts/run-interview-grade.ts is forcing .env values to overwrite
existing environment variables; remove the override option so dotenv uses
default behavior and does not clobber CI/shell-provided vars. Locate the
loadDotenv invocation in that file (the loadDotenv symbol) and change it to
loadDotenv() or remove the override property so existing process.env entries
remain intact.

src/services/interview/assess/collectors/jsonl-log.ts-28-30 (1)

28-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle missing interview.log as empty evidence instead of throwing.

A missing log file currently throws before parsing starts, which can fail an entire grade run for absent optional evidence.

💡 Proposed fix

 export function parseInterviewLog(path: string): InterviewLogParseResult {
-	const body = readFileSync(path, "utf8");
+	let body = "";
+	try {
+		body = readFileSync(path, "utf8");
+	} catch (err) {
+		if ((err as NodeJS.ErrnoException).code === "ENOENT") {
+			return { prompts: [], toolUses: [] };
+		}
+		throw err;
+	}
 	const lines = body.split("\n").filter((l) => l.trim().length > 0);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/jsonl-log.ts` around lines 28 - 30,
The parseInterviewLog function currently calls readFileSync and throws if the
file is missing; update parseInterviewLog to treat a missing interview.log as
empty evidence instead of failing: check for file existence (e.g., using
fs.existsSync) or wrap readFileSync in a try/catch, and if the file is absent or
unreadable return an empty InterviewLogParseResult (empty lines/entries) rather
than letting the exception propagate; keep the rest of parsing logic intact so
parseInterviewLog and its callers receive an empty result when interview.log is
not present.

scripts/run-interview-bootstrap.ts-81-93 (1)

81-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate timeBox and mode flag values before constructing RoleConfig.

The current parsing allows invalid values (e.g., --time-box abc, unknown --mode-*) to pass as typed config and fail later in less clear ways.

💡 Proposed fix

 function buildConfig(flags: ParsedFlags): RoleConfig | string {
@@
-	const timeBox = Number.parseInt(flags.timeBox ?? "90", 10);
+	const timeBox = Number.parseInt(flags.timeBox ?? "90", 10);
+	if (!Number.isFinite(timeBox) || timeBox < 30 || timeBox > 240) {
+		return "Invalid --time-box; expected an integer between 30 and 240";
+	}
+
+	const projectModes = new Set<ProjectMode>(["A", "B"]);
+	const analysisModes = new Set<AnalysisMode>(["ai-assisted", "human-only"]);
+	const rubricModes = new Set<RubricMode>(["default", "custom", "default+jd"]);
+
+	if (!projectModes.has(flags.modeProject as ProjectMode)) {
+		return "Invalid --mode-project; expected A or B";
+	}
+	if (!analysisModes.has(flags.modeAnalysis as AnalysisMode)) {
+		return "Invalid --mode-analysis; expected ai-assisted or human-only";
+	}
+	if (!rubricModes.has(flags.modeRubric as RubricMode)) {
+		return "Invalid --mode-rubric; expected default, custom, or default+jd";
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run-interview-bootstrap.ts` around lines 81 - 93, Before constructing
RoleConfig, validate flags.timeBox and each mode flag: ensure timeBox
(flags.timeBox) parses to a positive integer and reject non-numeric or <=0
values (set default only after validation), and verify flags.modeProject,
flags.modeAnalysis, and flags.modeRubric are valid members of the corresponding
enums/types (ProjectMode, AnalysisMode, RubricMode); if any validation fails,
log a clear error and exit/throw rather than letting invalid values flow into
the RoleConfig. Perform these checks just above where timeBox is parsed and the
config object is created (referencing the timeBox variable and the RoleConfig
construction), and normalize/cast only after validation so RoleConfig gets only
safe, well-typed values.

teamhero-interview-kit/start.sh-57-59 (1)

57-59: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Write .interview-state/started-at only after recorder preflight passes.

Right now, a missing asciinema still leaves a start marker behind. That creates a false “session started” state even though recording never began.

Suggested reorder

-# Indicate we passed the gate so end.sh can verify the kit was started.
-mkdir -p "$REPO_ROOT/.interview-state"
-date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
-
 if [ "${SKIP_RECORD:-0}" = "1" ]; then
+    mkdir -p "$REPO_ROOT/.interview-state"
+    date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
     echo "✓ Privacy gate passed. SKIP_RECORD=1 — not starting asciinema."
     exit 0
 fi
@@
 if ! command -v "$ASCIINEMA_BIN" >/dev/null 2>&1; then
@@
     exit 1
 fi
 
+# Indicate we passed all prerequisites and are starting recording.
+mkdir -p "$REPO_ROOT/.interview-state"
+date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
+
 CAST_PATH="$REPO_ROOT/terminal.cast"

Also applies to: 66-79

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/start.sh` around lines 57 - 59, The start marker file
".interview-state/started-at" is written before the recorder preflight
completes, causing false "session started" states if asciinema or recorder setup
fails; move the mkdir -p "$REPO_ROOT/.interview-state" and the date -u ... >
"$REPO_ROOT/.interview-state/started-at" lines out of the current top-level
start flow and into the success branch that runs after the recorder
preflight/asciinema checks complete (i.e., only execute these lines when the
script confirms recorder is present and recording has been started), and apply
the same reorder for the duplicate block around lines 66-79 so the marker is
only created on successful recorder initialization.

src/services/interview/assess/collectors/asciinema.ts-68-85 (1)

68-85: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parse input chunks character-by-character to avoid missed command submits.

evt.data may contain multi-character chunks (e.g., pasted text or cmd\r in one event). The current branch logic treats the whole chunk as one token, which can fail to emit commands correctly.

Suggested fix

-		const data = evt.data;
-		if (data === "\r" || data === "\n") {
-			if (buffer.trim().length > 0) {
-				commands.push({
-					type: "command",
-					timestamp: isoFromUnix(baseEpoch + evt.delta),
-					source: "terminal.cast",
-					command: buffer,
-					pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta),
-				});
-			}
-			buffer = "";
-		} else if (data === "�" || data === "\b") {
-			// Backspace
-			buffer = buffer.slice(0, -1);
-			lastKeyDelta = evt.delta;
-		} else if (data.charCodeAt(0) >= 32 || data === "\t") {
-			buffer += data;
-			lastKeyDelta = evt.delta;
-		} else {
-			// Control codes: arrows, escape sequences, etc. Skip silently.
-		}
+		for (const ch of evt.data) {
+			if (ch === "\r" || ch === "\n") {
+				if (buffer.trim().length > 0) {
+					commands.push({
+						type: "command",
+						timestamp: isoFromUnix(baseEpoch + evt.delta),
+						source: "terminal.cast",
+						command: buffer,
+						pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta),
+					});
+				}
+				buffer = "";
+				continue;
+			}
+			if (ch === "\u007f" || ch === "\b") {
+				buffer = buffer.slice(0, -1);
+				lastKeyDelta = evt.delta;
+				continue;
+			}
+			if (ch === "\t" || ch.charCodeAt(0) >= 32) {
+				buffer += ch;
+				lastKeyDelta = evt.delta;
+			}
+		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/asciinema.ts` around lines 68 - 85,
The event handler currently treats evt.data as a single token, which misses
command boundaries when evt.data contains multiple characters (pastes or
"cmd\r"); update the processing block that uses data, buffer, commands,
isoFromUnix, lastKeyDelta and evt.delta to iterate over each character in
evt.data (e.g., for (const ch of data) { ... }) and run the existing branches
per-character (newline -> push command with timestamp from isoFromUnix(baseEpoch
+ evt.delta), backspace -> slice buffer, printable/tab -> append to buffer and
set lastKeyDelta = evt.delta), ensuring buffer resets after a submitted command
so multi-char chunks are parsed correctly.

src/services/interview/bootstrap/orchestrator.ts-36-49 (1)

36-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap generation/persist steps to preserve the RunBootstrapResult contract.

generateProject(...) and writeRoleConfig(...) can throw; right now that escapes as an unhandled rejection instead of returning a structured failure. This makes bootstrap failure handling brittle at the CLI boundary.

Proposed fix

-	const generation = await generateProject(config, options.client, {
-		kitTemplateDir: options.kitTemplateDir,
-		maxAttempts: options.maxAttempts,
-	});
+	let generation;
+	try {
+		generation = await generateProject(config, options.client, {
+			kitTemplateDir: options.kitTemplateDir,
+			maxAttempts: options.maxAttempts,
+		});
+	} catch (error) {
+		return {
+			ok: false,
+			attempts: 0,
+			failures: [
+				error instanceof Error ? error.message : String(error),
+			],
+		};
+	}
 	if (!generation.ok) {
 		return {
 			ok: false,
 			attempts: generation.attempts,
 			failures: generation.failures,
 		};
 	}
-
-	writeRoleConfig(config.outputDir, config);
+	try {
+		writeRoleConfig(config.outputDir, config);
+	} catch (error) {
+		return {
+			ok: false,
+			attempts: generation.attempts,
+			failures: [
+				error instanceof Error ? error.message : String(error),
+			],
+		};
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/orchestrator.ts` around lines 36 - 49, The
code calls generateProject(...) and writeRoleConfig(...) which can throw, so
wrap the generation and persistence steps in a try/catch around the calls to
generateProject and writeRoleConfig inside the function that returns
RunBootstrapResult; on any thrown error return a RunBootstrapResult-shaped
failure (ok: false) including generation.attempts/failures if available or
default values and include the caught error message/context in the failures
array or an error field; ensure the function still returns the existing
successful shape when generation.ok is true and only writes role config inside
the try block to avoid unhandled rejections (refer to generateProject,
writeRoleConfig, and the RunBootstrapResult contract).

tui/interview_bootstrap_wizard.go-249-251 (1)

249-251: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate wizard launch on interactive TTY before entering interactive flow.

The dispatcher comment says wizard mode is for TTY callers, but the current condition only checks len(args) == 0. In non-interactive contexts this can trigger unexpected wizard execution instead of deterministic headless behavior.

Proposed fix

 	// No flags at all → interactive wizard.
 	if len(args) == 0 {
+		if fi, err := os.Stdin.Stat(); err != nil || (fi.Mode()&os.ModeCharDevice) == 0 {
+			fmt.Fprintln(stderr, "no interactive TTY detected; pass headless flags")
+			return 1
+		}
 		res, err := launcher.Launch()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_wizard.go` around lines 249 - 251, The wizard launch
is currently gated only by len(args) == 0 which can run the interactive
launcher.Launch() in non-interactive contexts; change the condition to require
an interactive TTY as well (e.g., check that stdin is a terminal via
os.Stdin/terminal/term IsTerminal equivalents) so that the code only calls
launcher.Launch() when len(args) == 0 AND stdin is a TTY; otherwise proceed with
the deterministic headless path. Ensure you update the condition around
launcher.Launch() and keep the existing dispatcher comment consistent.

src/services/interview/cohort/cohort-summary.ts-39-43 (1)

39-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate audit.json frontmatter shape before adding records

On Line 39, JSON.parse is cast directly to { frontmatter: AuditFrontmatter }. A parseable but malformed file (e.g., missing candidate or signed_off) will be pushed and can fail later in renderRow (Line 53+).

Suggested fix

-			const parsed = JSON.parse(body) as { frontmatter: AuditFrontmatter };
-			records.push({
-				frontmatter: parsed.frontmatter,
-				summaryPath: join(entry, "summary.md"),
-			});
+			const parsed = JSON.parse(body) as {
+				frontmatter?: Partial<AuditFrontmatter>;
+			};
+			const fm = parsed.frontmatter;
+			if (
+				!fm ||
+				typeof fm.candidate !== "string" ||
+				typeof fm.date !== "string" ||
+				typeof fm.signed_off !== "boolean"
+			) {
+				continue;
+			}
+			records.push({
+				frontmatter: fm as AuditFrontmatter,
+				summaryPath: join(entry, "summary.md"),
+			});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/cohort/cohort-summary.ts` around lines 39 - 43, The
code currently casts JSON.parse(body) to { frontmatter: AuditFrontmatter } and
unconditionally pushes it into records; instead validate the parsed object has a
frontmatter and required AuditFrontmatter fields (e.g., candidate, signed_off,
etc.) before pushing to records to avoid later failures in renderRow; locate the
parsing block around parsed = JSON.parse(body) and replace the blind cast with
runtime checks that parsed?.frontmatter exists and that the required keys are
present (or skip/log/throw for malformed audit.json), then push only when valid
and keep summaryPath as before.

src/services/interview/assess/audit-writer.ts-53-73 (1)

53-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Quote/escape YAML scalar frontmatter values

On Lines 55–73, frontmatter fields are emitted as raw strings. Values containing :, #, newlines, or quotes can corrupt parsing or inject unintended keys.

Suggested fix

 function yaml(value: AuditFrontmatter): string {
+	const q = (s: string): string => JSON.stringify(s);
 	const lines: string[] = ["---"];
-	lines.push(`tags: [${value.tags.join(", ")}]`);
-	lines.push(`candidate: ${value.candidate}`);
-	lines.push(`role: ${value.role}`);
-	lines.push(`date: ${value.date}`);
-	lines.push(`rubric_version: ${value.rubric_version}`);
-	lines.push(`rubric_mode: ${value.rubric_mode}`);
+	lines.push(`tags: [${value.tags.map(q).join(", ")}]`);
+	lines.push(`candidate: ${q(value.candidate)}`);
+	lines.push(`role: ${q(value.role)}`);
+	lines.push(`date: ${q(value.date)}`);
+	lines.push(`rubric_version: ${q(value.rubric_version)}`);
+	lines.push(`rubric_mode: ${q(value.rubric_mode)}`);
 	lines.push(`signed_off: ${value.signed_off}`);
 	if (value.recommendation !== undefined) {
-		lines.push(`recommendation: ${value.recommendation}`);
+		lines.push(`recommendation: ${q(value.recommendation)}`);
 	}
 	if (value.session_recording_url !== undefined) {
-		lines.push(`session_recording_url: ${value.session_recording_url}`);
+		lines.push(`session_recording_url: ${q(value.session_recording_url)}`);
 	}
 	if (value.session_platform !== undefined) {
-		lines.push(`session_platform: ${value.session_platform}`);
+		lines.push(`session_platform: ${q(value.session_platform)}`);
 	}
 	if (value.session_date !== undefined) {
-		lines.push(`session_date: ${value.session_date}`);
+		lines.push(`session_date: ${q(value.session_date)}`);
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/audit-writer.ts` around lines 53 - 73, The
yaml(value: AuditFrontmatter) function emits frontmatter scalars unescaped which
can break YAML when values contain characters like :, #, quotes, or newlines;
update it to sanitize/quote every scalar field (tags can remain array syntax but
each tag must be quoted) by adding a small helper (e.g. escapeYamlScalar) and
use it for candidate, role, date, rubric_version, rubric_mode, signed_off,
recommendation, session_recording_url, session_platform, and session_date; the
helper should wrap strings in double quotes and escape internal double quotes
and newlines (or alternatively call a stable YAML serializer) so the generated
frontmatter is always safe to parse.

src/services/interview/assess/grade-orchestrator.ts-194-209 (1)

194-209: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

candidate_id can collide and overwrite same-day runs

On Line 194, candidate_id is date-granular (<slug>-YYYY-MM-DD). Re-grading the same candidate on the same day will target the same default output directory and overwrite prior artifacts.

Suggested fix

+function nowIdSuffix(): string {
+	return new Date().toISOString().replace(/[:.]/g, "-");
+}
...
-			candidate_id: `${slug(input.candidateName)}-${todayIso()}`,
+			candidate_id: `${slug(input.candidateName)}-${nowIdSuffix()}`,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/grade-orchestrator.ts` around lines 194 - 209,
The current candidate_id uses date granularity
(slug(input.candidateName)-todayIso()) which causes same-day runs to collide;
update the candidate_id generation in the result object to append a
higher-resolution unique suffix (e.g., timestamp or a UUID/nanoid) so it becomes
slug(input.candidateName)-todayIso()-<unique>, and keep the outputDir fallback
logic (the outputDir variable that uses input.outputDir ?? join(...,
result.candidate_id)) unchanged so that default directories are distinct per
run; locate the candidate_id assignment and modify it to include the unique
suffix (or call a helper like Date.now() or uuid()) so re-grades on the same day
do not overwrite prior artifacts.

🟡 Minor comments (15)

src/cli/index.ts-192-206 (1)

192-206: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Block interview in the report reserved-verb guard too.

After adding the top-level interview verb here, the misuse guard in report (Line 157) still only blocks doctor/setup. teamhero report interview now slips through as report args instead of producing the helpful redirect error.

Suggested fix

-			const subcommands = ["doctor", "setup"];
+			const subcommands = ["doctor", "setup", "interview"];

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli/index.ts` around lines 192 - 206, The new top-level command declared
via program.command("interview") is not included in the reserved-verb check
inside the report misuse guard, so "teamhero report interview" bypasses the
redirect; update the report command's misuse guard to include "interview"
alongside "doctor" and "setup" (i.e., add "interview" to the reserved verbs
list/check used by the report handler) so calls matching "interview" are
redirected the same way; locate the guard logic in the report command and adjust
its reservation check to include the "interview" verb (the new interview command
is created with program.command("interview") and launches spawnTui).

tests/unit/services/interview/bootstrap/project-validator.spec.ts-59-61 (1)

59-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove test-file line adjustment from fixture LOC calculation

The fixture includes 5 test-file lines in the currentLoc estimate, but the validator only counts source files (via the !TEST_NAME_PATTERN filter). This causes the padding calculation to undershoot by 5 lines.

-	const currentLoc = o.deepModuleCount * 100 + (o.withFailingTests ? 5 : 0);
+	const currentLoc = o.deepModuleCount * 100;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/bootstrap/project-validator.spec.ts` around
lines 59 - 61, The test fixture currently adds 5 lines for test files to the LOC
estimate (in the currentLoc calculation using o.deepModuleCount * 100 +
(o.withFailingTests ? 5 : 0)), but the validator filters out test files via
TEST_NAME_PATTERN and only counts source files, causing an undershoot; remove
the test-file adjustment so currentLoc is computed from o.deepModuleCount only
(e.g., drop the +(o.withFailingTests ? 5 : 0)) so padding/remaining calculations
align with the validator's source-file counting.

tui/interview_progress.go-87-92 (1)

87-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Set phase to running on update events to ensure correct visual state

When an update event arrives before a start event, the phase incorrectly displays as pending (○ glyph) despite progress being reported. Update the "update" case to transition from pending to running.

Proposed fix

 	case "update":
+		if m.status[evt.Step] == interviewPhasePending {
+			m.status[evt.Step] = interviewPhaseRunning
+		}
 		// keep running, just record message for tooltip-style display
 		if evt.Message != "" {
 			m.messages[evt.Step] = evt.Message
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_progress.go` around lines 87 - 92, The "update" event handler
currently only records tooltip messages (m.messages[evt.Step]) but doesn't
change the visual phase; modify the "update" case so that when an update arrives
it sets m.phase to PhaseRunning (or at least transitions from PhasePending to
PhaseRunning) so progress is shown as running; keep the existing message
recording logic for evt.Step and evt.Message and ensure you reference the
"update" case, m.phase, and the PhasePending/PhaseRunning constants when making
the change.

src/services/interview/assess/extractors/throughput.ts-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test-run detection is narrower than the rest of the assessment pipeline.

TEST_RUN currently misses common runners (e.g., jest, vitest, cargo test, just test), so “Time to first test run” can be silently omitted for valid sessions.

Proposed fix

-const TEST_RUN = /^\s*(bun|npm|yarn|pnpm)\s+(run\s+)?test\b|^\s*go\s+test\b|^\s*pytest\b/;
+const TEST_RUN =
+	/^\s*((bun|npm|yarn|pnpm)\s+(run\s+)?test|go\s+test|pytest|jest|vitest|cargo\s+test|just\s+test)\b/;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/throughput.ts` at line 13,
TEST_RUN's regex is too narrow and misses many common test runners; update the
TEST_RUN constant to include additional runners such as jest, vitest, mocha,
ava, tap, cypress, playwright, cargo (cargo test), and just (just test) while
preserving existing matches for package-manager-prefixed invocations
(bun|npm|yarn|pnpm with optional "run"). Modify the regex used in
src/services/interview/assess/extractors/throughput.ts (constant TEST_RUN) to
match both direct runner commands and package-manager-prefixed forms, and
consider making it case-insensitive or allowing optional whitespace so commands
like "npm run test", "yarn jest", "cargo test", and "just test" are all
detected.

teamhero-interview-kit/INTERVIEW_RULES.md-41-43 (1)

41-43: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences trigger markdownlint MD040; adding bash keeps docs lint-clean and consistent.

♻️ Proposed fix

-   ```
+   ```bash
    git push origin HEAD
    ```
...
-```
+```bash
 brew install asciinema

...
- +bash
sudo apt install asciinema # or your distro's package manager

...
-   ```
+   ```bash
   sudo apt update
   sudo apt install asciinema git
   ```

Also applies to: 50-52, 56-58, 68-71

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/INTERVIEW_RULES.md` around lines 41 - 43, Update the
Markdown fenced code blocks in INTERVIEW_RULES.md to include the bash language
identifier: change ``` to ```bash for the snippets containing "git push origin
HEAD", "brew install asciinema", "sudo apt install asciinema", and the block
with "sudo apt update / sudo apt install asciinema git" (also apply to the other
occurrences you noted at lines 50-52, 56-58, 68-71) so the file passes
markdownlint MD040 and renders correctly.

skills/teamhero-interview/SKILL.md-50-52 (1)

50-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify language for CLI code fences.

Adding bash to these fenced blocks resolves markdownlint MD040 warnings.
♻️ Proposed fix
-```
+```bash
 teamhero interview bootstrap
...
- +bash
teamhero interview bootstrap --headless
--role --stack --domain --feature ""
--time-box
--mode-project A|B
--mode-analysis ai-assisted|human-only
--mode-rubric default|custom|default+jd
--output-dir
[--jd-path ] [--custom-prompt ""] [--role-title "<title>"]
...
-```
+```bash
teamhero interview grade --candidate "Jane Doe" --repo <url> \
  [--transcript <file>] [--interviewer-notes <file>] \
  [--session-recording-url <url>] [--session-platform zoom|teams|meet|other] \
  [--session-date YYYY-MM-DD] [--output-dir <path>]
...
- +bash
teamhero interview cohort --role [--order alphabetical|chronological]
Also applies to: 63-72, 81-86, 99-101
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/teamhero-interview/SKILL.md` around lines 50 - 52, Add the language
tag "bash" to each CLI code fence in SKILL.md so the fenced blocks containing
commands like "teamhero interview bootstrap", "teamhero interview bootstrap
--headless ...", "teamhero interview grade --candidate ...", and "teamhero
interview cohort --role ..." start with ```bash instead of ``` to satisfy
markdownlint MD040; update every similar CLI fence (including the other
occurrences mentioned) consistently.

tui/interview_bootstrap_form.go-261-267 (1)

261-267: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat whitespace-only input as empty in nonEmpty validation.

Inputs like " " currently pass validation and can propagate invalid wizard values.

💡 Proposed fix

 import (
 	"errors"
 	"fmt"
+	"strings"
@@
 func nonEmpty(field string) func(string) error {
 	return func(s string) error {
-		if s == "" {
+		if strings.TrimSpace(s) == "" {
 			return fmt.Errorf("%s is required", field)
 		}
 		return nil
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_form.go` around lines 261 - 267, The nonEmpty
validator currently treats whitespace-only strings as valid; update the nonEmpty
function (nonEmpty(field string) func(string) error) to trim whitespace (e.g.,
strings.TrimSpace) from the input before checking emptiness, returning
fmt.Errorf("%s is required", field) when the trimmed value is empty; add the
strings import if missing so the validator rejects inputs like "   ".

docs/2026-05-09-candidate-interview-grader-plan.md-273-273 (1)

273-273: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These unlabeled fences trigger MD040 and reduce syntax highlighting/readability in rendered docs.
Suggested fix pattern
-```
+```text
 ...
-```
+```
Use bash, markdown, typescript, or text as appropriate per block.
Also applies to: 296-296, 528-528, 541-541, 672-672, 681-681, 694-694
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/2026-05-09-candidate-interview-grader-plan.md` at line 273, Several
fenced code blocks in docs/2026-05-09-candidate-interview-grader-plan.md are
unlabeled and trigger MD040; edit each fenced block (lines referenced in the
review: 273, 296, 528, 541, 672, 681, 694) and add an appropriate language
identifier after the opening backticks (e.g., ```bash, ```markdown,
```typescript, or ```text) to enable syntax highlighting—scan the surrounding
content to choose the correct language for each block and replace each opening
"```" with "```<language>" while leaving the closing "```" unchanged.

docs/2026-05-09-candidate-interview-grader-plan.md-533-535 (1)

533-535: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The list-roles/list-candidates status is internally inconsistent.

Line 533-535 labels these verbs as v1.5, while Line 770+ says they are still deferred. Please align one source of truth so readers don’t misinterpret shipped scope.

Also applies to: 770-775

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/2026-05-09-candidate-interview-grader-plan.md` around lines 533 - 535,
The document shows inconsistent status for the commands teamhero interview
list-roles and teamhero interview list-candidates (they're marked v1.5 in one
place but listed as deferred elsewhere); pick the correct status and make both
references consistent: update the v1.5 label if they are still deferred or
move/remove them from the deferred section if they are shipped, editing the
occurrences that mention teamhero interview list-roles and teamhero interview
list-candidates so the "shipped" label and the deferred list (the later block
that currently lists them) match exactly.

scripts/manual-test-interview.sh-62-63 (1)

62-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently swallow step failures in a smoke script.

Using || true at these execution points can hide real regressions and make the run look successful when core flows failed. Keep the script interactive, but surface non-zero exits explicitly.

Suggested adjustment

+run_step() {
+  if ! "$@"; then
+    local code=$?
+    echo "⚠ Command failed with exit code $code: $*" >&2
+    echo "   Continue only if this failure was expected." >&2
+  fi
+}
+
-"$BIN" interview bootstrap || true
+run_step "$BIN" interview bootstrap
 ...
-"$BIN" interview bootstrap \
+run_step "$BIN" interview bootstrap \
   --headless --no-confirm \
   --role manual-test-role --stack TypeScript --domain Manual \
   --feature "Verify the headless path still works" \
   --mode-project A --mode-analysis human-only --mode-rubric default \
-  --output-dir "$TMPDIR/manual-test-role" || true
+  --output-dir "$TMPDIR/manual-test-role"
 ...
-"$BIN" interview grade \
+run_step "$BIN" interview grade \
   --candidate "Manual Test Candidate" \
   --repo "https://example.com/fake-repo" \
-  --output-dir "$TMPDIR/grade-output" || true
+  --output-dir "$TMPDIR/grade-output"
 ...
-"$BIN" interview cohort --role manual-test-role || true
+run_step "$BIN" interview cohort --role manual-test-role

Also applies to: 68-73, 83-87, 99-99

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/manual-test-interview.sh` around lines 62 - 63, The script is
swallowing errors by appending "|| true" to key steps (e.g. the "$BIN interview
bootstrap" invocation and the other occurrences at the indicated ranges); remove
those "|| true" suffixes and instead check the command exit status: after
invoking "$BIN interview bootstrap" (and the other similar commands), capture
the exit code and if non-zero print a clear error message and either exit the
script or prompt the user to continue (interactive mode) so failures are
surfaced explicitly; look for occurrences of "$BIN" invocations with "|| true"
and replace them with explicit exit-code handling or an interactive confirmation
flow.

tui/interview_bootstrap_wizard.go-262-267 (1)

262-267: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against nil Options before validation/run.

If a launcher returns Confirmed=true with Options=nil, this path can panic in validation or runner invocation. Add a defensive check and fail cleanly.

Proposed fix

 		if !res.Confirmed {
 			return 0
 		}
 		opts := res.Options
+		if opts == nil {
+			fmt.Fprintln(stderr, "wizard returned no options")
+			return 1
+		}
 		if msg := ValidateBootstrapOptions(opts); msg != "" {
 			fmt.Fprintln(stderr, msg)
 			return 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_wizard.go` around lines 262 - 267, The code assumes
res.Options is non-nil before calling ValidateBootstrapOptions and runner.Run;
add a nil check for opts (res.Options) right after assigning opts := res.Options
and before calling ValidateBootstrapOptions: if opts == nil {
fmt.Fprintln(stderr, "no options provided"); return 1 }, so the function fails
cleanly instead of panicking; keep using stderr/stdout and still call
ValidateBootstrapOptions and runner.Run only when opts is non-nil.

tui/interview_grade.go-32-39 (1)

32-39: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle missing flag values before assignment in ParseGradeFlags.

When a flag is followed by another flag token, the parser currently treats that token as a value and returns a misleading later error. Add a guard so the parser returns the correct “requires a value” error immediately.

Proposed fix

 			if i+1 >= len(args) {
 				return nil, fmt.Sprintf("flag %s requires a value", a)
 			}
 			val := args[i+1]
+			if strings.HasPrefix(val, "--") {
+				return nil, fmt.Sprintf("flag %s requires a value", a)
+			}
 			switch a {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_grade.go` around lines 32 - 39, In ParseGradeFlags: the current
logic only checks bounds for args[i+1] but can treat a subsequent flag token as
a value; update the guard inside the case handling ("--repo", "--candidate",
"--transcript", "--interviewer-notes", "--session-recording-url",
"--session-platform", "--session-date", "--output-dir", "--local-repo-path") to
not only ensure i+1 < len(args) but also verify args[i+1] is not a flag (e.g.,
starts with "-"); if it is a flag, return the same "flag %s requires a value"
error immediately instead of assigning val and letting later code fail. Locate
this logic in ParseGradeFlags and add the extra check before setting val :=
args[i+1].

tui/interview_cohort.go-24-37 (1)

24-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject flag-like tokens as missing values in ParseCohortFlags.

ParseCohortFlags currently accepts another flag token as a value (for example, --role --order ...) and then fails later with a misleading error. Validate the consumed value before assigning it.

Proposed fix

 		case "--role", "--role-dir", "--order":
 			if i+1 >= len(args) {
 				return nil, fmt.Sprintf("flag %s requires a value", a)
 			}
 			val := args[i+1]
+			if strings.HasPrefix(val, "--") {
+				return nil, fmt.Sprintf("flag %s requires a value", a)
+			}
 			switch a {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_cohort.go` around lines 24 - 37, In ParseCohortFlags, the code
currently assigns args[i+1] to opts.Role/RoleDir/Order even if that token is
another flag (e.g. "--role --order"), causing confusing errors later; fix it by
validating val before assignment: after val := args[i+1], check if
strings.HasPrefix(val, "-") (or specifically "--") and if so return the same
missing-value error (e.g. fmt.Sprintf("flag %s requires a value", a)); otherwise
proceed to set opts.Role, opts.RoleDir or opts.Order and increment i. Ensure to
import strings if needed.

src/services/interview/cohort/cohort-summary.ts-42-42 (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use POSIX path joining for markdown links

On Line 42, join(entry, "summary.md") may produce \ on Windows, which can render as broken/awkward links in markdown.

Suggested fix

-import { join } from "node:path";
+import { join, posix } from "node:path";
...
-				summaryPath: join(entry, "summary.md"),
+				summaryPath: posix.join(entry, "summary.md"),

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/cohort/cohort-summary.ts` at line 42, The summaryPath
construction uses path.join(entry, "summary.md") which will produce backslashes
on Windows and break markdown links; change it to use POSIX joining instead
(e.g., use path.posix.join or equivalent) when building summaryPath in
cohort-summary.ts so the resulting link always contains forward slashes; update
imports/use of the path module accordingly and ensure the variable name
summaryPath still receives the new POSIX-joined value (reference symbols:
summaryPath, join, entry).

tui/interview_bootstrap.go-127-129 (1)

127-129: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate that --jd-path actually exists in headless mode

Lines 127–129 only require a non-empty value. A typo/path miss will pass validation and later run without JD context.

Suggested fix

 	if opts.ModeRubric == "default+jd" && strings.TrimSpace(opts.JDPath) == "" {
 		return "--mode-rubric 'default+jd' requires --jd-path"
 	}
+	if opts.ModeRubric == "default+jd" {
+		info, err := os.Stat(opts.JDPath)
+		if err != nil || info.IsDir() {
+			return "--jd-path must point to an existing file"
+		}
+	}
 	return ""

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap.go` around lines 127 - 129, The validation in the
condition checking opts.ModeRubric and opts.JDPath only ensures that --jd-path
is not empty, but it should also verify that the specified path actually exists
to prevent runtime errors in headless mode. Update the code to check if the file
at opts.JDPath exists using appropriate file existence checking methods (e.g.,
os.Stat in Go) and return an error if the file does not exist alongside the
current validation in the logic handling the mode-checking.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c9c03249-2c4d-43c6-ac4c-46fe9bc90861

📥 Commits

Reviewing files that changed from the base of the PR and between aba1227 and 106d3ac.

📒 Files selected for processing (74)

README.md
docs/2026-05-09-candidate-interview-grader-plan.md
docs/interview-classification-rationale.md
docs/interview-rubric.md
scripts/manual-test-interview.sh
scripts/run-interview-bootstrap.ts
scripts/run-interview-cohort.ts
scripts/run-interview-grade.ts
scripts/run-tests.sh
skills/teamhero-interview/SKILL.md
src/cli/index.ts
src/services/interview/assess/ai-observer.ts
src/services/interview/assess/audit-writer.ts
src/services/interview/assess/collectors/asciinema.ts
src/services/interview/assess/collectors/git-history.ts
src/services/interview/assess/collectors/jsonl-log.ts
src/services/interview/assess/collectors/transcript.ts
src/services/interview/assess/extractors/risk-awareness.ts
src/services/interview/assess/extractors/test-pass.ts
src/services/interview/assess/extractors/throughput.ts
src/services/interview/assess/extractors/verification.ts
src/services/interview/assess/grade-orchestrator.ts
src/services/interview/assess/types.ts
src/services/interview/bootstrap/openai-generator-client.ts
src/services/interview/bootstrap/orchestrator.ts
src/services/interview/bootstrap/project-generator.ts
src/services/interview/bootstrap/project-validator.ts
src/services/interview/bootstrap/role-config.ts
src/services/interview/cohort/cohort-summary.ts
src/services/interview/shared/events.ts
src/services/interview/shared/rubric.ts
teamhero-interview-kit/.claude/CLAUDE.md
teamhero-interview-kit/.claude/settings.json
teamhero-interview-kit/INTERVIEW_RULES.md
teamhero-interview-kit/PRIVACY_RELEASE.md
teamhero-interview-kit/RUBRIC_OVERVIEW.md
teamhero-interview-kit/end.sh
teamhero-interview-kit/lib/privacy-gate.sh
teamhero-interview-kit/start.sh
tests/unit/cli/interview.spec.ts
tests/unit/services/interview/assess/ai-observer.spec.ts
tests/unit/services/interview/assess/audit-writer.spec.ts
tests/unit/services/interview/assess/collectors.spec.ts
tests/unit/services/interview/assess/extractors.spec.ts
tests/unit/services/interview/assess/grade-orchestrator.spec.ts
tests/unit/services/interview/bootstrap/orchestrator.spec.ts
tests/unit/services/interview/bootstrap/project-generator.spec.ts
tests/unit/services/interview/bootstrap/project-validator.spec.ts
tests/unit/services/interview/bootstrap/role-config.spec.ts
tests/unit/services/interview/cohort/cohort-summary.spec.ts
tests/unit/services/interview/cohort/skill.spec.ts
tests/unit/services/interview/kit/kit-smoke.spec.ts
tests/unit/services/interview/kit/privacy-gate.spec.ts
tests/unit/services/interview/shared/events.spec.ts
tests/unit/services/interview/shared/rubric.spec.ts
tui/interview.go
tui/interview_bootstrap.go
tui/interview_bootstrap_form.go
tui/interview_bootstrap_test.go
tui/interview_bootstrap_wizard.go
tui/interview_bootstrap_wizard_test.go
tui/interview_cohort.go
tui/interview_cohort_test.go
tui/interview_grade.go
tui/interview_grade_test.go
tui/interview_preview.go
tui/interview_preview_test.go
tui/interview_progress.go
tui/interview_progress_test.go
tui/interview_styles.go
tui/interview_styles_test.go
tui/interview_test.go
tui/main.go
tui/manual_test_script_test.go

Enables auto-review for PRs targeting main, which covers PRs opened from claude/* branches. CodeRabbit's schema has no source-branch filter, so this is the closest we can get to "auto-review Claude PRs." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The 'grade' verb implied scoring against a rubric, which conflicted with the project's observations-not-scores ethics floor. Renamed the CLI verb, file/directory names, type names, docs, and tests to use 'review' instead. Includes: - CLI: teamhero interview grade → teamhero interview review - Go TUI: tui/interview_grade*.go → tui/interview_review*.go, types GradeOptions/GradeRunner → ReviewOptions/ReviewRunner - TS service: src/services/interview/assess/ → review/, orchestrator renamed; GradeResult/GradeInput/GradeOutcome → Review* - Subprocess: scripts/run-interview-grade.ts → run-interview-review.ts - Tests, SKILL.md, README, manual test script, planning doc Also adds a defensive guard to the TS CLI runner: unknown subcommands now print an error and exit 1 instead of silently falling through to commander's top-level help. That silent fallthrough is what made a stale dist look like a missing-command bug when 'teamhero interview' first shipped here. No backward-compat shims — pre-deployment, no external callers. Closes teamhero-scripts-td7, teamhero-scripts-zva. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The "skips all roadmap audit contexts when TEAMHERO_ROADMAP_AUDIT_ENABLED=0" test failed under non-isolated `bun test` because other specs (ai-service, individual-summarizer, report-service) install a `mock.module("src/lib/env.js")` stub whose `getEnv` ignores `process.env`. `mock.module()` registrations leak across files, so when this file ran after one of them, setting `process.env.TEAMHERO_ROADMAP_AUDIT_ENABLED = "0"` had no effect on the gate. Re-pin `getEnv` to a real implementation at the top of this spec so it works regardless of which file ran before it. CI (per-file isolated runner) was already green; this just fixes the fast `bun test` workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 18

♻️ Duplicate comments (1)

src/services/interview/review/review-orchestrator.ts (1)

45-47: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Use execFileSync/arg array to remove command-injection risk.

Line 45 builds a shell command with interpolated repoUrl/destDir; this allows shell-breaking payloads. Use argument-array execution instead.

Suggested fix

-import { execSync } from "node:child_process";
+import { execFileSync } from "node:child_process";
...
 const defaultCloner: Cloner = (repoUrl, destDir) => {
-	execSync(`git clone --depth=50 '${repoUrl}' '${destDir}'`, {
+	execFileSync("git", ["clone", "--depth=50", repoUrl, destDir], {
 		stdio: "inherit",
 	});
 };

#!/bin/bash
# Verify unsafe shell-interpolated exec usage in the reviewed file.
rg -nP 'execSync\s*\(\s*`git clone' src/services/interview/review/review-orchestrator.ts
rg -nP '\bexecSync\s*\(' src/services/interview/review/review-orchestrator.ts

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/review/review-orchestrator.ts` around lines 45 - 47,
The git clone call in review-orchestrator.ts uses execSync with a
shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}'
'${destDir}'`, ...)) which is vulnerable to command injection; replace it with a
non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing
["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit"
semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir
as separate args, and adjust any error handling around the exec in the
function/method that contains this execSync invocation.

🧹 Nitpick comments (1)

tests/unit/cli/unknown-subcommand.spec.ts (1)
75-100: ⚡ Quick win

Known-subcommand guard test is slightly environment-dependent.

Line 90 drives run(..., sub, "--help") into TUI passthrough, so this “guard-stage” unit test can depend on local binary/runtime state. Consider isolating guard logic from TUI execution in this test path to keep it fully deterministic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cli/unknown-subcommand.spec.ts` around lines 75 - 100, The test is
hitting TUI passthrough when calling run([... sub, "--help"]) making it
environment-dependent; modify the test to isolate the guard by stubbing out the
TUI execution in the deps returned by makeDeps (e.g., set deps.spawnTui to a
no-op or a function that throws a deterministic sentinel) so run() exercises
only the guard logic and never invokes the real TUI, keep the existing
errorSpy/exitSpy handling and assertions around run and unknown-subcommand
messages.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/2026-05-09-candidate-interview-reviewer-plan.md`:
- Line 273: The markdown contains unannotated fenced code blocks (``` ) that
trigger MD040; update each fenced block to include a language identifier (e.g.,
```bash, ```text, ```markdown) so tooling can correctly lint and highlight them.
Locate every triple-backtick fence in the document (all occurrences of ``` ) and
replace the opening fence with an appropriate language token consistent across
similar examples (use bash for shell commands, text for plain output, markdown
for nested markdown examples, etc.), and ensure the same convention is applied
to the other occurrences noted in the review.
- Line 757: Update the stale module path in the implementation progress table:
replace the incorrect `src/services/interview/{bootstrap,assess,cohort,shared}/`
reference with the correct module path that uses `review` (e.g.,
`src/services/interview/{bootstrap,assess,cohort,review}/`) so it matches the
rest of the document and the shipped flow; ensure the `teamhero interview` CLI
subcommand line and any adjacent references use the `review` segment
consistently.

In `@scripts/manual-test-interview.sh`:
- Around line 83-86: The review invocation using "$BIN" interview review
currently omits the analysis mode flag so it may call OpenAI; modify the command
calling "$BIN" interview review (the block with --candidate and --repo) to
include --mode-analysis human-only so the script enforces human-only analysis
and avoids any API usage/cost paths.

In `@scripts/run-interview-review.ts`:
- Around line 53-56: The CLI currently only checks that at least one of
flags.repo or flags.localRepoPath is provided, but allows both; change the check
to require exactly one (XOR). Replace the existing if-condition with a test that
fails when both are present or both absent (e.g. if ((!!flags.repo) ===
(!!flags.localRepoPath)) { ... }), and update the error message to state
"Provide exactly one of --repo <url> or --local-repo-path <dir>" before calling
process.exit(1); keep references to flags.repo and flags.localRepoPath so
reviewers can find the change.

In `@skills/teamhero-interview/SKILL.md`:
- Around line 78-86: The heading "teamhero interview review <repo-url>" is
inconsistent with the actual CLI invocation which uses flags; update the
SKILL.md heading to match the documented usage by changing it to something like
"teamhero interview review --repo <url>" (or show both forms: positional and
flag) and ensure the examples and paragraph describe the same form; edit the
heading text around the `teamhero interview review` section and any nearby
examples so they reference the `--repo <url>` flag (and other flags)
consistently with the code examples and the documented invocation.
- Around line 50-52: Add the missing fenced-code language identifiers by
changing the triple-backtick fences for the command snippets to specify "bash"
(e.g., the blocks containing "teamhero interview bootstrap", "teamhero interview
bootstrap --headless \", "teamhero interview review --candidate \"Jane Doe\"
--repo <url> \", "teamhero interview cohort --role <slug> [--order
alphabetical|chronological]" and the other similar command blocks at lines
referenced) so each code fence begins with ```bash to satisfy markdownlint
MD040; update every occurrence noted (around lines 50-52, 63-72, 81-86, 99-101)
consistently.

In `@src/cli/index.ts`:
- Around line 229-255: The nested-subcommand rejection logic was left using a
hardcoded allowlist (e.g., only "doctor"/"setup") so "teamhero report interview"
bypasses the guard; update the nested-subcommand check to reuse the shared
KNOWN_SUBCOMMANDS constant instead of the ad-hoc array: locate the guard in the
run function that inspects the first/second args and replace the hardcoded
allowed-subcommands with KNOWN_SUBCOMMANDS (so validations for
first/second-level subcommands remain consistent).

In `@src/services/interview/review/ai-observer.ts`:
- Around line 192-201: The current rejectIfScored() string-scans JSON which
false-positives when forbidden names appear inside string values; instead
parse/inspect the response structurally and recursively check object keys
against FORBIDDEN_FIELDS. Update rejectIfScored to: safely parse or coerce the
incoming response to an object (handle non-objects by returning), walk nested
objects and arrays, and if any object has a property name equal to one of
FORBIDDEN_FIELDS (use direct key comparison, not regex), throw the existing
Error message; also catch and rethrow/handle JSON parse errors consistently so
you don't silently ignore malformed input.

In `@src/services/interview/review/audit-writer.ts`:
- Around line 53-76: The yaml function is vulnerable because it interpolates
user-adjacent strings (AuditFrontmatter fields like candidate, role, tags,
recommendation, session_* and session URLs) directly into YAML, allowing invalid
frontmatter or injected keys; fix by switching from manual string interpolation
to safe YAML serialization (e.g., use a YAML library like js-yaml oryaml and
call safeDump/safeLoad for the AuditFrontmatter object) or, if you must
hand-roll it, ensure every scalar is properly quoted and escaped (escape
backslashes, quotes and newlines, emit tags as a YAML sequence not an unquoted
bracket list, and only emit optional keys when present). Update the yaml
function to construct a plain JS object from AuditFrontmatter and pass it to the
serializer (or replace each push with properly quoted/escaped values and tags
emitted as a sequence) so all fields are safely encoded.

In `@src/services/interview/review/collectors/asciinema.ts`:
- Around line 67-86: The code in the event-processing block for evt.data assumes
a single-character payload and misses cases where evt.data contains multiple
characters (e.g., "npm test\r"); update the logic in the handler around
variables buffer, lastKeyDelta, evt.delta, baseEpoch and commands so you iterate
over each character in evt.data (for example for (const ch of evt.data)) and
apply the same branches: treat '\r' and '\n' as Enter (trim buffer, push a
command with timestamp = isoFromUnix(baseEpoch + evt.delta) and
pauseSecondsBeforeEnter = Math.max(0, evt.delta - lastKeyDelta)), treat '�' and
'\b' as backspace (slice buffer and set lastKeyDelta), treat printable chars
(charCode >= 32 or '\t') as append to buffer and set lastKeyDelta; ensure buffer
is reset after pushing a command so subsequent characters are processed
correctly.

In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 14-18: The realRunner implementation currently builds a shell
command string and calls execSync (risking injection/quoting bugs); change it to
call execFileSync with the program "git" and pass the args array directly (i.e.
execFileSync("git", args, { cwd, encoding: "utf8" })) so you remove the manual
quoting/map and avoid shell parsing; update any type/return handling in the
GitRunner wrapper accordingly.

In `@src/services/interview/review/collectors/transcript.ts`:
- Around line 27-37: Validate the sessionStartIso at the start of toIsoFromHMS:
parse it (e.g., new Date(sessionStartIso)) and check
isFinite(date.getTime())/isNaN before using getTime(); if invalid, throw a clear
Error mentioning "invalid sessionStartIso" and include the raw value. Update the
same validation pattern where sessionStartIso is used later in this file (the
second occurrence around lines 51-52) so both code paths fail fast with the same
clear error message; reference function toIsoFromHMS and the other usage site to
locate and mirror the change.

In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 20-34: The spawnSync invocations in test-pass.ts that run the Go
and Bun tests (the calls to spawnSync("go", ["test", "./..."], { cwd: repoDir,
encoding: "utf8" }) and spawnSync("bun", ["test"], { cwd: repoDir, encoding:
"utf8" })) lack timeout and maxBuffer safeguards; update both calls to include
sensible timeout and maxBuffer options (e.g., timeout: 120000, maxBuffer: 10 *
1024 * 1024) so the subprocesses cannot hang or overflow stdout/stderr, keep the
rest of the options (cwd, encoding) and ensure the rest of the code that reads
r.stdout/r.stderr and computes passed/failed via countMatches remains unchanged.

In `@src/services/interview/review/extractors/verification.ts`:
- Around line 58-71: The interleaving count currently treats any adjacent
prompt→command pair in the merged array as "within 30 seconds" but doesn't
enforce the 30s boundary; update the loop that computes interleavedAfterPrompt
(which iterates merged, uses cur/prev and checks cur.type === "command" &&
prev.type === "prompt") to also compute the timestamp difference (e.g., parse
cur.timestamp and prev.timestamp to milliseconds via Date) and only increment
interleavedAfterPrompt when the difference is <= 30000 ms; keep the existing
sort by timestamp and ensure you handle timestamp parsing robustly (use
getTime() or valueOf()) so only prompt→test pairs within the 30-second window
are counted.

In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 83-85: Replace the ad-hoc date logic: remove todayIso() and any
manual string concatenation like `${sessionDate}T09:00:00Z`; instead import and
use the date-utils helpers (resolveStartISO, resolveEndISO, resolveEndEpochMs,
formatDateUTC) so timezone-safe boundaries and display formatting are
consistent; update the places referenced (the todayIso function and the
constructions around sessionDate at the other locations) to call
resolveStartISO/resolveEndISO for interval boundaries and formatDateUTC for any
display strings or ISO formatting.
- Around line 39-42: The Cloner and ReviewDependencies types are port interfaces
and must be moved into the shared core types file; remove their local
declarations (e.g., the Cloner type and the ReviewDependencies type block) and
declare them in src/core/types.ts instead, then import and use those types in
review-orchestrator.ts (and any other files that used the local definitions) so
all boundary port interfaces live in the central types module and usages
reference the imported Cloner and ReviewDependencies symbols.

In `@tests/unit/services/interview/cohort/cohort-summary.spec.ts`:
- Around line 96-109: The test currently asserts status words only; instead,
assert the actual glyph characters so the UI signal can't be removed silently:
update the test using the existing helpers (makeCandidate, loadCohort,
renderCohortSummary) to check for the exact glyphs "✅" for reviewed Alice and
"⏳" for pending Bob (e.g. expect(body).toMatch(/Alice.*✅/) and
expect(body).toMatch(/Bob.*⏳/)), while retaining the existing checks for
"Reviewed"/"Pending" and "Hire" to keep coverage of both glyph and text.

In `@tests/unit/services/interview/review/ai-observer.spec.ts`:
- Around line 123-132: The test currently only checks
OBSERVATION_RESPONSE_SCHEMA.additionalProperties at the top level; add a
recursive schema traversal (e.g., a helper named assertNoAdditionalProperties or
traverseSchema) that walks OBSERVATION_RESPONSE_SCHEMA and asserts
additionalProperties === false for every object schema node it encounters
(inspect properties, patternProperties, additionalProperties, items,
allOf/anyOf/oneOf, not, definitions/$defs, etc.), and also ensure the banned
field names ("score","weighted_total","raw_total","band","signal_count") are not
present anywhere by checking keys at each visited node; call this helper from
the "declares strict json_schema..." test to harden the assertion.

---

Duplicate comments:
In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 45-47: The git clone call in review-orchestrator.ts uses execSync
with a shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}'
'${destDir}'`, ...)) which is vulnerable to command injection; replace it with a
non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing
["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit"
semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir
as separate args, and adjust any error handling around the exec in the
function/method that contains this execSync invocation.

---

Nitpick comments:
In `@tests/unit/cli/unknown-subcommand.spec.ts`:
- Around line 75-100: The test is hitting TUI passthrough when calling run([...
sub, "--help"]) making it environment-dependent; modify the test to isolate the
guard by stubbing out the TUI execution in the deps returned by makeDeps (e.g.,
set deps.spawnTui to a no-op or a function that throws a deterministic sentinel)
so run() exercises only the guard logic and never invokes the real TUI, keep the
existing errorSpy/exitSpy handling and assertions around run and
unknown-subcommand messages.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f1afff1b-ff4d-4824-b702-affafb49c191

📥 Commits

Reviewing files that changed from the base of the PR and between 106d3ac and ffe507a.

📒 Files selected for processing (37)

.coderabbit.yaml
README.md
docs/2026-05-09-candidate-interview-reviewer-plan.md
scripts/manual-test-interview.sh
scripts/run-interview-review.ts
skills/teamhero-interview/SKILL.md
src/cli/index.ts
src/services/interview/bootstrap/openai-generator-client.ts
src/services/interview/review/ai-observer.ts
src/services/interview/review/audit-writer.ts
src/services/interview/review/collectors/asciinema.ts
src/services/interview/review/collectors/git-history.ts
src/services/interview/review/collectors/jsonl-log.ts
src/services/interview/review/collectors/transcript.ts
src/services/interview/review/extractors/risk-awareness.ts
src/services/interview/review/extractors/test-pass.ts
src/services/interview/review/extractors/throughput.ts
src/services/interview/review/extractors/verification.ts
src/services/interview/review/review-orchestrator.ts
src/services/interview/review/types.ts
tests/unit/cli/unknown-subcommand.spec.ts
tests/unit/services/discrepancy.spec.ts
tests/unit/services/interview/cohort/cohort-summary.spec.ts
tests/unit/services/interview/cohort/skill.spec.ts
tests/unit/services/interview/review/ai-observer.spec.ts
tests/unit/services/interview/review/audit-writer.spec.ts
tests/unit/services/interview/review/collectors.spec.ts
tests/unit/services/interview/review/extractors.spec.ts
tests/unit/services/interview/review/review-orchestrator.spec.ts
tui/interview.go
tui/interview_progress.go
tui/interview_review.go
tui/interview_review_test.go
tui/interview_styles.go
tui/interview_test.go
tui/main.go
tui/manual_test_script_test.go

✅ Files skipped from review due to trivial changes (3)

.coderabbit.yaml
README.md
tests/unit/services/interview/review/extractors.spec.ts

🚧 Files skipped from review as they are similar to previous changes (4)

tui/main.go
tests/unit/services/interview/cohort/skill.spec.ts
tui/interview_styles.go
tui/interview_progress.go

Security and data-loss: - review-orchestrator: replace shell-interpolated `git clone` with execFileSync so a candidate repo URL containing single quotes or shell metacharacters cannot inject commands. - git-history collector: same — switch execSync(`git ${args}`) to execFileSync("git", args). - project-generator.writeGenerated: refuse LLM-supplied file paths that are absolute or that resolve outside outputDir (path traversal via `../` or absolute paths in the generator response). - project-generator.clearOutputDir: refuse to recursively remove the filesystem root, the user's $HOME, or any root-level path. A misconfigured outputDir of "/" or "$HOME" otherwise wipes user data. - review-orchestrator: candidate_id now includes a UTC HHMMSS suffix so two reviews of the same candidate on the same day land in distinct directories instead of silently overwriting each other. Trust-boundary validation: - role-config.readRoleConfig: parse failures and shape mismatches now throw a descriptive error instead of letting malformed JSON propagate as untyped data into the orchestrator. - cohort-summary.loadCohort: validate that audit.json has the required frontmatter shape; skip malformed records instead of crashing renderRow with `undefined`. - audit-writer.yaml: quote scalar values that contain colons, hashes, newlines, etc. A candidate name with `:` or `#` previously corrupted the YAML frontmatter, breaking the audit-writer ↔ cohort-summary round-trip. - shared/events.parseInterviewEvent: validate `step` / `status` / `message` / `progress` fields on `progress` events instead of trusting any object whose `type` field is "progress". - run-interview-bootstrap: validate `--time-box` is a finite number and `--mode-*` flags are from the allowed enum before forwarding into RoleConfig; previously NaN/garbage leaked through and failed with confusing downstream messages. - TUI bootstrap: when `--mode-rubric=default+jd`, also check that the `--jd-path` file actually exists, not just that it's non-empty. Correctness / hang / data-loss in collectors: - jsonl-log: gracefully return empty when interview.log is missing instead of throwing ENOENT and failing the whole review run. - transcript: broaden the bare-speaker regex to accept "Speaker 1" style labels; fall back to the epoch if sessionStartIso parses to NaN instead of crashing toISOString(). - verification: actually apply the documented 30s prompt-to-test window using timestamp deltas, not stream adjacency. - test-pass: cap candidate test runs with a 5-minute SIGTERM timeout and surface r.error (ETIMEDOUT, ENOENT, …) in the output so a hung candidate process can't block the grader indefinitely. - project-validator.walk: use lstatSync and skip symlinks so an attacker-controlled generator output with a cyclic symlink can't hang the validator. Error contracts and TUI: - bootstrap/orchestrator: wrap generation + writeRoleConfig in try/catch so unexpected throws surface as a structured RunBootstrapResult failure instead of an unhandled rejection at the CLI boundary. - run-interview-bootstrap / run-interview-review: drop dotenv `{override: true}` so CI-provided env vars (real OPENAI_API_KEY) win over anything in a local .env. - TUI wizard dispatcher: TTY-check stdin before launching the interactive huh.Form (no flags + piped stdin previously hung forever); add a nil-Options guard before the runner call. - TUI progress model: an "update" event now promotes the phase from pending → running so the UI reflects activity even when an emitter skips the matching "start". - TUI bootstrap form `nonEmpty`: trim whitespace so a value of spaces-only is rejected rather than treated as set. Hygiene: - privacy-gate.sh: move `set -u` inside the direct-exec guard so sourcing the library doesn't leak nounset into the caller's shell. - start.sh: write .interview-state/started-at AFTER the asciinema preflight succeeds, so a failed start doesn't leave a stale marker that end.sh would mistake for a completed session. Test fixes: - cohort-summary: import AuditFrontmatter from the renamed `review/` module (was still pointing at the deleted `assess/` path; tsc caught it). - project-validator spec fixture: stop adding +5 to the expected LOC for failing-test files — test files are excluded from the LOC count by the validator's TEST_NAME_PATTERN. - audit-writer + review-orchestrator specs: expect quoted YAML URLs now that the YAML emitter quotes scalars containing colons. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T03:05:09Z

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Generating unit tests... This may take up to 20 minutes.

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

src/services/interview/bootstrap/project-generator.ts (1)

95-99: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Top-level delete guard is ineffective for paths like /usr and /etc.

Line 98 checks parent === abs, which only catches filesystem roots. It does not block top-level directories, so clearOutputDir() can still recursively delete unsafe system paths if misconfigured.

🔧 Suggested fix

-import { dirname, isAbsolute, join, relative, resolve, sep } from "node:path";
+import { dirname, isAbsolute, join, parse, relative, resolve, sep } from "node:path";
...
 function assertSafeToClear(outputDir: string): void {
 	const abs = resolve(outputDir);
-	const root = resolve(abs, "/");
+	const root = parse(abs).root;
 	if (abs === root || abs === sep) {
 		throw new Error(`Refusing to clear filesystem root: ${abs}`);
 	}
 	const home = homedir();
 	if (home && abs === resolve(home)) {
 		throw new Error(`Refusing to clear home directory: ${abs}`);
 	}
-	// Refuse a top-level path like /usr, /etc, /home — anything where the
-	// path has no parent beyond the root.
-	const parent = dirname(abs);
-	if (parent === abs) {
-		throw new Error(`Refusing to clear root-level path: ${abs}`);
+	// Refuse top-level paths like /usr, /etc, /home.
+	if (dirname(abs) === root) {
+		throw new Error(`Refusing to clear top-level path: ${abs}`);
 	}
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/project-generator.ts` around lines 95 - 99,
The current check in clearOutputDir uses dirname(abs) and only throws when
parent === abs, which misses top-level children like "/usr" (whose parent is "/"
not equal to abs); update the guard to also detect paths directly under the
filesystem root by comparing parent to the path root: require
path.parse(abs).root and throw if parent === abs || parent ===
path.parse(abs).root; reference the clearOutputDir function and the local
variables abs and parent and use path.parse(abs).root to identify root-level
parents.

🧹 Nitpick comments (1)

src/services/interview/review/extractors/test-pass.ts (1)
14-17: ⚡ Quick win

Move TestRunner port type to src/core/types.ts.

This defines a new port-like interface locally; per repo architecture rules, keep port interfaces centralized in core types and import it here.

As per coding guidelines, "ALL port interfaces must be defined in src/core/types.ts — never create new types files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/review/extractors/test-pass.ts` around lines 14 - 17,
The type alias TestRunner is a port interface that must be centralized instead
of being defined locally; remove the local export of TestRunner and instead add
the identical exported type declaration to the core types module (types.ts),
then import { TestRunner } into this file and use it in place of the removed
local type; ensure the signature stays { readonly passed: number; readonly
failed: number; readonly output: string } and that the core types module exports
it for other modules to consume.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 27-33: Move the GeneratorClient interface declaration into the
shared core types module (the central core types file) so it lives with other
port interfaces; export it from there and remove the local declaration in
project-generator.ts. Update all files that reference GeneratorClient (and
ensure GeneratedProject and RoleConfig are imported or re-exported from the core
types file) to import the interface from the shared core types module instead of
the local service file. Ensure the generate method signature and readonly
modifiers remain identical when relocating.
- Around line 109-120: The copyDir function currently uses statSync(s) which
follows symlinks; change it to use lstatSync(s) and explicitly skip entries
where lstatSync(s).isSymbolicLink() to avoid copying files outside
kitTemplateDir or entering symlink loops. In function copyDir, replace statSync
with lstatSync, add a guard like "if (st.isSymbolicLink()) continue", and keep
the existing directory handling (mkdirSync + recursive copy) and file writing
(mkdirSync(dirname(d)) + writeFileSync) for non-symlink entries.

In `@src/services/interview/review/audit-writer.ts`:
- Around line 57-68: The yamlScalar function detects values needing quoting but
only escapes backslashes and double quotes, so control characters like newline,
carriage return and tab remain raw and produce invalid YAML; update yamlScalar
to additionally replace control characters (e.g. \n, \r, \t, \b, \f, and any
other non-printable controls you expect) with their corresponding escape
sequences before wrapping in quotes (while still escaping backslashes and double
quotes), ensuring the returned double-quoted scalar contains valid YAML escape
sequences; modify the replacement logic used when building the escaped variable
in yamlScalar to run a single pass that maps those control characters to "\\n",
"\\r", "\\t", "\\b", "\\f", etc., then return the quoted string.
- Around line 70-76: yamlTag currently detects whitespace (including
newlines/tabs) but only escapes backslashes and double quotes; update yamlTag to
additionally replace control whitespace characters with proper escape sequences
before wrapping in quotes: after computing escaped = value.replace(/\\/g,
"\\\\").replace(/"/g, '\\"'), also replace newline (\n) with "\\n", carriage
return (\r) with "\\r", and tab (\t) with "\\t" (and any other control chars you
deem necessary) so the returned quoted string is fully escaped; keep the
existing regex and return logic but perform these extra replacements on the
escaped variable inside the yamlTag function.

In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 27-31: The current try/catch around runner(["log",
`--format=${FORMAT}`, "--numstat"], repoDir) swallows all errors and returns [];
change it to only convert known benign git cases (e.g., "does not have any
commits", "unknown revision", or specific exit codes from git that indicate
empty history) into [], and rethrow any other exceptions so real failures
(permissions, broken git, unexpected runtime errors) surface; specifically wrap
the call to runner in a try, inspect the thrown error's message/exit code/stderr
and return [] only for the recognized benign messages, otherwise throw the
original error so higher-level handlers (in-report placeholder / appendix / CLI
stderr) can handle it.

---

Duplicate comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 95-99: The current check in clearOutputDir uses dirname(abs) and
only throws when parent === abs, which misses top-level children like "/usr"
(whose parent is "/" not equal to abs); update the guard to also detect paths
directly under the filesystem root by comparing parent to the path root: require
path.parse(abs).root and throw if parent === abs || parent ===
path.parse(abs).root; reference the clearOutputDir function and the local
variables abs and parent and use path.parse(abs).root to identify root-level
parents.

---

Nitpick comments:
In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 14-17: The type alias TestRunner is a port interface that must be
centralized instead of being defined locally; remove the local export of
TestRunner and instead add the identical exported type declaration to the core
types module (types.ts), then import { TestRunner } into this file and use it in
place of the removed local type; ensure the signature stays { readonly passed:
number; readonly failed: number; readonly output: string } and that the core
types module exports it for other modules to consume.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 20c0ad49-c16c-47ab-bd28-dd3a3617b024

📥 Commits

Reviewing files that changed from the base of the PR and between ffe507a and 790b9e8.

📒 Files selected for processing (25)

scripts/run-interview-bootstrap.ts
scripts/run-interview-review.ts
src/services/interview/bootstrap/orchestrator.ts
src/services/interview/bootstrap/project-generator.ts
src/services/interview/bootstrap/project-validator.ts
src/services/interview/bootstrap/role-config.ts
src/services/interview/cohort/cohort-summary.ts
src/services/interview/review/audit-writer.ts
src/services/interview/review/collectors/git-history.ts
src/services/interview/review/collectors/jsonl-log.ts
src/services/interview/review/collectors/transcript.ts
src/services/interview/review/extractors/test-pass.ts
src/services/interview/review/extractors/verification.ts
src/services/interview/review/review-orchestrator.ts
src/services/interview/shared/events.ts
teamhero-interview-kit/lib/privacy-gate.sh
teamhero-interview-kit/start.sh
tests/unit/services/interview/bootstrap/project-validator.spec.ts
tests/unit/services/interview/review/audit-writer.spec.ts
tests/unit/services/interview/review/review-orchestrator.spec.ts
tui/interview_bootstrap.go
tui/interview_bootstrap_form.go
tui/interview_bootstrap_wizard.go
tui/interview_bootstrap_wizard_test.go
tui/interview_progress.go

🚧 Files skipped from review as they are similar to previous changes (17)

src/services/interview/shared/events.ts
src/services/interview/review/collectors/jsonl-log.ts
src/services/interview/review/collectors/transcript.ts
tests/unit/services/interview/bootstrap/project-validator.spec.ts
scripts/run-interview-review.ts
teamhero-interview-kit/lib/privacy-gate.sh
tests/unit/services/interview/review/review-orchestrator.spec.ts
tests/unit/services/interview/review/audit-writer.spec.ts
tui/interview_progress.go
tui/interview_bootstrap_wizard.go
src/services/interview/bootstrap/role-config.ts
tui/interview_bootstrap.go
src/services/interview/cohort/cohort-summary.ts
tui/interview_bootstrap_form.go
tui/interview_bootstrap_wizard_test.go
src/services/interview/bootstrap/project-validator.ts
src/services/interview/review/review-orchestrator.ts

Without `[args...]` on the command spec, commander rejected any positional argument as "too many arguments" — so `teamhero interview bootstrap`, `teamhero report --since ...`, etc. all failed before reaching the Go TUI. The `.allowUnknownOption()` flag only covers `--flags`, not positionals. Declaring each command as `command [args...]` makes commander accept variadic positionals which we then forward verbatim to the TUI binary. No call-site changes needed; the action callbacks already re-derive args from process.argv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T03:14:42Z

✅ Unit tests committed locally. Commit: cf8229bec27921480f52ef5cbb9385964b7e1eb0

- audit-writer: escape literal newlines, carriage returns, and tabs when emitting double-quoted YAML scalars. A candidate name with an embedded newline previously made the frontmatter invalid YAML even with the surrounding quotes. - cli/index.ts: add "interview" to the report-action's nested- subcommand guard so `teamhero report interview` is rejected consistently with `report doctor` / `report setup`, instead of silently routing into the report flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts (3)

156-173: ⚡ Quick win

Chronological test does not actually prove session_date precedence.

Right now date and session_date are identical for both candidates, so sorting by date would also pass. Make them intentionally conflict so this test catches regressions.

Suggested test hardening

 			makeCandidate(dir, "carol", {
 				candidate: "Carol",
-				date: "2026-05-20",
+				date: "2026-04-01",
 				session_date: "2026-05-20",
 			});
 			makeCandidate(dir, "alice", {
 				candidate: "Alice",
-				date: "2026-05-01",
+				date: "2026-06-30",
 				session_date: "2026-05-01",
 			});

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 156 - 173, The test "orders chronologically using session_date when
available" currently uses identical date and session_date values so it doesn't
prove session_date precedence; update the test data created via makeCandidate
for "carol" and "alice" so their date and session_date intentionally conflict
(e.g., set Carol.date before Alice.date but Carol.session_date after
Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must
rely on session_date to produce Alice before Carol; keep using loadCohort and
the same expectation on body.indexOf.

132-135: ⚡ Quick win

Body-wide negative date assertion is brittle.

expect(body).not.toContain("2026-05-10") can fail if that value appears elsewhere (metadata/footer) even when the Interviewed column is correct. Scope the check to Alice’s row.

Suggested assertion narrowing

 			const body = renderCohortSummary("senior-backend", loadCohort(dir));
-			expect(body).toContain("2026-06-01");
-			// The submission date should NOT appear in place of the session date
-			expect(body).not.toContain("2026-05-10");
+			const aliceRow = body.split("\n").find((line) => line.includes("Alice")) ?? "";
+			expect(aliceRow).toContain("2026-06-01");
+			expect(aliceRow).not.toContain("2026-05-10");

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 132 - 135, The negative date assertion is too broad: instead of
using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's
table row by locating the row that contains "Alice" (e.g., parse the HTML in
body and select the tr or nearest container with text "Alice") and assert that
that specific row does not contain "2026-05-10"; update the test to query the
Alice row from body and replace the global not.toContain check with an assertion
against that row's text content.

207-210: ⚡ Quick win

The decimal-number assertion can create false failures.

expect(body).not.toMatch(/\d+\.\d+/) is broader than the requirement and could fail on unrelated decimals. Header assertions already validate the column contract well.

Suggested cleanup

 			expect(body).not.toMatch(/\|\s*Score\s*\|/i);
 			expect(body).not.toMatch(/\|\s*Total\s*\|/i);
 			expect(body).not.toMatch(/\|\s*Rank\s*\|/i);
-			expect(body).not.toMatch(/\d+\.\d+/); // no decimal numbers (scores)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 207 - 210, The test contains an overly broad negative assertion
expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals;
remove that assertion from cohort-summary-edge-cases.spec.ts (the
expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a
decimal-specific check, replace it with a narrow pattern that matches decimal
values inside table cells (e.g., a cell-boundary regex like
/\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the
existing header assertions as they already validate the column contract.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/services/interview/bootstrap/project-generator-security.spec.ts`:
- Around line 109-125: The test expects generateProject(role({ outputDir: dir
}), clientFor(malicious)) to reject on paths containing a NUL byte, so add an
explicit NUL-byte validation in the path-resolution logic inside generateProject
(the code path that processes GeneratedProject.files -> file.path) to detect
"\0" (or char code 0) and throw a specific, named error (e.g., InvalidPathError
or a new PathValidationError) before any FS operations; then update the test to
assert that generateProject rejects with that specific error type/message
instead of a generic toThrow(), referencing generateProject, GeneratedProject,
and the file.path handling code for location.

In `@tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts`:
- Around line 199-213: The test "stores optional jdPath and customPrompt when
present" only sets and asserts jdPath but not customPrompt; update the test to
actually include a customPrompt in the RoleConfig passed to writeRoleConfig and
add an assertion that readRoleConfig(dir)?.customPrompt equals that value (or
alternatively rename the test to only mention jdPath). Locate the test block in
role-config-edge-cases.spec.ts and modify the cfg object creation (RoleConfig)
to include customPrompt and add an expect(...) assertion after reading via
readRoleConfig(dir) to verify persistence, or change the test description to
match the current assertions.

---

Nitpick comments:
In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`:
- Around line 156-173: The test "orders chronologically using session_date when
available" currently uses identical date and session_date values so it doesn't
prove session_date precedence; update the test data created via makeCandidate
for "carol" and "alice" so their date and session_date intentionally conflict
(e.g., set Carol.date before Alice.date but Carol.session_date after
Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must
rely on session_date to produce Alice before Carol; keep using loadCohort and
the same expectation on body.indexOf.
- Around line 132-135: The negative date assertion is too broad: instead of
using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's
table row by locating the row that contains "Alice" (e.g., parse the HTML in
body and select the tr or nearest container with text "Alice") and assert that
that specific row does not contain "2026-05-10"; update the test to query the
Alice row from body and replace the global not.toContain check with an assertion
against that row's text content.
- Around line 207-210: The test contains an overly broad negative assertion
expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals;
remove that assertion from cohort-summary-edge-cases.spec.ts (the
expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a
decimal-specific check, replace it with a narrow pattern that matches decimal
values inside table cells (e.g., a cell-boundary regex like
/\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the
existing header assertions as they already validate the column contract.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 04c43fb4-79c1-440c-a31f-06f8b7db413e

📥 Commits

Reviewing files that changed from the base of the PR and between 790b9e8 and cf8229b.

📒 Files selected for processing (7)

src/cli/index.ts
tests/unit/services/interview/bootstrap/openai-generator-client.spec.ts
tests/unit/services/interview/bootstrap/project-generator-security.spec.ts
tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts
tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts
tests/unit/services/interview/review/asciinema-edge-cases.spec.ts
tests/unit/services/interview/review/audit-writer-edge-cases.spec.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/cli/index.ts

CodeRabbit-authored test suite verified that LLM-supplied paths containing "\0" can otherwise truncate path checks in some syscalls (the classic "evil.png\0.sh" trick). Reject the byte outright in resolveWithinRoot before any further path math. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The interview bootstrap wizard previously ran as a sequential loop of huh.NewForm(...).Run() calls. That produced a bare form on a blank screen with no orientation — no //// TEAM HERO header, no right-side configuration summary, no navigation hints. It didn't look like part of the same product as the report wizard. This commit rebuilds it as a single bubbletea program (interviewBootstrapTeaModel) so it picks up the same shell-header + summary-panel + hints layout as wizard.go. The data container (bootstrapWizardModel) and per-screen validators are unchanged; only the runner is swapped: runHuhBootstrapWizard → runBootstrapTeaWizard → tea.NewProgram(...).Run() Layout: - renderShellHeader at the top — same `//// TEAM HERO` gradient slashes - Left ~60% column: the active step's huh form - Right ~40% column: a bordered "Interview Bootstrap" summary panel listing every field. Filled values render bright, the active step's label is highlighted cyan, and not-yet-reached fields show "—" - Footer hint line: `enter continue • ctrl+c quit` Testing: - Adds teatest + x/exp/golden as direct Go test deps (the latter was already transitively present) - New layout tests in tui/interview_bootstrap_tea_test.go drive the model through teatest and assert the three layout invariants (header / summary / hints) are present at multiple wizard states - TestInterviewBootstrap_Screenshot_WritesGolden produces three human-reviewable plaintext screenshots under tui/testdata/interview_bootstrap/ — the test compares against them on every run, so layout regressions show up as diffs in PRs. Run with TEAMHERO_UPDATE_SCREENSHOTS=1 to refresh after intentional layout changes - Three pre-existing launcher-smoke tests (TestBootstrapWizard_HuhLauncher*Smoke) were rewired to stub the new program-runner seam (runBootstrapTeaProgram) instead of the old huhFormRun, keeping them fast and goroutine-leak-free - TestRunInterview_BootstrapVerb_RequiresFlags now pins isStdinTTY to false so the test exercises the "no TTY → reject" branch deterministically (without the pin, an interactive `go test` session inherits a real TTY and the wizard hangs waiting for input) - Broadened the package-shared ansiEscapes regex to also strip DEC-private mode sequences (`\x1b[?25l`, `\x1b[?2004h`) so the golden screenshots stay clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two adversarial review passes identified ~15 real bugs across the TUI refactor and the recent CodeRabbit-driven service-layer fixes. This commit lands the meaningful ones; the rest were design choices, nitpicks, or speculative concerns. ## TUI critical fixes - **Production wizard wiring**: runBootstrapTeaWizard previously called `tea.NewProgram(model, tea.WithInput(nil), tea.WithOutput(nil))` — the wizard would have hung the moment any user pressed a key, because bubbletea had no stdin. Use default stdio + WithAltScreen() like the report wizard does. The smoke tests didn't catch this because they stubbed out runBootstrapTeaProgram entirely. - **Smoke tests were tautologies**: stubTeaProgramRunner just set `confirmed = true` and returned a synthetic result. The state machine could be broken and the tests would still pass. The stub now walks the model through real advance()/nextStep() transitions in-process, filling values that validators require, so a regression in the rubric branch or time-box sub-step actually fails a test. - **Time-box "custom" branch**: previously swapped the form in place without advancing m.step, so highWater stayed at ibStepTimeBox and the summary panel claimed the user was still on the time-box screen while the custom-minutes input rendered. Introduced an explicit ibStepTimeBoxCustom step the state machine routes through. - **Confirm "Cancel" produced silent success**: clicking Cancel set confirmed=false but the dispatcher returned 0 with no message, indistinguishable from a successful no-op. Now prints "Wizard cancelled at confirm" / "Wizard aborted" to stderr so users see why no role was generated. - **WindowSizeMsg never reached the huh form**: on resize we updated width and returned, but didn't forward the message into the form itself. Internal field viewports were stale until the next keystroke. Now forward the resize message after updating width; this also makes huh.Confirm render its button row inline (visible in the refreshed confirm-step screenshot). - **truncate() sliced bytes, not runes**: summary-panel labels containing multi-byte characters (Étienne, CJK, etc.) would render as invalid UTF-8 when truncated mid-codepoint. Switched to []rune slicing and rejected non-positive max explicitly. - **Deleted ~200 lines of dead code**: stepRole through stepConfirm and stepConditionalRubric in interview_bootstrap_form.go were unreachable after the runner swap. stubHuhFormRunner in the test file was also unused. Both removed. ## Service-layer critical fixes - **assertSafeToClear was too permissive**: a misconfigured outputDir of "/tmp", "/var", "/usr", "/home", etc. previously passed and clearOutputDir would recursively delete the entire system directory. Now reject a DANGEROUS_ROOTS set explicitly, plus single-segment absolute paths. Subdirectories of /tmp (mkdtemp paths) are still permitted, so tests work normally. - **Path traversal via symlinks**: resolveWithinRoot did string-level containment math but never realpath'd the parent chain. If a previous generation attempt left a symlink like `subdir -> /etc`, writing `subdir/passwd` would land outside the workspace. Now walk intermediate segments with lstatSync and refuse any existing symlink in the path. - **YAML escapes dropped most C0 controls**: audit-writer's escapeYamlDoubleQuoted handled `\\ " \n \r \t` but let other control characters (\x00–\x1F, \x7F) through verbatim. Per YAML 1.2 §5.7, unescaped controls in a double-quoted scalar are a parse error in strict parsers (js-yaml FAILSAFE, libyaml). A candidate name with a stray \x01 from copy-paste would silently corrupt audit.json round-trip. Now emit `\xHH` for any C0 control character. - **test-pass spawnSync timeout didn't kill child group**: Node's timeout signals only the direct child (`go` or `bun`), leaving sub-binaries reparented to PID 1 and still running on the grader machine. Wrapped the spawn in coreutils `timeout --kill-after=10s 300s` which signals the whole process group; fall back to plain spawnSync with SIGKILL when coreutils isn't available. ## Hygiene - **discrepancy.spec.ts afterAll(mock.restore)**: CLAUDE.md explicitly forbids this — mock.restore() undoes ALL mock.module() registrations, including the pinning I added at the top of the file. Per-file isolation via scripts/run-tests.sh is the project's mode; the pinning is defensive against bun test's non-isolated path. afterAll restore was actively counter-productive. - **git-history hardcodes argv and adds `--`**: the runner already builds argv from constants, but added the literal "--" terminator before any future caller can inject user-controlled refs/pathspecs that begin with "-" and would be parsed as git options. - **cohort-summary signed_off type validation + pipe escaping**: loadCohort now requires signed_off to be a real boolean (not a string "false" that JS treats as truthy). renderRow escapes `|` and newlines in candidate / recommendation fields so a name like "Alice | aka Bob" doesn't break the markdown table layout. - **transcript regex accepts Unicode names**: switched from `[A-Za-z][\w .'-]*?` to `\p{L}[\p{L}\p{N} .'_-]*?` with the /u flag so "Étienne: foo" parses correctly. The previous regex silently dropped non-ASCII speaker labels. ## What I chose NOT to fix - isStdinTTY mutable package var (no t.Parallel today; defer) - WaitFinished timeout deferral risk (small window; defer) - jsonl-log existsSync race (benign — try/catch already wraps the read) - verification.ts adjacent-only prompt window (design choice; documented) - start.sh exec failure leaves started-at (would need a wrapper script) - CLI nested-subcommand denylist won't catch new TUI subcommands (need a Go-TUI-side change to reject unknown positionals) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User-reported bug: - TUI bootstrap wizard exited with "Wizard cancelled at confirm" even after the user clicked through all 13 screens. huh.Confirm uses the initial value of the bound variable to decide which button has focus, and our `confirmed bool` defaulted to false (zero value), so the wizard opened the confirm screen with focus on "Cancel". A stray Enter cancelled the whole run. Now pre-set `d.confirmed = true` before building the confirm form so the affirmative button has focus by default — matches typical "Are you sure? [Y/n]" UX. CodeRabbit threads (fixed): - scripts/run-interview-review.ts: also reject when both --repo AND --local-repo-path are supplied (was previously underspecified — required at least one, but accepted both ambiguously). - src/services/interview/review/ai-observer.ts: rejectIfScored now walks the response object tree and inspects actual keys instead of regex-scanning the serialized JSON. An evidence excerpt containing literal text like `"score": 7/10` (e.g. a code comment) no longer false-positives. - skills/teamhero-interview/SKILL.md: review verb heading now matches the actual invocation form (`--candidate <name> --repo <url>`) with a note that the repo can also be positional. - tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts: renamed the misleading "stores optional jdPath and customPrompt" test to "stores optional jdPath", and added a sibling test for customPrompt — the original test name didn't match its assertion. - src/services/interview/review/collectors/asciinema.ts: per-char iteration of evt.data so multi-character input chunks (paste, rapid typing) reconstruct commands correctly. CSI escape sequences (ESC `[` ... letter) are detected and skipped as a unit so arrow keys aren't accidentally buffered character-by-character. - src/services/interview/review/collectors/git-history.ts: catch block now distinguishes benign "no commits yet" from genuine failures (binary missing, permission denied), writing the latter to stderr instead of fully swallowing the error. CodeRabbit threads (not fixed, with rationale): - ai-observer/cohort-summary test hardening suggestions are deferred — current assertions cover the load-bearing invariants; the suggested pinning would couple tests to presentation details. - "Move port interfaces to core/types.ts" architectural suggestions apply only to ports consumed across module boundaries; the interview module's `Cloner`, `ReviewDependencies`, and `GeneratorClient` are internal to their namespaces. - Markdownlint MD040 nits on planning doc and SKILL.md are cosmetic. All 1749 TS tests pass; full Go suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tion

…+ generation User-reported bug: "Idea generation failed: context deadline exceeded (Client.Timeout exceeded while awaiting headers)" in the bootstrap wizard when the user picked "Suggest ideas". Root cause: the openAIIdeaFetcher reused the package-wide defaultHTTPClient (5s timeout), which is tuned for cheap doctor/setup auth-probe calls. The OpenAI Responses API generating 5 ideas under a strict json_schema routinely takes 15-45s — the client timed out before headers came back. Changes: - tui/interview_bootstrap_ideas.go: give the idea fetcher its own http.Client with a 90s timeout sized for content generation. Tests inject client directly, so test paths are unaffected. - src/services/interview/bootstrap/openai-generator-client.ts: honor AI_MODEL from env (matches the Go side and the rest of the TS codebase). Keeps gpt-5-mini as the default when nothing is set. - scripts/run-interview-bootstrap.ts: add --model and --max-attempts flags so we can override per-invocation without editing code (needed for tuning generation quality on principal-tier roles). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@disabled

…pts-hto) Bootstrap was failing validation with "35 LOC; expected 400-700" on a Principal Full Stack role whose generated project contained 447 lines of C# + Vue + TS. Root cause: project-validator.ts hardcoded SOURCE_EXTS to {.ts, .tsx, .js, .jsx, .go} — the team's *own* stack — so .cs files weren't counted at all. TEST_NAME_PATTERN had the same problem: it only recognised foo.spec.ts / foo.test.ts naming, so xUnit *Tests.cs files weren't classified as tests, and the failing-test detection couldn't find the [Fact(Skip = ...)] marker the generator correctly emitted. The interview command takes role.stack as input from the hiring manager — the validator must not assume that stack matches the CLI's own stack. Changes: - SOURCE_EXTS expanded to cover JS/TS family (+ .vue, .svelte), JVM (.java, .kt, .scala), .NET (.cs, .fs, .vb), systems (.rs, .c, .cpp, .h), dynamic (.py, .rb, .php), Apple (.swift), and functional niches (.ex, .clj, .ml). - TEST_NAME_PATTERN extended to recognise *Tests.cs / *Specs.kt / foo_test.go / test_foo.py / foo_spec.rb in addition to the JS/TS *.spec.* / *.test.* convention. - Skipped-test detector extended with xUnit/NUnit [Fact(Skip=)] / [Ignore], JUnit @disabled / @ignore, pytest / unittest @Skip, RSpec pending, Rust #[ignore], and a universal NotImplementedException / raise NotImplementedError catchall. - Two new regression tests: one for a C# project that lands in window via .cs + *Tests.cs + [Fact(Skip=)], one for a Python+Vue project that lands via .vue + .py + @pytest.mark.skip. Verified end-to-end: gpt-5 first-attempt success on the principal-full-stack role generating a 447-LOC C#/.NET + Nuxt project with idiomatic xUnit skip marker — previously failed all 3 attempts because the validator couldn't see the files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… link The interactive wizard already showed an OSC 8 hyperlink to the generated repo and offered a GitHub publish form after success (tui/interview_bootstrap_publish.go + the result screen in interview_bootstrap_generate.go). The headless path (`teamhero interview bootstrap --headless ...`), used by scripts and proctors invoking the CLI directly, dropped the user at the bun script's plain "Output: <dir>" line with no link and no publish opportunity — so a proctor running the supported headless flow had to copy-paste the path, then drive `gh repo create` + `git push` by hand every time. Wires `runInterviewBootstrap` through the same post-success UX: - On exit==0, print `Project: <abs-path>` wrapped in an OSC 8 file:// hyperlink (ctrl-click opens the dir in Finder/Explorer). - If stdin is a TTY and --no-confirm is absent, invoke the existing offerPublishToGitHub flow. CI runs (non-TTY) and `--no-confirm` scripted runs stay silent — no surprise prompts. - offerPublishToGitHub already reads GITHUB_PERSONAL_ACCESS_TOKEN from ~/.config/teamhero/.env (the token `teamhero setup` saves), prints a "run teamhero setup" tip when it's missing rather than nagging, and uses huh forms for repo name / owner / visibility (private default). The wizard and headless paths now share that identical flow — no behavior drift. Tests via the existing offerPublishToGitHub and isStdinTTY var seams: - PrintsSuccessLinkOnZeroExit — asserts the OSC 8 envelope on stdout. - OffersPublishWhenTTY — TTY + no --no-confirm -> publish ran. - SkipsPublishWhenNoConfirm — --no-confirm -> publish skipped. - SkipsPublishWhenNotTTY — non-TTY -> publish skipped. - NoPublishOnFailure — exit!=0 -> no link, no publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The interview verb picker was rendering with only "> Cancel" visible on the first paint. The user had to press the up arrow before they could see Bootstrap / Review / Cohort. Root cause: the Cancel option's value was the empty string "". The picker bound a `var verb string` (zero value: "") to huh.Select. huh saw the bound value matched the Cancel option, marked Cancel as the "current value" of the field, and anchored the cursor on it (the LAST row). When the form rendered with a constrained viewport, it scrolled so the cursor stayed visible, clipping every option above it. Arrow-up moved the cursor and scrolled the viewport, revealing the hidden options — which is exactly the workaround the user reported. Fix: give Cancel a distinct sentinel value "cancel" so the bound verb's zero value matches nothing in the list. huh then defaults the cursor to the first option (Bootstrap) and all four rows paint on entry. interviewVerbPicker maps the "cancel" sentinel back to the caller's "" no-op convention before returning, so the dispatcher's existing `if verb == ""` check still covers both ctrl-c abort and explicit Cancel selection — no behavior change for downstream callers. Tests: - TestInterviewVerbOptions_NoValueMatchesZeroDefault — pins the bug by asserting no option has Value == "". Any future option whose value collides with `string`'s zero default would re-introduce the clipping, so the guard covers the whole list, not just Cancel. - TestInterviewVerbOptions_IncludesCancel — guards against a refactor that drops the Cancel row entirely (would force users to commit to a verb with no way out other than ctrl-c). - TestRunInterview_NoVerb_TTY_CancelSentinelTreatedAsNoOp — confirms the picker→dispatcher contract: Cancel still produces exit 0 with no error output, identical to ctrl-c abort. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…0-min time-box Two PRD-tracked defaults now apply automatically when their flags are omitted, so a typical headless run boils down to --role + --stack + --domain + --feature + --mode-* with no path or duration to remember. Explicit flags continue to win — the original PRD's per-invocation override capability is intact. Output directory: - Default: ./interviews/<role-slug> (was: required flag, no default in the headless path; the wizard used to seed ./roles/role). - .gitignore now lists both `interviews/` (new default) and `roles/` (kept for users upgrading from the old convention so their existing generated content stays gitignored). - Wizard form (interview_bootstrap_tea.go, interview_bootstrap_wizard.go) seeds with `./interviews/role` and auto-substitutes the typed role slug on the output-dir step. Time-box: - Default: 60 minutes (was: 90). - "60 minutes" is now the (recommended) label in the wizard's huh select; "90 minutes" loses the recommendation suffix. Standard options remain 60 / 90 / 120 / Custom. - bun script (scripts/run-interview-bootstrap.ts) falls back to "60" when --time-box is omitted, matching the Go TUI's default. Implementation: - New applyBootstrapDefaults(opts) in interview_bootstrap.go runs between flag parsing and validation. nil-safe so callers don't need to guard. Empty role leaves OutputDir empty so the validator's "missing --role" error surfaces instead of producing a misleading ./interviews/<empty> path. Tests (Go): - FillsOutputDirFromRole / DoesNotOverrideExplicitOutputDir / LeavesOutputDirEmptyWhenRoleMissing / FillsTimeBoxWhenMissing / DoesNotOverrideExplicitTimeBox / NilSafe — unit coverage for applyBootstrapDefaults. - AllowsOmittedOutputDirAndTimeBox — end-to-end through runInterviewBootstrap with both flags omitted; asserts the runner sees the resolved defaults. - TestBootstrapWizard_DefaultModelHasSensibleDefaults updated to expect "60" (was "90"). Snapshot golden files (4 in tui/testdata/interview_bootstrap/) regenerated via TEAMHERO_UPDATE_SCREENSHOTS=1 to reflect the new seeds (TimeBox: "60", OutputDir: "./interviews/senior-backend"). Layout assertions still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The confirm step rendered every collected wizard field a second time as the huh.Confirm form's Description — a long "role=… · stack=… · domain=… · time-box=… · project=… · analysis=… · rubric=… · out=…" line — even though the right-hand summary panel already lists each field in a structured, labelled format. Result: the only thing the user actually has to act on (Yes / Cancel) was buried under repeated information, and the reported screenshot showed multiple lines of config metadata pushing the buttons below the fold on shorter terminals. Remove the redundant description from the confirm form. The right- hand summary panel stays untouched as the canonical pre-generation recap. The summarizeBootstrapModel helper is kept (it's still useful for logging and the existing unit test) — it's just no longer wired into the confirm form. Tests: - TestInterviewBootstrap_ConfirmStep_OmitsVerboseSummary — drives the wizard to ibStepConfirm and asserts the rendered view contains "Ready to bootstrap?" and "Yes, generate the role" but NOT the summary signature tokens "role=", "stack=", "time-box=", "out=". Any regression that pipes summarizeBootstrapModel back into the Description will reintroduce those tokens and fail this assertion. - 03-confirm-step golden snapshots (ANSI + plaintext) regenerated via TEAMHERO_UPDATE_SCREENSHOTS=1 to lock in the cleaner layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…st pass Real wizard runs were failing with the validator complaint: Expected at least 2 deep modules (>=80 lines); found 1. LOC out of range: 247 lines of code; expected 400-700. Root cause: gpt-5-mini was consistently producing a thin happy-path sketch on attempt 1 — one deep module, ~200-300 total LOC — and only climbing into the 400-700 window on attempts 2-3 once the validator's specific failure list was fed back through previousFailures. With a 3-attempt budget, marginal cases ran out of retries before recovery. Two changes work together: 1. Stronger first-pass prompt (openai-generator-client.ts). The LOC bullet is now labeled "SIZE TARGET" with explicit framing as a "hard constraint", names ~525 as the target midpoint to leave variance margin, and provides per-module guidance ("100-150 lines per deep module"). It also names a typical decomposition (types module + service module + helpers) so the model has a concrete shape to aim at instead of guessing. The "Do NOT shrink to a minimal happy path" line addresses the observed failure mode directly. 2. Default attempt budget raised from 3 to 5 (project-generator.ts). One additional retry was reliably enough to recover marginal cases in manual testing; bumping to 5 leaves safety margin without runaway cost (each attempt is one structured Responses API call). Tests: - openai-generator-client.spec.ts: new assertion that the prompt contains the SIZE TARGET / hard constraint / 100-150 lines per deep module phrasing. Any future prompt edit that loses the per-module sizing will re-introduce the first-attempt undersize bug and trip this regression guard. - project-generator.spec.ts: existing 3-attempt assertion updated to reflect the 5-attempt default. The "succeeds within budget" test still locks in mid-budget recovery (succeeds on attempt 3 of 5); the "exhausts budget" test now expects attempts === 5 and is rewritten with a single-malformed clientReturning (clamps to last) so adding attempts in the future doesn't churn the fixture list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…reak Reported: in the rubric-mode step (and similar select screens) the left vertical bar (huh.ThemeCharm's focus-border decoration) renders correctly on the title line but is missing on wrapped Description lines, producing a visible break in the form's bounding rail. Root cause is upstream: huh's Charm theme doesn't extend the focused- field left bar onto Description content that wraps via huh's internal wrap path. We can't fully fix it from our side until the Charm libraries (huh / lipgloss / bubbletea) ship a fix. Pragmatic mitigation for the static-description case: shorten the Analysis-mode and Rubric-mode Descriptions so they fit on one line at the default formWidth (3/5 of an 80-col terminal = 48 chars) and never enter huh's wrap path. The full detail that used to live in the Description is moved into the option labels themselves, which render correctly even when long. Limitations: - The dynamic-content case still applies: the Feature-description textarea and Custom-rubric-prompt textarea wrap user-typed content via huh's path, and the bar-break recurs there. A separate change upgrading huh / lipgloss / bubbletea is needed to address that end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reported requirement: the interview kit's scaffolding (start.sh, end.sh, INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md, RUBRIC_OVERVIEW.md, .claude/CLAUDE.md, lib/privacy-gate.sh, etc.) should always end up in the generated repo, regardless of whether the proctor picked a generated starter project (Mode A) or a brief-only flow (Mode B). Existing gap: `--kit-dir` was an explicit flag with no default. If the proctor invoked headless without passing it, the bootstrap skipped the kit copy entirely (project-generator.ts only runs copyDir when `kitTemplateDir` is supplied). The AI-generated project landed in the output directory but none of the proctor/candidate guidance shipped alongside it — the recording workflow that depends on start.sh / end.sh / PRIVACY_RELEASE.md was effectively broken. applyBootstrapDefaults now sets KitDir to "teamhero-interview-kit" (resolved relative to the current working directory, same lookup as findBootstrapScript in interview_bootstrap.go) whenever the flag is omitted. Explicit --kit-dir still wins, so proctors with a custom kit layout can override. Tests: - FillsKitDirSoScaffoldingAlwaysShipsToCandidate — bare opts (only Role set) get KitDir = "teamhero-interview-kit" after defaults run. - DoesNotOverrideExplicitKitDir — explicit "/opt/custom-kit" wins through the defaulting pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ug logs The wizard had three redundant ways to describe the project: the Feature text input, a late-stage "How should the AI prompt be supplied?" select, and an optional "Project prompt (optional)" addendum. The user pointed out — correctly — that the Feature description IS the project prompt, so the late-stage steps were just asking the proctor to say the same thing twice. The new flow has one either/or decision right after Domain: "I'll write the description myself" → existing Feature text input "Suggest project ideas for me" → spinner → pick-one Both paths populate data.feature directly; the projectPrompt field, --project-prompt flag, and corresponding OpenAI addendum block are all gone. Added structured run logging so future generation failures are triageable without a rerun. The dispatcher prints one line of run context to stderr (role, mode, stack, domain, time-box, rubric, output, debug), and the bun subprocess emits bootstrap.start / bootstrap.ok / bootstrap.fail records with elapsed-ms and attempt counts. The --debug flag adds truncated feature/custom-prompt bodies and output/kit paths. Strengthened the OpenAI prompt to push the model past its undersize bias. The size requirements now lead the Mode A spec, name 4-5 concrete files with line ranges, and on retry the prompt quotes the prior LOC count and demands "roughly DOUBLE the previous output" — gpt-5-mini was otherwise content to produce another 266-LOC sketch even after the validator flagged it. Smoke tests show first-attempt success at 399 LOC and three-attempt recovery at 490 LOC with no human intervention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ptions The wizard's "Project mode" picker exposed two opaque codes (A/B). Replace with three self-describing options that match how the proctor actually thinks about the interview shape: Brownfield — AI scaffolds a starter codebase Greenfield (use the stack above) — written brief; candidate uses your stack Greenfield (candidate picks stack) — written brief; candidate also picks tooling The first two collapse to the existing mode A/B at the BootstrapOptions boundary. The third is a new variant: mode B + a stackByCandidate flag that flips the BRIEF.md's "Tech stack" section from a hard requirement ("REQUIRES the candidate to use Go") to a contextual hint ("candidate selects their own tech stack ... Go is referenced as the team's internal tooling"). When stackByCandidate is set, the stack choice itself becomes part of what's being evaluated. Renamed "Rubric mode" to "How should AI share observations?" so the question signals what the answer actually controls (the rubric the AI uses when writing up the recorded interview). Pure label change — same three options, same internal values. Every wizard question now carries a one-line description telling the hiring manager how their answer shapes the generated project: stack drives the source language, domain shapes glossary vocabulary, feature description becomes the project's central focus, etc. Headless protocol: new --stack-by-candidate flag, valid only with --mode-project B. Mode A + stack-by-candidate is rejected by both the Go validator and TS role-config validator because "candidate picks the stack" is incoherent when the AI is generating a starter codebase IN a specific stack. Backward compat: legacy A/B values in BootstrapWizardDefaults still seed the wizard correctly (normalizeWizardProjectMode maps them) so saved role-config.json files don't break. Tests added: - Project type taxonomy → (mode, stackByCandidate) resolver (Go) - Legacy A/B value normalization (Go) - Greenfield-open round-trip through bootstrapWizardOptionsFromModel (Go) - --stack-by-candidate flag parsing (Go) - Validator rejection of stack-by-candidate + mode A (Go + TS) - OpenAI prompt branches: B alone vs B + stackByCandidate (TS) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ment The validator's 400-700 LOC and 2-deep-module rules were a heuristic inherited from the initial slice — they weren't in any product spec. They were producing real friction: perfectly serviceable 300-LOC projects were being rejected and retried, and the AI prompt was laden with size enforcement language that crowded out the actual requirements. Removed: - MODE_A_LOC_MIN / MODE_A_LOC_MAX / DEEP_MODULE_MIN_LINES / MODE_A_MIN_DEEP_MODULES constants and the validator branches that used them. - countLines() helper, SOURCE_EXTS set, extname import — all dead now. - ABSOLUTE SIZE REQUIREMENTS block and the per-file budget table from the OpenAI Mode A prompt. - extractLocFromFailure / extractDeepModuleCount retry-feedback helpers and the "CORRECTION REQUIRED — DOUBLE the previous output" retry directive (no failure type to extract from now). Kept: - README.md, GLOSSARY.md, and at-least-one-failing-test structural checks. These are real product requirements and shape what the candidate sees on minute zero. - Polyglot test-file detection (xUnit Skip, pytest.mark.skip, t.Skip, etc.) — the structural check still needs it. - Retry loop, now reverted from 5 attempts back to 3 since structural failures are rare and self-correcting. Tests: - Removed the three size-rejection tests in project-validator.spec.ts. - Added a regression guard that pins the new contract: a tiny but structurally-complete project passes validation. - Updated openai-generator-client.spec.ts: dropped the "encodes hard LOC band" test; added a "does NOT encode hard LOC band" regression guard so a future prompt edit can't silently reintroduce the rule without the matching validator coming back too. - Simplified the stubModeAProject() fixture — no more LOC padding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…candidate output The candidate-facing repo was leaking hints about the answer: - GLOSSARY.md listed domain concepts the candidate is supposed to identify themselves as part of the evaluation. - Pre-existing failing/skipped tests (e.g. describe.skip("addUser", ...)) gave away the API surface and function names the candidate should design. - The kit-overlaid .claude/CLAUDE.md literally told the candidate's agent to "read GLOSSARY.md and failing tests first" — coaching about the structure of the work before the candidate even read it. All three are now removed from what ships to the candidate: - project-validator no longer requires GLOSSARY.md or a failing test. README.md is the only required structural file (the candidate-facing brief — they need that to know what they're building). - OpenAI prompt now explicitly forbids generating test files, GLOSSARY.md, CLAUDE.md, or AGENTS.md under a new "DO NOT GENERATE" block. The README's "Getting started" section instructs the candidate to write their own tests. - teamhero-interview-kit/.claude/CLAUDE.md deleted. The kit still ships .claude/settings.json because that's the hook config that writes interview.log for the AI observer (non-optional plumbing). Verified end-to-end: a fresh smoke run produces only README.md + source files on the AI-generated side, plus the proctor-facing kit overlay (AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md, RUBRIC_OVERVIEW.md, start.sh/end.sh). No tests/, no GLOSSARY.md, no CLAUDE.md. Tests updated: - project-validator.spec.ts: dropped failing-test and GLOSSARY required checks. Added regression guards pinning the new contract — a project with only a README.md must pass. - openai-generator-client.spec.ts: added assertions that the prompt forbids generating test files and GLOSSARY.md. Existing CLAUDE.md/ AGENTS.md regex updated for the new multi-line "DO NOT GENERATE" block shape. - project-generator.spec.ts: simplified stubModeAProject() — no GLOSSARY, no sample test file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… toggle The JD was previously tied to the "default+jd" rubric value — collected only as a side-effect of picking that rubric, and used only by the AI observer. Two problems with that: 1. The proctor couldn't combine a custom rubric WITH a JD. The rubric choice and the JD attachment are independent decisions; coupling them forced a false trade-off. 2. The JD was never read by the project-generation prompt. A junior healthtech JD should be able to nudge the AI toward an EHR-flavoured feature; instead it was inert until post-interview review. This change splits the JD into three standalone wizard steps right after Domain: "Will you provide a job description?" (yes/no) ↓ if yes "Path to job description file" ↓ "Should the JD influence the project being generated?" (yes/no) The rubric question simplifies to default/custom — the "default+jd" value is retired (validator rejects it with a clear error so stale configs surface). Behavior changes: - ai-observer reads the JD whenever jdPath is set, regardless of rubric mode. The custom rubric and the JD can now coexist; both get appended to the observer's instructions. - openai-generator-client now reads the JD body and injects it as a "Job description context" block in the project-generation prompt when jdInfluencesProject is true. The model uses it to calibrate complexity and domain. When the flag is off (or no JD), the prompt is unchanged. - New headless flag --jd-influences-project; rejected without --jd-path. - BootstrapOptions and RoleConfig grow a JDInfluencesProject / jdInfluencesProject field. Validator rejects influence=true without a path on both sides. Smoke verified: a JD that mentions FHIR/HL7/EHR concepts produced a generated repo with patient-data domain types and a dashboard-summary service. Without --jd-influences-project, the same role config produced a generic dashboard. PRD updated (docs/2026-05-09-candidate-interview-reviewer-plan.md) with a new "v2 — post-shipping refinements" section that catalogs all the changes shipped in this run (wizard rendering fixes, sensible defaults, kit-completeness, headless UX, --debug logging, wizard simplification, question descriptions, greenfield/brownfield taxonomy, size validator removal, candidate-facing strip of GLOSSARY/sample-tests/CLAUDE.md, and this JD refactor) — so the feature's evolution is visible without spelunking git log. Tests: - Go: new state-machine tests for jd-provided yes/no branch, jd-path → jd-influences-project transition, and rubric default → output-dir (no longer detours via jd-path). Retired TestBootstrapWizard_NextState_JDRubricRoutesToFilePicker and TestBootstrapWizard_NextState_JDPathThenOutputDir. - Go: validator tests for retired "default+jd" rejection, standalone jd-path acceptance, and jd-influences-project-without-path rejection. - TS: new openai-generator-client tests covering JD injection when jdInfluencesProject is true AND omission when false. Retired the "default+jd" coupling tests in role-config.spec.ts; added pairwise tests for standalone jd-path and jd-influences-project validation. - TS: ai-observer test renamed to assert JD is included regardless of rubric mode; new test that custom rubric + JD combine in the observer's instructions. - Snapshots regenerated for the new summary panel rows ("JD attached" and "JD usage"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

consola's default level is 3 (info); .debug() output is hidden. When the proctor passed --debug the headline log line printed (consola.info) but the bootstrap.debug.feature / output-dir / kit-dir lines were swallowed, which was exactly the troubleshooting context --debug exists to surface. Lift consola.level to 4 inside the --debug branch so all the existing consola.debug calls below print without re-checking the flag at each site. One-line fix, verified against a Stratus principal-engineer JD smoke run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…raming Two pieces of UX feedback in one commit because they intersect in the same step ordering. (1) The JD describes the business domain — so asking for Domain BEFORE the JD made the proctor either type the domain twice or type something generic that contradicted the JD. The wizard's step order is now: Stack → JD provided? → [JD path] → [JD influences project?] → Domain (skipped if JD) → Feature source → ... With a JD attached, the Domain step is skipped entirely; the role config carries an empty domain field and the OpenAI prompt renders a "Domain: infer from the job description context below" instruction instead of a bare "Domain: .". Validators on both Go and TS sides accept an empty domain when jdPath is set. The bun script's pre-validator picked up the same rule. (2) The default rubric option label said "9 built-in dimensions" — opaque framing for the hiring manager. The 9 dimensions are *what* the rubric observes, not *why* it observes them. The new label and description say what's actually being looked for: sound engineering practices (domain-driven design, deep modules, verification discipline) applied to AI-assisted work. Also fixed: the analysis-mode step said "drafts post-interview notes" — corrected to "post-interview observations" to align with the project's observation-over-judgment ethical framing. Tests: - Go state-machine: new tests for Stack→JD-provided, JD-influence→ feature-source (skipping Domain), JD-not-provided→Domain, and Domain→feature-source. Stale Domain→JD-provided test retired. - Go validator: new pairwise tests for domain-optional-with-JD AND domain-required-without-JD. - TS role-config validator: same pairwise coverage at the role-config layer (empty domain + JD ok; empty domain + no JD rejected). - TS openai-generator-client: new tests pinning the prompt rendering for both branches — "Domain: <X>." with a proctor-supplied domain and "Domain: infer from the job description" when domain is empty. - Summary panel: JD rows now precede Domain (mirroring the wizard flow); Domain row shows "(from JD)" when the proctor took the JD branch. - Snapshots regenerated. Smoke verified: a Stratus Principal Full Stack JD attached with NO --domain produced "Agentic Work Order Planner" — a work-order review workflow with tool-call loop and human-in-the-loop checkpoint, domain visibly inferred from the JD without the proctor typing it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…st cut 60x Six issues from a proctor's manual run of the interactive wizard: 1) Mode B wizard run shipped only BRIEF.md + role-config.json. Cause: the wizard path never called applyBootstrapDefaults, so KitDir was empty and the bun subprocess copied zero kit templates. Fix calls defaults right after the wizard returns. The kit overlay (INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md, RUBRIC_OVERVIEW.md, start.sh / end.sh, .claude/settings.json) now ships on every run regardless of dispatch path. 2) Mode B output had no index.html for the candidate to open. The orchestrator now writes a minimal static index.html for Mode B runs that links to BRIEF.md and says "you choose the stack" — a landing pad without prescribing tooling. Skipped for Mode A (which already has source files) and skipped when the AI's own output happens to include an index.html (don't clobber). 3) A single Mode B run cost $1.36 (143k tokens). Two changes brought it to ~$0.02 (2.3k tokens) — a 60x reduction: - reasoning effort dropped from "medium" (default) to "low". gpt-5 models burn 100k+ reasoning tokens on a structured-output task at medium; low produces equivalent scaffolds for this work. Override via AI_REASONING_EFFORT env var. - Removed the full 9-dimension rubric block from the project-generation prompt. The rubric is review-side context; the generator only needs to know to leave decision points for the candidate. ai-observer.ts still gets the full rubric where it actually scores. Saves ~600 input tokens per call. 4) TUI didn't show which LLM was being used. The generate screen now displays "Model: <name>" alongside the spinner so the proctor sees what's about to be billed before they commit. Each Responses API call logs an openai.usage line (input / output / reasoning / total tokens) so cost is visible per attempt. 5) Success-link display showed the project path from the filesystem root. printBootstrapSuccessLink now uses filepath.Rel(cwd, abs) for the label — running teamhero from ~/Documents and writing into ./interviews/foo shows as "interviews/foo" rather than the full absolute path. The OSC 8 hyperlink URL is still absolute so the click actually opens the right place. Falls back to absolute when the output is outside cwd (Rel returns ".." in that case). 6) The `teamhero interview` no-args verb picker rendered as a bare huh.Select — no shell header, no hints footer, breaking visual parity with the rest of the app. The picker now prints the shared "//// TEAM HERO" header above and the "enter continue • ctrl+c quit" hints below, matching wizard.go. Tests: - Go: wizard-path-applies-KitDir-default (the original bug). - Go: relative-path display (label is cwd-relative, OSC 8 URL is absolute). - TS: Mode B orchestrator writes index.html stub; Mode A doesn't; AI-authored index.html is preserved. - TS: prompt no longer inlines the rubric; replacement one-liner pinned by content. Token-cost regression guard. Smoke verified: a Stratus principal-engineer Mode B run with JD influence used 2,315 total tokens (was 143,697 before this change) and produced AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md, RUBRIC_OVERVIEW.md, start.sh, end.sh, lib/privacy-gate.sh, .claude/settings.json, BRIEF.md, role-config.json, and index.html in /tmp/stratus-v3 on attempt 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds two orthogonal flags so another agent can drive the bootstrap and consume the result programmatically: --json Emit a JSON payload to stdout describing the run. Human-readable output (Project: link, bun subprocess logs, publish messages) is routed to stderr so stdout stays parseable. The orchestrating agent reads stdout and acts on the payload — schedule interviews, send HR notifications, etc. --publish Auto-publish the generated repo to GitHub. No prompt. If a GitHub token is configured (via `teamhero setup`), the repo is created and the tree pushed. With --json, the resulting URL lands in payload.github.url so the downstream agent can put it directly in an email. Either flag can be true independently; both can be true together. Payload schema (versioned at "teamhero.interview.bootstrap/v1"): { "schema": "teamhero.interview.bootstrap/v1", "role": { "slug", "title", "stack", "domain" }, "project": { "mode", "stackByCandidate", "outputDir", "timeBoxMinutes", "feature", "analysisMode", "rubricMode", "jd": { "path", "influencesProject" } }, "ai": { "model" }, "github": null | { "url" } } Field discipline: - outputDir is always absolute so the agent doesn't have to know what cwd the command ran from. - jd is null when no JD was attached; payload-bearing fields when one is. - github is null unless --publish ran AND succeeded; an agent can json-null-check this to decide whether to email a link. - ai.model echoes AI_MODEL env or the gpt-5-mini default for cost attribution. Refactors along the way: - Threaded os.Stdout and os.Stderr separately through main.go -> runInterview -> runInterviewBootstrap{,WithWizard}. They were collapsed to a single writer at the CLI entry point, which is why the existing "stdout"/"stderr" naming in dispatcher code was cosmetic until now — both ended up at os.Stderr. - Split publishToGitHub() out of offerPublishToGitHub() so the new autoPublishToGitHub var (the --publish entry point) reuses the exact API + git plumbing without a confirm prompt. TDD: written red → green for each of: - --json flag parsing - --publish flag parsing - payload contents (role/project/ai/jd fields) - stdout vs stderr routing (human output to stderr when --json) - --publish + --json combination puts URL in payload - --publish without --json doesn't emit a JSON envelope - No --json preserves the existing human stdout behavior Smoke verified end-to-end: `teamhero interview bootstrap --json --jd-influences-project ...` produces the schema-v1 JSON on stdout, pipes cleanly to jq, and leaves human-readable logs visible on stderr for a watching operator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

asabaylus and others added 11 commits May 10, 2026 13:04

docs: add candidate interview grader plan

e84dd9e

Captures the design for a candidate screening kit + `teamhero interview` namespace with absolute scoring and mandatory human-in-the-loop review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

asabaylus self-assigned this May 11, 2026

asabaylus added the enhancement New feature or request label May 11, 2026

asabaylus changed the title ~~feat(interview): candidate AI-collaboration interview grader (all 5 slices)~~ feat(interview): Candidate AI Code Interview tool (all 5 slices) May 11, 2026

coderabbitai Bot requested changes May 11, 2026

View reviewed changes

Comment thread src/services/interview/review/review-orchestrator.ts Outdated

Comment thread src/services/interview/bootstrap/openai-generator-client.ts

Comment thread src/services/interview/bootstrap/project-generator.ts

Comment thread src/services/interview/bootstrap/project-generator.ts

asabaylus and others added 3 commits May 10, 2026 22:38

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

asabaylus and others added 2 commits May 10, 2026 23:13

📝 CodeRabbit Chat: Add generated unit tests

cf8229b

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

Comment thread tests/unit/services/interview/bootstrap/project-generator-security.spec.ts

Comment thread tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts Outdated

asabaylus and others added 4 commits May 10, 2026 23:23

coderabbitai Bot approved these changes May 11, 2026

View reviewed changes

asabaylus and others added 19 commits May 13, 2026 00:09

Merge remote-tracking branch 'origin/main' into claude/slice-1-founda…

6eb3ae6

…tion

Conversation

asabaylus commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works end-to-end

What ships in this PR

Ethical commitments — preserved end-to-end in code, schema, and docs

Manual QA Instructions

What to verify at each step

Optional deeper verification

Candidate-facing kit (independent of the audit flow)

References

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

asabaylus commented May 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 11, 2026 •

edited

Loading