Skip to content

feat(interview): Candidate AI Code Interview tool (all 5 slices)#10

Open
asabaylus wants to merge 43 commits into
mainfrom
claude/slice-1-foundation
Open

feat(interview): Candidate AI Code Interview tool (all 5 slices)#10
asabaylus wants to merge 43 commits into
mainfrom
claude/slice-1-foundation

Conversation

@asabaylus
Copy link
Copy Markdown
Owner

@asabaylus asabaylus commented May 11, 2026

Summary

teamhero interview is a candidate-facing AI-coding take-home experience, not a grading tool. The feature treats the interview as a project the candidate works on with whatever AI tooling they prefer, in a time-box, on their own machine — and only then, as a separate step, produces an advisory audit of how they collaborated with the AI.

The framing matters. The product is the candidate experience: a fair, modern, native generative-AI coding situation. The audit is a side effect that helps the hiring manager remember and reason about the session — it is never a score, never a ranking, and never determinative.

How it works end-to-end

  1. Hiring manager bootstraps a role. teamhero interview bootstrap (interactive wizard, or headless flags for CI) generates a starter project tailored to the role's stack, domain, and a one-sentence feature spec. The output is a real working project with a failing test, a CLAUDE.md, a GLOSSARY.md, and 2+ deep modules — sized to fit a 60/90/120-minute window. See docs/interview-classification-rationale.md for the full pedagogical and ethical framing.

  2. Candidate receives the kit and the project. Bundled with the project is teamhero-interview-kit/ containing:

    • INTERVIEW_RULES.md — what the candidate can and can't do (short version: use any AI tool you like).
    • RUBRIC_OVERVIEW.md — a plain-language explanation of the 9 observation dimensions, given to the candidate up-front. Nothing is hidden from them.
    • PRIVACY_RELEASE.md — explicit consent form: what is recorded, how it may and may not be used, and the appeal / deletion mechanism. The start.sh recording script will refuse to run until this is signed.
  3. Candidate works the project. They run start.sh, which starts an asciinema recording, hands them a shell, and lets them work as they normally would. Any AI tool is fair game — Claude Code, Cursor, Copilot, Cody, plain ChatGPT-in-a-browser. The cost of any AI usage they accumulate on their personal accounts during the interview is reimbursed via a $50 gift card so that tool choice stays an honest signal of preference and not a wallet-driven compromise.

  4. Candidate finishes. They run end.sh, which packages the recording + their commits into the repo for handoff.

  5. Hiring manager runs the audit. teamhero interview grade --candidate "Jane" --repo <url> produces a two-tier markdown audit:

    • summary.md — narrative observations across 9 dimensions, plus a mandatory manager sign-off section the manager must fill in with a categorical recommendation (Hire / Hire with notes / No hire) and reasoning ≥ 20 chars.
    • audit.md — the full reasoning chain, every raw measurement, the exact bias-guard prompt used.

    Every audit ships with an ADVISORY banner pinned above the content — "the audit is advisory; the candidate is a person, not a score; the audit is one factor among many." The audit is never returned to the candidate by default (GDPR Art. 15 caveat is documented for the EU/UK case).

  6. Hiring manager reviews the cohort. teamhero interview cohort --role <slug> rolls up all candidates for a role into a rankless listing with sign-off status — no Score, no Total, no Rank columns by design.

What ships in this PR

Layer Components
Candidate-facing teamhero-interview-kit/ — POSIX privacy gate, start.sh / end.sh, INTERVIEW_RULES.md, RUBRIC_OVERVIEW.md, PRIVACY_RELEASE.md, .claude/ templates
Hiring-manager-facing (TUI) Interactive huh wizard for bootstrap; phased progress display for grade (clone → collect-evidence → extract-measurements → observe → audit-write); glamour-rendered audit preview with pinned ADVISORY banner; shared interviewStyles palette
Engine (TS) Rubric module (9 dims, 2 groups, 3 evidence modes); project validator + generator (Mode A and B); role config; 4 evidence collectors + 4 deterministic extractors; AI observer with bias guard + strict scoreless schema; audit writer with sign-off validator; rankless cohort summary
Documentation docs/interview-rubric.md, docs/interview-classification-rationale.md, skills/teamhero-interview/SKILL.md (the conversational wrapper), README section 4

Ethical commitments — preserved end-to-end in code, schema, and docs

  • Observations, not scores. Strict OpenAI json_schema with additionalProperties: false, plus a second-layer rejectIfScored validator that throws if the parsed response contains any forbidden field.
  • Bias diversification, not elimination. The docs and skill never claim the AI is unbiased — only that its biases are different from any individual reviewer's.
  • Mandatory human-in-the-loop. Every audit requires a manager sign-off with a categorical recommendation and ≥ 20-char reasoning summary.
  • Interviewer-bias guard verbatim in every observation prompt (unit-tested).
  • Session recording URL never reaches the LLM — stored in frontmatter only.
  • Mandatory ADVISORY banner on summary, audit, cohort report, and at the start of every grade run.

Manual QA Instructions

A bundled smoke script walks through the candidate-experience and hiring-manager flows without spending money on OpenAI:

just build-all                                          # builds the Go TUI + TS engine
./scripts/manual-test-interview.sh                      # walks five interactive steps

The script requires a TTY (it bails on piped stdin) and uses --mode-analysis human-only for the grade step so no API key is needed.

What to verify at each step

  1. teamhero interview bootstrap with no flags drops into the wizard. Try each rubric mode: default (skips both follow-ups), custom (reveals a text area for the prompt), default+jd (reveals a file picker; validates the path exists). Press Ctrl+C from the first screen — it must exit cleanly with no partial output.

  2. Headless bootstrap is unchanged. The script then runs teamhero interview bootstrap --headless --role manual-test-role ... --output-dir /tmp/.... No TUI should render. Inspect the output directory:

    • CLAUDE.md and GLOSSARY.md present
    • 2+ deep modules in the generated source
    • A failing test (the candidate's starting point)
    • LOC of starter scaffold roughly 400–700
    • teamhero-interview-kit/ contents copied in
  3. teamhero interview grade with --mode-analysis human-only should show:

    • The ADVISORY banner at the start
    • A phased progress display: clone → collect-evidence → extract-measurements → observe → audit-write. (In human-only mode the observe phase is a near-instant no-op.)
    • A glamour-rendered preview of summary.md with the ADVISORY banner pinned at the top, before any candidate name
  4. Sign-off gating. Open the generated summary.md / audit.md. Confirm:

    • ADVISORY banner is at the top of both files
    • Sign-off section is present and requires a categorical recommendation
    • The audit.json contains no score / weighted_total / band fields
    • Session recording URL (if you passed one) appears in YAML frontmatter only, never in the body
  5. teamhero interview cohort --role <slug> should render a rankless listing — no Score, Total, or Rank columns.

Optional deeper verification

just test-all                  # 1660 TS pass / 0 fail; full Go suite green
just tui-test -cover           # confirm Go coverage stays ≥ 85% (currently 87.7%)

Candidate-facing kit (independent of the audit flow)

To verify the candidate experience without running a full grade:

cd /tmp/manual-test-role/teamhero-interview-kit/
./start.sh                     # should refuse — PRIVACY_RELEASE.md is unsigned
# Edit PRIVACY_RELEASE.md to add a name + date under "## Signed" / "## Date"
SKIP_RECORD=1 ./start.sh       # privacy gate now passes; asciinema is skipped in test mode

References

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Full "teamhero interview" flow: interactive bootstrap wizard or equivalent headless flags, headless review (phased progress display, glamour-rendered audit preview, mandatory ADVISORY banner), and cohort summary generation (COHORT.md).
  • Documentation

    • Expanded Quick Start, rubric, rationale, interview kit, privacy release, and skill guidance emphasizing observation-only outputs (no numeric scores) and required human sign-off.
  • Scripts

    • Add start/end session scripts and a manual test script to validate interview workflows and artifacts (summary/audit/cohort).

asabaylus and others added 11 commits May 10, 2026 13:04
Captures the design for a candidate screening kit + `teamhero interview`
namespace with absolute scoring and mandatory human-in-the-loop review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the rubric foundation for the candidate AI-collaboration
interview grader (slice 1). Ships a deep module exposing 9 dimensions
across two groups and three evidence modes:

- 6 process dimensions: upfront-design, context-engineering,
  critical-evaluation, verification, course-correction, risk-awareness
- 3 outcome dimensions: architectural-quality, test-pass, throughput
- 4 deterministic, 2 hybrid, 3 llm-judge evidence modes

Public API: RUBRIC_VERSION, getRubricVersion(), getDimensions(),
getDimension(id), getEvidenceMode(id). Tests cover version stamping,
dimension lookup, evidence-mode classification, group split, and
required-field population.

Refs: teamhero-scripts-3sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `interview` verb to the TS CLI, mirroring the report/setup/doctor
pattern: forwards all args (including --help and verb names) to the Go TUI
binary. Verb-level routing (bootstrap, grade, cohort) lives in the Go TUI.

Refs: teamhero-scripts-3sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Go TUI side of the interview command. Routes `interview <verb>`
to per-verb stubs (bootstrap, grade, cohort) that each print
"not yet implemented" and exit non-zero. Top-level usage and per-command
help routing extended to cover the new subcommand.

Refs: teamhero-scripts-3sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the wire format between the future TS interview service and
the Go TUI. Defines InterviewEvent as a discriminated union (starting
with `progress`), plus parse/serialize functions that round-trip cleanly,
emit one event per line, and reject malformed or unknown event types.

Future slices add `observation`, `measurement`, `audit`, `result`, and
`error` events to the union — additive by design, discriminated on type.

Refs: teamhero-scripts-3sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Publishes the two foundational docs for the interview grader:

- `interview-rubric.md` — formal rubric in observation framework. Lists
  all 9 dimensions across process/outcome groups, evidence modes
  (4 deterministic, 2 hybrid, 3 llm-judge), and per-dim observation
  vs. measurement output shape. No scoring levels anywhere.

- `interview-classification-rationale.md` — methodology and ethics
  doc with binding top-section preamble covering: (1) observations-
  not-scores, (2) bias diversification (NOT elimination — never claim
  the AI is "non-biased"), (3) mandatory human-in-the-loop with manager
  sign-off, (4) GDPR Article 15 caveat for MVP candidate-no-audit-access
  default. Per-dimension methodology, interviewer-bias guard, and
  schema-level scoring-drift guard follow the preamble.

Refs: teamhero-scripts-3sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le config, kit scaffold

Implements the hiring manager's bootstrap user journey end-to-end (headless mode).

Deep modules:
- `project-validator` (Mode A: CLAUDE.md / GLOSSARY.md / 2+ deep modules /
  failing test / LOC 400-700; Mode B: BRIEF.md with time-box, acceptance,
  deliverables sections)
- `role-config` (typed schema, validator with rubric-mode-conditional
  requirements: custom -> non-empty customPrompt, default+jd -> existing
  jdPath; round-tripping persistence)
- `project-generator` (injectable GeneratorClient interface; retries up
  to 3x on validation failure; embeds the interview-kit template files
  into bootstrap output)
- `orchestrator.runBootstrap` glues the three: validates config, runs
  generation, writes role-config.json

Production wiring:
- `openai-generator-client.ts` calls the Responses API with strict
  json_schema requiring `files[].{path,content}`; prompt encodes role
  metadata, rubric, mode-A vs mode-B contract, and retry diagnostics
- `scripts/run-interview-bootstrap.ts` CLI entry the Go TUI spawns
- `tui/interview_bootstrap.go` parses headless flags (--role / --stack /
  --domain / --feature / --time-box / --mode-{project,analysis,rubric} /
  --output-dir / --jd-path / --custom-prompt / --kit-dir), validates them
  with the same rules as TS, and shells out to bun running the script

Embedded interview-kit placeholder templates (Slice 3 enriches):
- start.sh / end.sh (placeholder; Slice 3 implements privacy gate + asciinema)
- INTERVIEW_RULES.md, RUBRIC_OVERVIEW.md, PRIVACY_RELEASE.md (placeholders)
- .claude/settings.json (empty hooks; Slice 3 wires)
- .claude/CLAUDE.md (minimal candidate-agent framing)

Test coverage:
- Project validator: 11 cases covering Mode A positive + each negative
  failure (missing files, sprawl-only, no failing tests, LOC bounds) and
  Mode B positive + missing/empty/incomplete brief
- Role config: 12 cases covering field requirements, rubric-mode
  conditionals, time-box bounds, persistence round-trip, refusal to
  write invalid configs
- Project generator: 5 cases covering Mode A success, retry-then-success,
  exhaustion after 3 failures, kit template copy, Mode B success
- Orchestrator: 3 cases covering happy path, validation rejection without
  generator invocation, kit-dir merging
- Go: 10 cases covering flag parsing, validation rejection of incomplete
  / malformed flags, runner delegation, exit-code forwarding

Notes: --headless is the only Slice 2 path. Interactive `huh` wizard
deferred. `assess_config.go` referenced in acceptance criteria does not
exist on this branch (assess maturity work has not merged); follows the
existing `doctor.go` style instead.

Refs: teamhero-scripts-7qe

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ripts + templates)

Completes the candidate's user journey: clone, sign, code, push. The
kit lives at `teamhero-interview-kit/` in this repo and is copied
verbatim into every bootstrap output by Slice 2.

Privacy gate (POSIX-compatible deep module):
- `lib/privacy-gate.sh` exposes `check_privacy_release_signed PATH`
  (function) and runs the same check when invoked as a script. Detects
  missing file, empty file, missing/whitespace/placeholder values under
  the "## Signed" and "## Date" sections.

Shell entry points:
- `start.sh` — calls the privacy gate, refuses on unsigned with clear
  instructions, verifies UserPromptSubmit + PreToolUse hooks present in
  .claude/settings.json, writes .interview-state/started-at, and execs
  asciinema rec terminal.cast. Honors SKIP_RECORD=1 + ASCIINEMA_BIN env
  for testability.
- `end.sh` — verifies start was run, stages PRIVACY_RELEASE.md +
  terminal.cast + interview.log + INTERVIEW_RULES.md, and creates a
  single git commit. Honors SKIP_COMMIT=1 for testability.

Templates filled in with real content:
- `PRIVACY_RELEASE.md` — full placeholder release with REVIEW WITH LEGAL
  warning, no-training clause, deletion/access/appeal rights (30-day
  appeal window after hiring decision), bias acknowledgement,
  what-the-AI-sees and what-it-doesn't-see sections.
- `INTERVIEW_RULES.md` — candidate-facing rules with `{{TIME_BOX}}`
  placeholder, before/during/finishing sections, and WSL setup
  instructions for Windows candidates (apt install + clone under ~/
  not /mnt/c/).
- `RUBRIC_OVERVIEW.md` — plain-language summary of all 9 dimensions
  (process × 6, outcome × 3) including the "why no scoring" framing.
- `.claude/settings.json` — UserPromptSubmit + PreToolUse hooks each
  pipe a structured JSONL line to `interview.log` via jq.
- `.claude/CLAUDE.md` — candidate-agent framing covering role, what to
  do with knowledge of being observed (be normal, do not coach toward
  rubric), and what not to do (no silent destructive ops, no external
  exfil).

Tests (17 new):
- Privacy gate (9 cases): signed, missing file, empty file, placeholder-
  only, signature-no-date, date-no-signature, no path argument, missing
  sections, whitespace-padded real values.
- Kit smoke (8 cases): start refuses unsigned; start passes when signed
  (SKIP_RECORD); settings.json declares both hooks emitting to
  interview.log; end refuses without prior start; full start→end
  round-trip (SKIP modes); INTERVIEW_RULES mentions WSL; RUBRIC_OVERVIEW
  mentions all 9 dims; PRIVACY_RELEASE has REVIEW WITH LEGAL +
  no-training + appeal + 30-day language.

Refs: teamhero-scripts-h13

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er, audit writer)

Largest slice. Produces a complete per-candidate audit end-to-end:
inputs → events → measurements + observations → summary.md / audit.md /
audit.json / evidence/. Headless mode is wired through the Go TUI.

Evidence collectors (4 input adapters):
- `collectors/asciinema.ts` — parses v2 cast files; reconstructs typed
  commands from input events with pause-before-Enter timing; handles
  backspace.
- `collectors/jsonl-log.ts` — parses interview.log (kit-emitted JSONL);
  separates prompts from tool-use; silently drops malformed lines.
- `collectors/transcript.ts` — supports [HH:MM:SS] Speaker: text format
  (Granola/Otter/Fireflies markdown export shape) and bare Speaker:
  lines; skips VTT timing lines and comment lines.
- `collectors/git-history.ts` — parses `git log` via an injectable
  runner; tolerates non-git directories by returning empty list.

Deterministic extractors (4 pure functions):
- `extractors/verification.ts` — counts test/typecheck runs across
  bun/npm/pnpm/yarn/go/pytest/jest/vitest/cargo/just, plus
  diff/grep/cat reads, plus test-runs-immediately-after-prompt.
- `extractors/risk-awareness.ts` — pattern-matches 13 destructive
  command families; reports each with pause-before-Enter timing.
- `extractors/test-pass.ts` — injectable TestRunner; default runner
  picks bun vs go from project layout.
- `extractors/throughput.ts` — session start/end, elapsed, commits,
  time-to-first-test, avg gap between commits.

AI observer (deep module with explicit ethical guards):
- `INTERVIEWER_BIAS_GUARD` constant — verbatim instruction told to
  every observation prompt to treat interviewer commentary as
  situational context, not evidence about the candidate.
- `OBSERVATION_RESPONSE_SCHEMA` — strict json_schema with
  additionalProperties: false on every level; declares ONLY
  dimension_id / observation / reasoning / evidence_excerpts /
  caveats. No score, weighted_total, raw_total, band, signal_count.
- `rejectIfScored` — second-layer validator; throws if the parsed
  response contains any forbidden field (defense in depth: schema
  + post-parse check + writer renders nothing numerical).
- `buildObserverPrompt` — pure function; supports default / custom /
  default+jd rubric modes; reads JD from disk; embeds rubric for the
  5 observable dims only (deterministic dims are extracted, not
  observed); never includes the session_recording_url.
- `humanOnlyObservations` — returns blank fillable templates for
  human-only analysis mode.
- `OpenAIObserverClient` — production client; Responses API; gpt-5-mini
  default; the rejectIfScored guard runs on every response.

Audit writer (pure transformation):
- `renderSummary` / `renderAudit` — markdown rendering; same warning
  banner at top of both ("THIS AUDIT IS ADVISORY ... not a score");
  YAML frontmatter with Obsidian-friendly tags + candidate + role +
  date + rubric_version + rubric_mode + signed_off +
  session_recording_url + session_platform + session_date;
  reasoning chain preserved in BOTH tiers (transparency).
- Per-dimension layout: measurements first (for deterministic / hybrid
  dims), then observation + reasoning + evidence (for LLM-judge /
  hybrid dims).
- `validateSignOff` — categorical recommendation (Hire / Hire with
  notes / No hire) plus manager-written reasoning >=20 chars; the TUI
  uses this to reject empty submissions.

Grade orchestrator:
- `gradeCandidate` — clones repo (injectable cloner) or uses
  localRepoPath; reads role-config.json from the repo; runs all 4
  collectors + 4 extractors; calls AI observer (or human-only path);
  writes audit; copies PRIVACY_RELEASE.md, terminal.cast,
  interview.log, transcript, interviewer-notes into evidence/.
  Cleans up the temp clone on exit.
- `scripts/run-interview-grade.ts` — CLI entry the Go TUI spawns.

Go TUI:
- `tui/interview_grade.go` — headless flag parser (positional repo or
  --repo, plus --candidate, --transcript, --interviewer-notes,
  --session-recording-url, --session-platform, --session-date,
  --output-dir, --local-repo-path) with validation; mandatory
  ADVISORY warning banner printed before spawning the TS subprocess.
- `runInterview` dispatcher wires grade to runInterviewGrade.

Tests (50 new):
- Collectors: 10 (asciinema header+commands+backspace, interview.log
  split + malformed-line tolerance, transcript timestamped + bare +
  VTT, git history with stub runner, git failure tolerance).
- Extractors: 9 (verification counts + interleaving; risk-awareness
  zero-detection / rm -rf with pause / force-push and reset;
  test-pass with injected runner including n/a case; throughput
  elapsed + commits + time-to-first-test + empty).
- AI observer: 12 (bias guard verbatim; role metadata in input;
  rubric for 5 observable dims only; custom prompt mode; JD content
  mode; session URL NOT in prompt; interviewer notes routed to user
  input not instructions; schema has no score/total/band fields;
  rejectIfScored on score / weighted_total / clean; human-only
  template count and content).
- Audit writer: 15 (summary frontmatter, banner, reasoning preserved,
  measurements rendered, sign-off section with Hire|notes|No hire,
  session_recording_url in frontmatter; audit banner; audit reasoning
  preserved; audit evidence not truncated; writeAudit produces all 4
  outputs; validateSignOff rejects non-categorical; rejects empty
  reasoning; rejects too-short reasoning; accepts substantive
  Hire/notes/No-hire).
- Grade orchestrator: 4 (full happy path produces all outputs +
  PRIVACY_RELEASE copied; missing role-config errors; human-only
  skips observer; session_recording_url in frontmatter NOT in prose).
- Go: 8 (grade flag parsing positional / all-flags; validation rejects
  missing candidate / missing repo / bad platform; warning banner
  printed; exit code forwarded).

`just test`: 1632 pass. `go test`: all green.

Refs: teamhero-scripts-bt2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the candidate-interview-grader MVP. Hiring manager can now
roll up the cohort and orchestrate the whole flow conversationally.

Cohort summary module (deep, pure):
- `cohort-summary.ts`:
  - `loadCohort(roleDir)` reads every `<candidate>/audit.json`, returns
    typed records; skips malformed JSON silently.
  - `renderCohortSummary` produces COHORT.md as a markdown table with
    columns: Candidate | Interviewed | Sign-off | Recommendation |
    Audit. NO score, total, or rank column. Default order is
    alphabetical; chronological is opt-in.
  - Mandatory ADVISORY warning banner at the top.
  - Pending sign-offs marked ⏳; reviewed marked ✅.
  - Header line reports pending vs reviewed counts.
  - Friendly "no candidates yet" copy for empty roles.
- `writeCohortSummary` writes the result to `<roleDir>/COHORT.md`.

Audit frontmatter extension:
- `AuditFrontmatter.recommendation` (optional, categorical) added so the
  cohort renderer can show the manager's sign-off choice without
  re-parsing the markdown.

Production wiring:
- `scripts/run-interview-cohort.ts` — CLI entry the Go TUI spawns.
- `tui/interview_cohort.go` — headless flag parser (--role / --role-dir /
  --order) + validation + bun subprocess runner.
- `runInterview` dispatcher routes cohort → runInterviewCohort.

Claude skill — bounded-context wrapper:
- `skills/teamhero-interview/SKILL.md` ships in-repo. Users install by
  copying to `~/.claude/skills/teamhero-interview/`.
- Frontmatter `name: teamhero-interview` + descriptive trigger
  description covering bootstrap / grade / cohort / hiring queries.
- Ethical framing reproduced as binding rules: observations not scores,
  bias diversification (not elimination), mandatory human-in-the-loop.
- All 3 MVP verbs documented with example invocations.
- v1.5 verb stubs (list-roles, list-candidates) acknowledged with the
  current workaround (ls docs/interviews/).
- Cohort orchestration workflow described: read role config → ask user
  for transcript per candidate → grade each → run cohort verb → remind
  manager that each audit needs its own sign-off.
- Explicit "what NOT to do" section: do not produce scores, do not
  bypass the privacy gate, do not feed session_recording_url to the
  observer, do not re-introduce score fields in code.
- Skill repeats explicitly: contains no business logic; all logic lives
  in src/services/interview/.

Tests (24 new):
- Cohort summary: 11 (loadCohort missing-dir; reads each candidate;
  skips folders without audit.json; banner present; required columns +
  no Score/Total/Rank; pending vs reviewed indicators; alphabetical
  default; chronological opt-in; empty-cohort message; counts in
  header; writeCohortSummary produces file).
- Skill: 9 (skill exists; frontmatter has name + description; all 3
  MVP verbs documented; v1.5 stubs mentioned; ethical framing —
  observations-not-scores + bias-diversification + human-in-the-loop;
  refusal-of-numerical-scores instruction; cohort orchestration
  documented; no-business-logic statement present; session_recording_url
  feeding warning present).
- Go: 4 (cohort flag parsing; validation rejects missing --role;
  validation rejects bad --order; runner delegation + exit code
  forwarding).

`just build-all` ✓, `just test-all` ✓ (1652 pass, 0 fail; Go all green).

Refs: teamhero-scripts-9v2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI failures on PR #10's test-typescript job:

1. scripts/run-tests.sh line 64 had a bash arithmetic syntax error when
   bun emitted multi-line "N pass" / "N fail" / "N skip" output. The grep
   captured all matches (e.g., "3\n0"), and \$((PASS + file_pass)) errored
   with "syntax error in expression". Counts silently desynced (CI 1398
   pass vs local 1652). Fixed by piping to tail -1 so only the summary
   line's match is used.

2. Function coverage was 84.49% (under 85% threshold) in CI. Added
   targeted tests for the uncovered seams in the new interview module:
   - OpenAIObserverClient (constructor, observe() happy path, missing
     output_text guard, rejectIfScored defense-in-depth)
   - summarizeEvents non-prompt branches (tool-use, command, commit,
     transcript-line) via buildObserverPrompt
   - validateGenerated re-runs validation on an existing output dir
   - extractTestPass falls through to realRunner on dirs without
     go.mod / package.json

Local coverage: 88.39% → 88.79% functions (over the 85% threshold with
~4pp margin).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@asabaylus asabaylus self-assigned this May 11, 2026
@asabaylus asabaylus added the enhancement New feature or request label May 11, 2026
Implements the three v1.5 slices for `teamhero interview`:

- **Slice 1 (teamhero-scripts-92a)**: Interactive huh.Form wizard for
  `teamhero interview bootstrap`. Invoking with no flags on a TTY drops
  into a wizard that walks role identity, time-box, project mode,
  analysis mode, and rubric mode — with conditional follow-ups for
  the custom-prompt and default+jd branches. Form-state-building is
  isolated in pure helpers (newBootstrapWizardModel,
  bootstrapWizardNextState, bootstrapWizardOptionsFromModel) so the
  rubric-mode branching, defaults, and option round-trip are all
  unit-testable without rendering a TTY. The wizard hands its
  BootstrapOptions to the same ValidateBootstrapOptions gate the
  headless path uses, so there is no parallel validation logic.
  Headless mode (any flags supplied) is unchanged.

- **Slice 2 (teamhero-scripts-upo)**: Three TUI polish concerns:
  (1) shared interviewStyles module centralizing the palette so future
  screens stay visually aligned with the report wizard; (2) glamour-
  rendered audit preview (`renderInterviewAuditPreview`) that pins the
  ADVISORY warning banner above the source content — managers read the
  preview before the sign-off file; (3) phased progress display
  (`interviewProgressModel`) consuming the existing
  InterviewProgressEvent JSON-lines protocol from the bun subprocess,
  with five canonical phases (clone → collect-evidence →
  extract-measurements → observe → audit-write) and graceful failure
  surfacing.

- **Slice 3 (teamhero-scripts-rjp)**: Manual end-to-end smoke script
  at `scripts/manual-test-interview.sh` walking a human operator
  through the wizard, headless bootstrap, grade with --mode-analysis
  human-only (no OpenAI spend), sign-off gating, and cohort rendering.
  README documents the interactive wizard as the primary path; the
  Claude skill at `skills/teamhero-interview/SKILL.md` now recommends
  the wizard on a TTY and falls back to the headless flag list for
  agents and scripting.

Test coverage: ~35 new Go tests covering wizard state-machine
branching, field validators (URL-safe slug, time-box range, JD-path
existence), options round-trip, dispatcher wizard-vs-headless
decision, preview pinning, progress phase transitions, manual-test-
script structure. Overall Go coverage: 87.7% (above 85% threshold).
TS: 1660 pass, 0 fail. No behavior change to the headless path.

Closes teamhero-scripts-92a, teamhero-scripts-upo, teamhero-scripts-rjp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@asabaylus asabaylus changed the title feat(interview): candidate AI-collaboration interview grader (all 5 slices) feat(interview): Candidate AI Code Interview tool (all 5 slices) May 11, 2026
Updates the candidate-interview-grader plan to reflect what has actually
landed on PR #10. Replaces the "deferred until go-ahead" Next Steps
section with an Implementation Progress table that tracks the 6 MVP
slices (beads `teamhero-scripts-{3sh,7qe,h13,bt2,9v2}`) and the 3 v1.5
slices (beads `teamhero-scripts-{92a,upo,rjp}`) — all closed. Adds a
note that the `tui/assess_*.go` reference files don't exist on `main`
yet, so v1.5 aligned with `tui/wizard.go` + `tui/setup.go` instead.

The "Still deferred" list keeps the pilot, `list-roles`/`list-candidates`,
multi-model bias diversification, and periodic anonymized bias audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a complete interview-review feature: CLI/TUI verbs (bootstrap, review, cohort), headless Bun scripts, TypeScript services for bootstrap and review, evidence collectors/extractors, OpenAI observer with strict schema and bias guard, audit/cohort writers with ADVISORY banner and sign-off, kit scripts/docs, and extensive tests.

Changes

Interview Reviewer

Layer / File(s) Summary
CLI & TUI entrypoints
src/cli/index.ts, tui/interview.go, tui/*_bootstrap*.go, tui/interview_review.go, tui/interview_cohort.go, tui/main.go
Registers interview subcommand, forwards help/flags, adds headless verbs and TUI dispatching with preview/progress rendering.
Role config & rubric
src/services/interview/bootstrap/role-config.ts, src/services/interview/shared/rubric.ts
Adds RoleConfig types/validation, Project/Analysis/Rubric modes, and a fixed rubric catalog with evidence modes.
Bootstrap generation
src/services/interview/bootstrap/project-generator.ts, src/services/interview/bootstrap/openai-generator-client.ts, src/services/interview/bootstrap/orchestrator.ts, scripts/run-interview-bootstrap.ts
Generator client (OpenAI), safe file path handling, retries, validators, runBootstrap orchestrator, and headless bootstrap script.
Evidence collectors
src/services/interview/review/collectors/asciinema.ts, jsonl-log.ts, transcript.ts, git-history.ts
Parsers that normalize terminal, prompt/tool, transcript, and git history into EvidenceEvent streams.
Deterministic extractors
src/services/interview/review/extractors/*
extractVerification, extractTestPass, extractThroughput, extractRiskAwareness compute Measurements from evidence.
AI observer & orchestration
src/services/interview/review/ai-observer.ts, src/services/interview/review/review-orchestrator.ts, scripts/run-interview-review.ts
Builds observation-only prompts (bias guard), strict response schema, OpenAI client, and reviewCandidate orchestration: clone/read, collect, measure, observe or human-only, write audits.
Audit & cohort writers
src/services/interview/review/audit-writer.ts, src/services/interview/cohort/cohort-summary.ts, scripts/run-interview-cohort.ts
Render/write summary.md/audit.md/audit.json with ADVISORY banner and sign-off, produce COHORT.md roll-ups.
TUI wizard & preview
tui/interview_bootstrap_form.go, tui/interview_bootstrap_wizard.go, tui/interview_preview.go, tui/interview_progress.go, tui/interview_styles.go
Interactive huh bootstrap wizard, validators, glamour preview (pinned advisory), progress model, and styles.
Interview kit & scripts
teamhero-interview-kit/*, teamhero-interview-kit/lib/privacy-gate.sh, scripts/manual-test-interview.sh, teamhero-interview-kit/start.sh, end.sh
Kit assets, Claude hooks/settings, privacy release template, start/end scripts, privacy gate, and manual smoke-test script.
Docs & skill
docs/*, README.md, skills/teamhero-interview/SKILL.md
Adds classification rationale, rubric, reviewer plan, README quick start, and Claude skill docs enforcing observations-not-scores and human sign-off.
Tests
tests/unit/**, tui/*_test.go
Extensive unit and edge tests for generators, validators, collectors, extractors, observer, audit writer, cohort, TUI, and kit scripts.
Utilities / CI
scripts/run-tests.sh, .coderabbit.yaml
Improved test summary parsing and CodeRabbit config for automated reviews.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as teamhero CLI
  participant TUI as Go TUI
  participant Bun as Bun scripts
  participant Svc as TS Services
  participant OpenAI as OpenAI Responses API
  participant FS as Filesystem

  CLI->>TUI: run `teamhero interview review ...`
  TUI->>Bun: invoke `scripts/run-interview-review.ts` (flags)
  Bun->>Svc: reviewCandidate(input, deps)
  Svc->>FS: clone/read repo, read evidence files
  Svc->>Svc: compute measurements (extractors)
  Svc->>OpenAI: observe(prompt with strict json_schema)
  OpenAI-->>Svc: structured observations
  Svc->>FS: write summary.md, audit.md, audit.json, evidence/
  Bun-->>TUI: return artifact paths
  TUI-->>CLI: render preview + ADVISORY banner
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"A rabbit scrawls a checklist, bright and fleet,
Through prompts and logs and tests it nibbles neat.
ADVISORY pinned, observations not score,
Papers and audits across the floor.
🐇✨"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/slice-1-foundation

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (18)
teamhero-interview-kit/lib/privacy-gate.sh-16-16 (1)

16-16: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move set -u inside the conditional execution block to avoid mutating the caller's environment.

The global set -u at line 16 is applied whenever this file is sourced as a library, which unexpectedly enables the nounset option in any caller that doesn't already have it enabled. This violates the library pattern and can break scripts that rely on unset variable behavior.

Move the set -u declaration inside the conditional check at lines 72–75 so it only applies during direct script execution:

Suggested fix
-set -u
-
# extract_section <file> <heading> — prints the body that follows a given
 # When invoked directly (not sourced), run the check on $1 and exit with the
 # function's return code.
 if [ "${BASH_SOURCE[0]:-$0}" = "$0" ]; then
+    set -u
     check_privacy_release_signed "${1:-}"
     exit $?
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/lib/privacy-gate.sh` at line 16, Remove the top-level
"set -u" so the library doesn't change the caller's shell options, and instead
enable nounset only inside the script's main execution guard (the conditional
that checks whether the file is run directly vs sourced); place "set -u" at the
start of that guard block (and leave it out of any exported functions or
top-level code) so nounset applies only during direct execution and not when the
file is sourced.
src/services/interview/bootstrap/project-validator.ts-21-29 (1)

21-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent recursive walk from following symlink cycles

walk uses statSync, which follows symlinks. A cyclic symlink inside generated project contents can cause unbounded recursion.

Proposed fix
-import { existsSync, readFileSync, readdirSync, statSync } from "node:fs";
+import { existsSync, lstatSync, readFileSync, readdirSync } from "node:fs";
@@
 	for (const entry of readdirSync(dir)) {
 		if (entry === "node_modules" || entry === ".git") continue;
 		const full = join(dir, entry);
-		const s = statSync(full);
+		const s = lstatSync(full);
+		if (s.isSymbolicLink()) continue;
 		if (s.isDirectory()) walk(full, out);
 		else out.push(full);
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/project-validator.ts` around lines 21 - 29,
The walk function can follow symlinks because it uses statSync, causing
potential cycles; change walk(dir: string, out: string[] = []) to skip symbolic
links (use lstatSync and if entry is a symlink, skip or optionally resolve
safely) or add a visited set of realpaths to prevent revisiting; specifically
update the logic around readdirSync + statSync to use lstatSync and either
ignore entries where lstatSync(...).isSymbolicLink() or track realpath
(fs.realpathSync) in a visited Set inside walk (or an additional parameter) and
return early when a path is already visited to prevent unbounded recursion.
src/services/interview/shared/events.ts-30-41 (1)

30-41: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add validation for step and status fields in parseInterviewEvent

The function accepts {"type":"progress"} without validating required fields step and status, allowing malformed events to pass type checking at runtime. The message and progress optional fields are also unvalidated if present.

Proposed fix
 export function parseInterviewEvent(line: string): InterviewEvent | null {
 	let value: unknown;
 	try {
 		value = JSON.parse(line);
 	} catch {
 		return null;
 	}
 	if (!value || typeof value !== "object") return null;
-	const type = (value as { type?: unknown }).type;
+	const obj = value as Record<string, unknown>;
+	const type = obj.type;
 	if (typeof type !== "string") return null;
 	if (!KNOWN_EVENT_TYPES.includes(type as InterviewEvent["type"])) return null;
-	return value as InterviewEvent;
+	if (type === "progress") {
+		const step = obj.step;
+		const status = obj.status;
+		if (typeof step !== "string") return null;
+		if (
+			status !== "start" &&
+			status !== "update" &&
+			status !== "done" &&
+			status !== "error"
+		) {
+			return null;
+		}
+		if (obj.message !== undefined && typeof obj.message !== "string") return null;
+		if (obj.progress !== undefined && typeof obj.progress !== "number") return null;
+		return obj as InterviewProgressEvent;
+	}
+	return null;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/shared/events.ts` around lines 30 - 41,
parseInterviewEvent currently accepts {"type":"progress"} without validating
required fields; update parseInterviewEvent to, after confirming type is
"progress", check that (value as any).step exists and is a string and (value as
any).status exists and is a string and matches the allowed progress statuses
(use your existing InterviewEvent status union or a small array of allowed
status strings), and also validate optional fields: if message is present it
must be a string, and if progress is present it must be a number; reject (return
null) for any type-mismatches so that only well-formed InterviewEvent objects
(for all types, but especially "progress") are returned while still using
KNOWN_EVENT_TYPES and casting to InterviewEvent when validation passes.
src/services/interview/assess/extractors/verification.ts-58-71 (1)

58-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Interleaving metric ignores the advertised 30-second window.

The code only checks prompt→test adjacency, but never validates the 30s condition described in the comment and fact label context.

Proposed fix
 	let interleavedAfterPrompt = 0;
 	for (let i = 1; i < merged.length; i++) {
 		const cur = merged[i];
 		const prev = merged[i - 1];
 		if (cur.type === "command" && prev.type === "prompt") {
-			interleavedAfterPrompt += 1;
+			const deltaMs =
+				new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime();
+			if (Number.isFinite(deltaMs) && deltaMs >= 0 && deltaMs <= 30_000) {
+				interleavedAfterPrompt += 1;
+			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/verification.ts` around lines 58 -
71, The interleaving count currently only checks adjacency between merged
entries but ignores the intended 30-second window; update the loop that computes
interleavedAfterPrompt (which iterates over merged built from prompts and
testRuns) to parse timestamps to numeric time and only increment
interleavedAfterPrompt when cur.type === "command" && prev.type === "prompt" &&
(new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime()) <=
30000; ensure merged remains sorted by timestamp before this check.
src/services/interview/bootstrap/role-config.ts-121-126 (1)

121-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and harden readRoleConfig instead of unchecked cast.

JSON.parse(... as RoleConfig) allows malformed shape through (or throws unwrapped parse errors), which makes downstream failures harder to diagnose.

Proposed fix
 export function readRoleConfig(dir: string): RoleConfig | null {
 	const path = join(dir, ROLE_CONFIG_FILENAME);
 	if (!existsSync(path)) return null;
-	const body = readFileSync(path, "utf8");
-	return JSON.parse(body) as RoleConfig;
+	const body = readFileSync(path, "utf8");
+	let parsed: unknown;
+	try {
+		parsed = JSON.parse(body);
+	} catch (error) {
+		throw new Error(`Invalid ${ROLE_CONFIG_FILENAME}: malformed JSON`);
+	}
+	const result = validateRoleConfig(parsed as RoleConfig);
+	if (!result.ok) {
+		throw new Error(
+			`Invalid ${ROLE_CONFIG_FILENAME}: ${result.failures.join("; ")}`,
+		);
+	}
+	return parsed as RoleConfig;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/role-config.ts` around lines 121 - 126,
readRoleConfig currently does an unchecked JSON.parse and casts to RoleConfig;
wrap the read+parse in a try/catch to surface parse errors and then validate the
parsed object’s shape (check required properties and their types expected by
RoleConfig) before returning it — if validation fails return null or throw a
descriptive Error with context including ROLE_CONFIG_FILENAME and the path, so
callers don’t receive a malformed RoleConfig instance; implement this inside
readRoleConfig.
src/services/interview/assess/collectors/transcript.ts-24-37 (1)

24-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Transcript parser is too brittle for common speaker labels and invalid anchor timestamps.

The bare-speaker regex can skip valid lines (e.g., Speaker 1: ...), and an invalid sessionStartIso can throw on toISOString().

Proposed fix
-const BARE = /^([A-Za-z][A-Za-z .'-]+?):\s+(.+)$/;
+const BARE = /^([^:\[\]#][^:]{0,79}?):\s+(.+)$/;

 function toIsoFromHMS(hms: string, sessionStartIso: string): string {
 	const parts = hms.split(":").map(Number);
@@
-	const base = new Date(sessionStartIso).getTime();
-	return new Date(base + totalSeconds * 1000).toISOString();
+	const parsedBase = new Date(sessionStartIso).getTime();
+	const base = Number.isFinite(parsedBase)
+		? parsedBase
+		: new Date("1970-01-01T00:00:00.000Z").getTime();
+	return new Date(base + totalSeconds * 1000).toISOString();
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/transcript.ts` around lines 24 - 37,
The bare-speaker regex (BARE) is too strict (misses labels like "Speaker 1") and
toIsoFromHMS can throw if sessionStartIso or parsed HMS are invalid; update BARE
to allow digits and optional numeric suffixes (e.g., change /^([A-Za-z][A-Za-z
.'-]+?):\s+(.+)$/ to accept names with digits and leading words such as "Speaker
1"), and harden toIsoFromHMS: validate sessionStartIso by parsing new
Date(sessionStartIso) and checking for NaN, validate each HMS part with
Number.isFinite before computing totalSeconds, and if either the base date or
HMS parts are invalid return a safe fallback (e.g., the original sessionStartIso
or new Date().toISOString()) instead of calling toISOString on an invalid date;
reference TIMESTAMPED and BARE for regex fixes and toIsoFromHMS for the
parsing/validation changes.
src/services/interview/assess/extractors/test-pass.ts-20-42 (1)

20-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout and error handling to test runner processes to prevent grader hangs.

Both spawnSync() calls lack timeout configuration and don't capture r.error, which can cause the grader to block indefinitely if a test process hangs or fails to spawn.

Proposed fix
+const TEST_RUN_TIMEOUT_MS = 120_000;
+
 const realRunner: TestRunner = (repoDir) => {
 	if (existsSync(join(repoDir, "go.mod"))) {
 		const r = spawnSync("go", ["test", "./..."], {
 			cwd: repoDir,
 			encoding: "utf8",
+			timeout: TEST_RUN_TIMEOUT_MS,
+			killSignal: "SIGKILL",
 		});
 		return {
 			passed: countMatches(r.stdout ?? "", /^ok\s+/gm),
 			failed: countMatches(r.stdout ?? "", /^FAIL\s+/gm),
-			output: `${r.stdout ?? ""}\n${r.stderr ?? ""}`,
+			output: `${r.stdout ?? ""}\n${r.stderr ?? ""}${
+				r.error ? `\n${r.error.message}` : ""
+			}`,
 		};
 	}
 	if (existsSync(join(repoDir, "package.json"))) {
 		const r = spawnSync("bun", ["test"], {
 			cwd: repoDir,
 			encoding: "utf8",
+			timeout: TEST_RUN_TIMEOUT_MS,
+			killSignal: "SIGKILL",
 		});
-		const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}`;
+		const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}${
+			r.error ? `\n${r.error.message}` : ""
+		}`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/test-pass.ts` around lines 20 - 42,
The spawnSync calls in src/services/interview/assess/extractors/test-pass.ts
(the blocks that run "go test ./..." and "bun test") need timeout and error
handling to avoid hangs: add a timeout option (e.g. 30000 ms) and a larger
maxBuffer to both spawnSync invocations, check r.error and r.status (or
r.signal) and include r.error/message/signal in the returned output string, and
treat a timeout/spawn failure as test failures (set failed>0 or a specific flag)
by returning a non-zero failed count when r.error or r.signal is present; update
the returned objects for the functions that use the "r" result so both
passed/failed and output reflect these error/timeout conditions.
src/services/interview/assess/collectors/git-history.ts-14-18 (1)

14-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace execSync with execFileSync to avoid shell string construction.

Building a shell command string for execSync is unnecessary and fragile. Pass arguments directly to the git executable via execFileSync, which avoids shell interpretation entirely and is the standard Node.js best practice for spawning processes.

Proposed fix
-import { execSync } from "node:child_process";
+import { execFileSync } from "node:child_process";
@@
 const realRunner: GitRunner = (args, cwd) =>
-	execSync(`git ${args.map((a) => `'${a.replace(/'/g, "'\\''")}'`).join(" ")}`, {
+	execFileSync("git", args, {
 		cwd,
 		encoding: "utf8",
 	});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/git-history.ts` around lines 14 -
18, The current realRunner builds a shell command string and calls execSync,
which is fragile; change realRunner to call execFileSync directly with the git
executable and the args array (e.g. execFileSync('git', args, { cwd, encoding:
'utf8' })), remove the manual quoting/escaping logic, and ensure execFileSync is
imported from child_process; keep the GitRunner signature (args, cwd) and
preserve options like cwd and encoding.
scripts/run-interview-grade.ts-7-7 (1)

7-7: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove override: true to prevent CI/shell-provided environment variables from being overwritten.

The override: true option causes dotenv to replace any existing process.env values with those from the .env file. This will clobber runtime-provided secrets and configuration (e.g., from CI systems), breaking expected behavior in non-local environments. Use the default behavior instead.

Proposed fix
-loadDotenv({ override: true });
+loadDotenv();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run-interview-grade.ts` at line 7, Call to loadDotenv({ override:
true }) in scripts/run-interview-grade.ts is forcing .env values to overwrite
existing environment variables; remove the override option so dotenv uses
default behavior and does not clobber CI/shell-provided vars. Locate the
loadDotenv invocation in that file (the loadDotenv symbol) and change it to
loadDotenv() or remove the override property so existing process.env entries
remain intact.
src/services/interview/assess/collectors/jsonl-log.ts-28-30 (1)

28-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle missing interview.log as empty evidence instead of throwing.

A missing log file currently throws before parsing starts, which can fail an entire grade run for absent optional evidence.

💡 Proposed fix
 export function parseInterviewLog(path: string): InterviewLogParseResult {
-	const body = readFileSync(path, "utf8");
+	let body = "";
+	try {
+		body = readFileSync(path, "utf8");
+	} catch (err) {
+		if ((err as NodeJS.ErrnoException).code === "ENOENT") {
+			return { prompts: [], toolUses: [] };
+		}
+		throw err;
+	}
 	const lines = body.split("\n").filter((l) => l.trim().length > 0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/jsonl-log.ts` around lines 28 - 30,
The parseInterviewLog function currently calls readFileSync and throws if the
file is missing; update parseInterviewLog to treat a missing interview.log as
empty evidence instead of failing: check for file existence (e.g., using
fs.existsSync) or wrap readFileSync in a try/catch, and if the file is absent or
unreadable return an empty InterviewLogParseResult (empty lines/entries) rather
than letting the exception propagate; keep the rest of parsing logic intact so
parseInterviewLog and its callers receive an empty result when interview.log is
not present.
scripts/run-interview-bootstrap.ts-81-93 (1)

81-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate timeBox and mode flag values before constructing RoleConfig.

The current parsing allows invalid values (e.g., --time-box abc, unknown --mode-*) to pass as typed config and fail later in less clear ways.

💡 Proposed fix
 function buildConfig(flags: ParsedFlags): RoleConfig | string {
@@
-	const timeBox = Number.parseInt(flags.timeBox ?? "90", 10);
+	const timeBox = Number.parseInt(flags.timeBox ?? "90", 10);
+	if (!Number.isFinite(timeBox) || timeBox < 30 || timeBox > 240) {
+		return "Invalid --time-box; expected an integer between 30 and 240";
+	}
+
+	const projectModes = new Set<ProjectMode>(["A", "B"]);
+	const analysisModes = new Set<AnalysisMode>(["ai-assisted", "human-only"]);
+	const rubricModes = new Set<RubricMode>(["default", "custom", "default+jd"]);
+
+	if (!projectModes.has(flags.modeProject as ProjectMode)) {
+		return "Invalid --mode-project; expected A or B";
+	}
+	if (!analysisModes.has(flags.modeAnalysis as AnalysisMode)) {
+		return "Invalid --mode-analysis; expected ai-assisted or human-only";
+	}
+	if (!rubricModes.has(flags.modeRubric as RubricMode)) {
+		return "Invalid --mode-rubric; expected default, custom, or default+jd";
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run-interview-bootstrap.ts` around lines 81 - 93, Before constructing
RoleConfig, validate flags.timeBox and each mode flag: ensure timeBox
(flags.timeBox) parses to a positive integer and reject non-numeric or <=0
values (set default only after validation), and verify flags.modeProject,
flags.modeAnalysis, and flags.modeRubric are valid members of the corresponding
enums/types (ProjectMode, AnalysisMode, RubricMode); if any validation fails,
log a clear error and exit/throw rather than letting invalid values flow into
the RoleConfig. Perform these checks just above where timeBox is parsed and the
config object is created (referencing the timeBox variable and the RoleConfig
construction), and normalize/cast only after validation so RoleConfig gets only
safe, well-typed values.
teamhero-interview-kit/start.sh-57-59 (1)

57-59: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Write .interview-state/started-at only after recorder preflight passes.

Right now, a missing asciinema still leaves a start marker behind. That creates a false “session started” state even though recording never began.

Suggested reorder
-# Indicate we passed the gate so end.sh can verify the kit was started.
-mkdir -p "$REPO_ROOT/.interview-state"
-date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
-
 if [ "${SKIP_RECORD:-0}" = "1" ]; then
+    mkdir -p "$REPO_ROOT/.interview-state"
+    date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
     echo "✓ Privacy gate passed. SKIP_RECORD=1 — not starting asciinema."
     exit 0
 fi
@@
 if ! command -v "$ASCIINEMA_BIN" >/dev/null 2>&1; then
@@
     exit 1
 fi
 
+# Indicate we passed all prerequisites and are starting recording.
+mkdir -p "$REPO_ROOT/.interview-state"
+date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at"
+
 CAST_PATH="$REPO_ROOT/terminal.cast"

Also applies to: 66-79

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/start.sh` around lines 57 - 59, The start marker file
".interview-state/started-at" is written before the recorder preflight
completes, causing false "session started" states if asciinema or recorder setup
fails; move the mkdir -p "$REPO_ROOT/.interview-state" and the date -u ... >
"$REPO_ROOT/.interview-state/started-at" lines out of the current top-level
start flow and into the success branch that runs after the recorder
preflight/asciinema checks complete (i.e., only execute these lines when the
script confirms recorder is present and recording has been started), and apply
the same reorder for the duplicate block around lines 66-79 so the marker is
only created on successful recorder initialization.
src/services/interview/assess/collectors/asciinema.ts-68-85 (1)

68-85: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parse input chunks character-by-character to avoid missed command submits.

evt.data may contain multi-character chunks (e.g., pasted text or cmd\r in one event). The current branch logic treats the whole chunk as one token, which can fail to emit commands correctly.

Suggested fix
-		const data = evt.data;
-		if (data === "\r" || data === "\n") {
-			if (buffer.trim().length > 0) {
-				commands.push({
-					type: "command",
-					timestamp: isoFromUnix(baseEpoch + evt.delta),
-					source: "terminal.cast",
-					command: buffer,
-					pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta),
-				});
-			}
-			buffer = "";
-		} else if (data === "�" || data === "\b") {
-			// Backspace
-			buffer = buffer.slice(0, -1);
-			lastKeyDelta = evt.delta;
-		} else if (data.charCodeAt(0) >= 32 || data === "\t") {
-			buffer += data;
-			lastKeyDelta = evt.delta;
-		} else {
-			// Control codes: arrows, escape sequences, etc. Skip silently.
-		}
+		for (const ch of evt.data) {
+			if (ch === "\r" || ch === "\n") {
+				if (buffer.trim().length > 0) {
+					commands.push({
+						type: "command",
+						timestamp: isoFromUnix(baseEpoch + evt.delta),
+						source: "terminal.cast",
+						command: buffer,
+						pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta),
+					});
+				}
+				buffer = "";
+				continue;
+			}
+			if (ch === "\u007f" || ch === "\b") {
+				buffer = buffer.slice(0, -1);
+				lastKeyDelta = evt.delta;
+				continue;
+			}
+			if (ch === "\t" || ch.charCodeAt(0) >= 32) {
+				buffer += ch;
+				lastKeyDelta = evt.delta;
+			}
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/collectors/asciinema.ts` around lines 68 - 85,
The event handler currently treats evt.data as a single token, which misses
command boundaries when evt.data contains multiple characters (pastes or
"cmd\r"); update the processing block that uses data, buffer, commands,
isoFromUnix, lastKeyDelta and evt.delta to iterate over each character in
evt.data (e.g., for (const ch of data) { ... }) and run the existing branches
per-character (newline -> push command with timestamp from isoFromUnix(baseEpoch
+ evt.delta), backspace -> slice buffer, printable/tab -> append to buffer and
set lastKeyDelta = evt.delta), ensuring buffer resets after a submitted command
so multi-char chunks are parsed correctly.
src/services/interview/bootstrap/orchestrator.ts-36-49 (1)

36-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap generation/persist steps to preserve the RunBootstrapResult contract.

generateProject(...) and writeRoleConfig(...) can throw; right now that escapes as an unhandled rejection instead of returning a structured failure. This makes bootstrap failure handling brittle at the CLI boundary.

Proposed fix
-	const generation = await generateProject(config, options.client, {
-		kitTemplateDir: options.kitTemplateDir,
-		maxAttempts: options.maxAttempts,
-	});
+	let generation;
+	try {
+		generation = await generateProject(config, options.client, {
+			kitTemplateDir: options.kitTemplateDir,
+			maxAttempts: options.maxAttempts,
+		});
+	} catch (error) {
+		return {
+			ok: false,
+			attempts: 0,
+			failures: [
+				error instanceof Error ? error.message : String(error),
+			],
+		};
+	}
 	if (!generation.ok) {
 		return {
 			ok: false,
 			attempts: generation.attempts,
 			failures: generation.failures,
 		};
 	}
-
-	writeRoleConfig(config.outputDir, config);
+	try {
+		writeRoleConfig(config.outputDir, config);
+	} catch (error) {
+		return {
+			ok: false,
+			attempts: generation.attempts,
+			failures: [
+				error instanceof Error ? error.message : String(error),
+			],
+		};
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/orchestrator.ts` around lines 36 - 49, The
code calls generateProject(...) and writeRoleConfig(...) which can throw, so
wrap the generation and persistence steps in a try/catch around the calls to
generateProject and writeRoleConfig inside the function that returns
RunBootstrapResult; on any thrown error return a RunBootstrapResult-shaped
failure (ok: false) including generation.attempts/failures if available or
default values and include the caught error message/context in the failures
array or an error field; ensure the function still returns the existing
successful shape when generation.ok is true and only writes role config inside
the try block to avoid unhandled rejections (refer to generateProject,
writeRoleConfig, and the RunBootstrapResult contract).
tui/interview_bootstrap_wizard.go-249-251 (1)

249-251: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Gate wizard launch on interactive TTY before entering interactive flow.

The dispatcher comment says wizard mode is for TTY callers, but the current condition only checks len(args) == 0. In non-interactive contexts this can trigger unexpected wizard execution instead of deterministic headless behavior.

Proposed fix
 	// No flags at all → interactive wizard.
 	if len(args) == 0 {
+		if fi, err := os.Stdin.Stat(); err != nil || (fi.Mode()&os.ModeCharDevice) == 0 {
+			fmt.Fprintln(stderr, "no interactive TTY detected; pass headless flags")
+			return 1
+		}
 		res, err := launcher.Launch()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_wizard.go` around lines 249 - 251, The wizard launch
is currently gated only by len(args) == 0 which can run the interactive
launcher.Launch() in non-interactive contexts; change the condition to require
an interactive TTY as well (e.g., check that stdin is a terminal via
os.Stdin/terminal/term IsTerminal equivalents) so that the code only calls
launcher.Launch() when len(args) == 0 AND stdin is a TTY; otherwise proceed with
the deterministic headless path. Ensure you update the condition around
launcher.Launch() and keep the existing dispatcher comment consistent.
src/services/interview/cohort/cohort-summary.ts-39-43 (1)

39-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate audit.json frontmatter shape before adding records

On Line 39, JSON.parse is cast directly to { frontmatter: AuditFrontmatter }. A parseable but malformed file (e.g., missing candidate or signed_off) will be pushed and can fail later in renderRow (Line 53+).

Suggested fix
-			const parsed = JSON.parse(body) as { frontmatter: AuditFrontmatter };
-			records.push({
-				frontmatter: parsed.frontmatter,
-				summaryPath: join(entry, "summary.md"),
-			});
+			const parsed = JSON.parse(body) as {
+				frontmatter?: Partial<AuditFrontmatter>;
+			};
+			const fm = parsed.frontmatter;
+			if (
+				!fm ||
+				typeof fm.candidate !== "string" ||
+				typeof fm.date !== "string" ||
+				typeof fm.signed_off !== "boolean"
+			) {
+				continue;
+			}
+			records.push({
+				frontmatter: fm as AuditFrontmatter,
+				summaryPath: join(entry, "summary.md"),
+			});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/cohort/cohort-summary.ts` around lines 39 - 43, The
code currently casts JSON.parse(body) to { frontmatter: AuditFrontmatter } and
unconditionally pushes it into records; instead validate the parsed object has a
frontmatter and required AuditFrontmatter fields (e.g., candidate, signed_off,
etc.) before pushing to records to avoid later failures in renderRow; locate the
parsing block around parsed = JSON.parse(body) and replace the blind cast with
runtime checks that parsed?.frontmatter exists and that the required keys are
present (or skip/log/throw for malformed audit.json), then push only when valid
and keep summaryPath as before.
src/services/interview/assess/audit-writer.ts-53-73 (1)

53-73: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Quote/escape YAML scalar frontmatter values

On Lines 55–73, frontmatter fields are emitted as raw strings. Values containing :, #, newlines, or quotes can corrupt parsing or inject unintended keys.

Suggested fix
 function yaml(value: AuditFrontmatter): string {
+	const q = (s: string): string => JSON.stringify(s);
 	const lines: string[] = ["---"];
-	lines.push(`tags: [${value.tags.join(", ")}]`);
-	lines.push(`candidate: ${value.candidate}`);
-	lines.push(`role: ${value.role}`);
-	lines.push(`date: ${value.date}`);
-	lines.push(`rubric_version: ${value.rubric_version}`);
-	lines.push(`rubric_mode: ${value.rubric_mode}`);
+	lines.push(`tags: [${value.tags.map(q).join(", ")}]`);
+	lines.push(`candidate: ${q(value.candidate)}`);
+	lines.push(`role: ${q(value.role)}`);
+	lines.push(`date: ${q(value.date)}`);
+	lines.push(`rubric_version: ${q(value.rubric_version)}`);
+	lines.push(`rubric_mode: ${q(value.rubric_mode)}`);
 	lines.push(`signed_off: ${value.signed_off}`);
 	if (value.recommendation !== undefined) {
-		lines.push(`recommendation: ${value.recommendation}`);
+		lines.push(`recommendation: ${q(value.recommendation)}`);
 	}
 	if (value.session_recording_url !== undefined) {
-		lines.push(`session_recording_url: ${value.session_recording_url}`);
+		lines.push(`session_recording_url: ${q(value.session_recording_url)}`);
 	}
 	if (value.session_platform !== undefined) {
-		lines.push(`session_platform: ${value.session_platform}`);
+		lines.push(`session_platform: ${q(value.session_platform)}`);
 	}
 	if (value.session_date !== undefined) {
-		lines.push(`session_date: ${value.session_date}`);
+		lines.push(`session_date: ${q(value.session_date)}`);
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/audit-writer.ts` around lines 53 - 73, The
yaml(value: AuditFrontmatter) function emits frontmatter scalars unescaped which
can break YAML when values contain characters like :, #, quotes, or newlines;
update it to sanitize/quote every scalar field (tags can remain array syntax but
each tag must be quoted) by adding a small helper (e.g. escapeYamlScalar) and
use it for candidate, role, date, rubric_version, rubric_mode, signed_off,
recommendation, session_recording_url, session_platform, and session_date; the
helper should wrap strings in double quotes and escape internal double quotes
and newlines (or alternatively call a stable YAML serializer) so the generated
frontmatter is always safe to parse.
src/services/interview/assess/grade-orchestrator.ts-194-209 (1)

194-209: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

candidate_id can collide and overwrite same-day runs

On Line 194, candidate_id is date-granular (<slug>-YYYY-MM-DD). Re-grading the same candidate on the same day will target the same default output directory and overwrite prior artifacts.

Suggested fix
+function nowIdSuffix(): string {
+	return new Date().toISOString().replace(/[:.]/g, "-");
+}
...
-			candidate_id: `${slug(input.candidateName)}-${todayIso()}`,
+			candidate_id: `${slug(input.candidateName)}-${nowIdSuffix()}`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/grade-orchestrator.ts` around lines 194 - 209,
The current candidate_id uses date granularity
(slug(input.candidateName)-todayIso()) which causes same-day runs to collide;
update the candidate_id generation in the result object to append a
higher-resolution unique suffix (e.g., timestamp or a UUID/nanoid) so it becomes
slug(input.candidateName)-todayIso()-<unique>, and keep the outputDir fallback
logic (the outputDir variable that uses input.outputDir ?? join(...,
result.candidate_id)) unchanged so that default directories are distinct per
run; locate the candidate_id assignment and modify it to include the unique
suffix (or call a helper like Date.now() or uuid()) so re-grades on the same day
do not overwrite prior artifacts.
🟡 Minor comments (15)
src/cli/index.ts-192-206 (1)

192-206: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Block interview in the report reserved-verb guard too.

After adding the top-level interview verb here, the misuse guard in report (Line 157) still only blocks doctor/setup. teamhero report interview now slips through as report args instead of producing the helpful redirect error.

Suggested fix
-			const subcommands = ["doctor", "setup"];
+			const subcommands = ["doctor", "setup", "interview"];
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli/index.ts` around lines 192 - 206, The new top-level command declared
via program.command("interview") is not included in the reserved-verb check
inside the report misuse guard, so "teamhero report interview" bypasses the
redirect; update the report command's misuse guard to include "interview"
alongside "doctor" and "setup" (i.e., add "interview" to the reserved verbs
list/check used by the report handler) so calls matching "interview" are
redirected the same way; locate the guard logic in the report command and adjust
its reservation check to include the "interview" verb (the new interview command
is created with program.command("interview") and launches spawnTui).
tests/unit/services/interview/bootstrap/project-validator.spec.ts-59-61 (1)

59-61: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove test-file line adjustment from fixture LOC calculation

The fixture includes 5 test-file lines in the currentLoc estimate, but the validator only counts source files (via the !TEST_NAME_PATTERN filter). This causes the padding calculation to undershoot by 5 lines.

-	const currentLoc = o.deepModuleCount * 100 + (o.withFailingTests ? 5 : 0);
+	const currentLoc = o.deepModuleCount * 100;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/bootstrap/project-validator.spec.ts` around
lines 59 - 61, The test fixture currently adds 5 lines for test files to the LOC
estimate (in the currentLoc calculation using o.deepModuleCount * 100 +
(o.withFailingTests ? 5 : 0)), but the validator filters out test files via
TEST_NAME_PATTERN and only counts source files, causing an undershoot; remove
the test-file adjustment so currentLoc is computed from o.deepModuleCount only
(e.g., drop the +(o.withFailingTests ? 5 : 0)) so padding/remaining calculations
align with the validator's source-file counting.
tui/interview_progress.go-87-92 (1)

87-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Set phase to running on update events to ensure correct visual state

When an update event arrives before a start event, the phase incorrectly displays as pending (○ glyph) despite progress being reported. Update the "update" case to transition from pending to running.

Proposed fix
 	case "update":
+		if m.status[evt.Step] == interviewPhasePending {
+			m.status[evt.Step] = interviewPhaseRunning
+		}
 		// keep running, just record message for tooltip-style display
 		if evt.Message != "" {
 			m.messages[evt.Step] = evt.Message
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_progress.go` around lines 87 - 92, The "update" event handler
currently only records tooltip messages (m.messages[evt.Step]) but doesn't
change the visual phase; modify the "update" case so that when an update arrives
it sets m.phase to PhaseRunning (or at least transitions from PhasePending to
PhaseRunning) so progress is shown as running; keep the existing message
recording logic for evt.Step and evt.Message and ensure you reference the
"update" case, m.phase, and the PhasePending/PhaseRunning constants when making
the change.
src/services/interview/assess/extractors/throughput.ts-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test-run detection is narrower than the rest of the assessment pipeline.

TEST_RUN currently misses common runners (e.g., jest, vitest, cargo test, just test), so “Time to first test run” can be silently omitted for valid sessions.

Proposed fix
-const TEST_RUN = /^\s*(bun|npm|yarn|pnpm)\s+(run\s+)?test\b|^\s*go\s+test\b|^\s*pytest\b/;
+const TEST_RUN =
+	/^\s*((bun|npm|yarn|pnpm)\s+(run\s+)?test|go\s+test|pytest|jest|vitest|cargo\s+test|just\s+test)\b/;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/assess/extractors/throughput.ts` at line 13,
TEST_RUN's regex is too narrow and misses many common test runners; update the
TEST_RUN constant to include additional runners such as jest, vitest, mocha,
ava, tap, cypress, playwright, cargo (cargo test), and just (just test) while
preserving existing matches for package-manager-prefixed invocations
(bun|npm|yarn|pnpm with optional "run"). Modify the regex used in
src/services/interview/assess/extractors/throughput.ts (constant TEST_RUN) to
match both direct runner commands and package-manager-prefixed forms, and
consider making it case-insensitive or allowing optional whitespace so commands
like "npm run test", "yarn jest", "cargo test", and "just test" are all
detected.
teamhero-interview-kit/INTERVIEW_RULES.md-41-43 (1)

41-43: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences trigger markdownlint MD040; adding bash keeps docs lint-clean and consistent.

♻️ Proposed fix
-   ```
+   ```bash
    git push origin HEAD
    ```
...
-```
+```bash
 brew install asciinema

...
- +bash
sudo apt install asciinema # or your distro's package manager

...
-   ```
+   ```bash
   sudo apt update
   sudo apt install asciinema git
   ```

Also applies to: 50-52, 56-58, 68-71

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@teamhero-interview-kit/INTERVIEW_RULES.md` around lines 41 - 43, Update the
Markdown fenced code blocks in INTERVIEW_RULES.md to include the bash language
identifier: change ``` to ```bash for the snippets containing "git push origin
HEAD", "brew install asciinema", "sudo apt install asciinema", and the block
with "sudo apt update / sudo apt install asciinema git" (also apply to the other
occurrences you noted at lines 50-52, 56-58, 68-71) so the file passes
markdownlint MD040 and renders correctly.
skills/teamhero-interview/SKILL.md-50-52 (1)

50-52: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify language for CLI code fences.

Adding bash to these fenced blocks resolves markdownlint MD040 warnings.

♻️ Proposed fix
-```
+```bash
 teamhero interview bootstrap

...
- +bash
teamhero interview bootstrap --headless
--role --stack --domain --feature ""
--time-box
--mode-project A|B
--mode-analysis ai-assisted|human-only
--mode-rubric default|custom|default+jd
--output-dir
[--jd-path ] [--custom-prompt ""] [--role-title "<title>"]

...
-```
+```bash
teamhero interview grade --candidate "Jane Doe" --repo <url> \
  [--transcript <file>] [--interviewer-notes <file>] \
  [--session-recording-url <url>] [--session-platform zoom|teams|meet|other] \
  [--session-date YYYY-MM-DD] [--output-dir <path>]

...
- +bash
teamhero interview cohort --role [--order alphabetical|chronological]

Also applies to: 63-72, 81-86, 99-101

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/teamhero-interview/SKILL.md` around lines 50 - 52, Add the language
tag "bash" to each CLI code fence in SKILL.md so the fenced blocks containing
commands like "teamhero interview bootstrap", "teamhero interview bootstrap
--headless ...", "teamhero interview grade --candidate ...", and "teamhero
interview cohort --role ..." start with ```bash instead of ``` to satisfy
markdownlint MD040; update every similar CLI fence (including the other
occurrences mentioned) consistently.
tui/interview_bootstrap_form.go-261-267 (1)

261-267: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat whitespace-only input as empty in nonEmpty validation.

Inputs like " " currently pass validation and can propagate invalid wizard values.

💡 Proposed fix
 import (
 	"errors"
 	"fmt"
+	"strings"
@@
 func nonEmpty(field string) func(string) error {
 	return func(s string) error {
-		if s == "" {
+		if strings.TrimSpace(s) == "" {
 			return fmt.Errorf("%s is required", field)
 		}
 		return nil
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_form.go` around lines 261 - 267, The nonEmpty
validator currently treats whitespace-only strings as valid; update the nonEmpty
function (nonEmpty(field string) func(string) error) to trim whitespace (e.g.,
strings.TrimSpace) from the input before checking emptiness, returning
fmt.Errorf("%s is required", field) when the trimmed value is empty; add the
strings import if missing so the validator rejects inputs like "   ".
docs/2026-05-09-candidate-interview-grader-plan.md-273-273 (1)

273-273: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These unlabeled fences trigger MD040 and reduce syntax highlighting/readability in rendered docs.

Suggested fix pattern
-```
+```text
 ...
-```
+```

Use bash, markdown, typescript, or text as appropriate per block.

Also applies to: 296-296, 528-528, 541-541, 672-672, 681-681, 694-694

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/2026-05-09-candidate-interview-grader-plan.md` at line 273, Several
fenced code blocks in docs/2026-05-09-candidate-interview-grader-plan.md are
unlabeled and trigger MD040; edit each fenced block (lines referenced in the
review: 273, 296, 528, 541, 672, 681, 694) and add an appropriate language
identifier after the opening backticks (e.g., ```bash, ```markdown,
```typescript, or ```text) to enable syntax highlighting—scan the surrounding
content to choose the correct language for each block and replace each opening
"```" with "```<language>" while leaving the closing "```" unchanged.
docs/2026-05-09-candidate-interview-grader-plan.md-533-535 (1)

533-535: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The list-roles/list-candidates status is internally inconsistent.

Line 533-535 labels these verbs as v1.5, while Line 770+ says they are still deferred. Please align one source of truth so readers don’t misinterpret shipped scope.

Also applies to: 770-775

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/2026-05-09-candidate-interview-grader-plan.md` around lines 533 - 535,
The document shows inconsistent status for the commands teamhero interview
list-roles and teamhero interview list-candidates (they're marked v1.5 in one
place but listed as deferred elsewhere); pick the correct status and make both
references consistent: update the v1.5 label if they are still deferred or
move/remove them from the deferred section if they are shipped, editing the
occurrences that mention teamhero interview list-roles and teamhero interview
list-candidates so the "shipped" label and the deferred list (the later block
that currently lists them) match exactly.
scripts/manual-test-interview.sh-62-63 (1)

62-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t silently swallow step failures in a smoke script.

Using || true at these execution points can hide real regressions and make the run look successful when core flows failed. Keep the script interactive, but surface non-zero exits explicitly.

Suggested adjustment
+run_step() {
+  if ! "$@"; then
+    local code=$?
+    echo "⚠ Command failed with exit code $code: $*" >&2
+    echo "   Continue only if this failure was expected." >&2
+  fi
+}
+
-"$BIN" interview bootstrap || true
+run_step "$BIN" interview bootstrap
 ...
-"$BIN" interview bootstrap \
+run_step "$BIN" interview bootstrap \
   --headless --no-confirm \
   --role manual-test-role --stack TypeScript --domain Manual \
   --feature "Verify the headless path still works" \
   --mode-project A --mode-analysis human-only --mode-rubric default \
-  --output-dir "$TMPDIR/manual-test-role" || true
+  --output-dir "$TMPDIR/manual-test-role"
 ...
-"$BIN" interview grade \
+run_step "$BIN" interview grade \
   --candidate "Manual Test Candidate" \
   --repo "https://example.com/fake-repo" \
-  --output-dir "$TMPDIR/grade-output" || true
+  --output-dir "$TMPDIR/grade-output"
 ...
-"$BIN" interview cohort --role manual-test-role || true
+run_step "$BIN" interview cohort --role manual-test-role

Also applies to: 68-73, 83-87, 99-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/manual-test-interview.sh` around lines 62 - 63, The script is
swallowing errors by appending "|| true" to key steps (e.g. the "$BIN interview
bootstrap" invocation and the other occurrences at the indicated ranges); remove
those "|| true" suffixes and instead check the command exit status: after
invoking "$BIN interview bootstrap" (and the other similar commands), capture
the exit code and if non-zero print a clear error message and either exit the
script or prompt the user to continue (interactive mode) so failures are
surfaced explicitly; look for occurrences of "$BIN" invocations with "|| true"
and replace them with explicit exit-code handling or an interactive confirmation
flow.
tui/interview_bootstrap_wizard.go-262-267 (1)

262-267: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against nil Options before validation/run.

If a launcher returns Confirmed=true with Options=nil, this path can panic in validation or runner invocation. Add a defensive check and fail cleanly.

Proposed fix
 		if !res.Confirmed {
 			return 0
 		}
 		opts := res.Options
+		if opts == nil {
+			fmt.Fprintln(stderr, "wizard returned no options")
+			return 1
+		}
 		if msg := ValidateBootstrapOptions(opts); msg != "" {
 			fmt.Fprintln(stderr, msg)
 			return 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap_wizard.go` around lines 262 - 267, The code assumes
res.Options is non-nil before calling ValidateBootstrapOptions and runner.Run;
add a nil check for opts (res.Options) right after assigning opts := res.Options
and before calling ValidateBootstrapOptions: if opts == nil {
fmt.Fprintln(stderr, "no options provided"); return 1 }, so the function fails
cleanly instead of panicking; keep using stderr/stdout and still call
ValidateBootstrapOptions and runner.Run only when opts is non-nil.
tui/interview_grade.go-32-39 (1)

32-39: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle missing flag values before assignment in ParseGradeFlags.

When a flag is followed by another flag token, the parser currently treats that token as a value and returns a misleading later error. Add a guard so the parser returns the correct “requires a value” error immediately.

Proposed fix
 			if i+1 >= len(args) {
 				return nil, fmt.Sprintf("flag %s requires a value", a)
 			}
 			val := args[i+1]
+			if strings.HasPrefix(val, "--") {
+				return nil, fmt.Sprintf("flag %s requires a value", a)
+			}
 			switch a {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_grade.go` around lines 32 - 39, In ParseGradeFlags: the current
logic only checks bounds for args[i+1] but can treat a subsequent flag token as
a value; update the guard inside the case handling ("--repo", "--candidate",
"--transcript", "--interviewer-notes", "--session-recording-url",
"--session-platform", "--session-date", "--output-dir", "--local-repo-path") to
not only ensure i+1 < len(args) but also verify args[i+1] is not a flag (e.g.,
starts with "-"); if it is a flag, return the same "flag %s requires a value"
error immediately instead of assigning val and letting later code fail. Locate
this logic in ParseGradeFlags and add the extra check before setting val :=
args[i+1].
tui/interview_cohort.go-24-37 (1)

24-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject flag-like tokens as missing values in ParseCohortFlags.

ParseCohortFlags currently accepts another flag token as a value (for example, --role --order ...) and then fails later with a misleading error. Validate the consumed value before assigning it.

Proposed fix
 		case "--role", "--role-dir", "--order":
 			if i+1 >= len(args) {
 				return nil, fmt.Sprintf("flag %s requires a value", a)
 			}
 			val := args[i+1]
+			if strings.HasPrefix(val, "--") {
+				return nil, fmt.Sprintf("flag %s requires a value", a)
+			}
 			switch a {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_cohort.go` around lines 24 - 37, In ParseCohortFlags, the code
currently assigns args[i+1] to opts.Role/RoleDir/Order even if that token is
another flag (e.g. "--role --order"), causing confusing errors later; fix it by
validating val before assignment: after val := args[i+1], check if
strings.HasPrefix(val, "-") (or specifically "--") and if so return the same
missing-value error (e.g. fmt.Sprintf("flag %s requires a value", a)); otherwise
proceed to set opts.Role, opts.RoleDir or opts.Order and increment i. Ensure to
import strings if needed.
src/services/interview/cohort/cohort-summary.ts-42-42 (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use POSIX path joining for markdown links

On Line 42, join(entry, "summary.md") may produce \ on Windows, which can render as broken/awkward links in markdown.

Suggested fix
-import { join } from "node:path";
+import { join, posix } from "node:path";
...
-				summaryPath: join(entry, "summary.md"),
+				summaryPath: posix.join(entry, "summary.md"),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/cohort/cohort-summary.ts` at line 42, The summaryPath
construction uses path.join(entry, "summary.md") which will produce backslashes
on Windows and break markdown links; change it to use POSIX joining instead
(e.g., use path.posix.join or equivalent) when building summaryPath in
cohort-summary.ts so the resulting link always contains forward slashes; update
imports/use of the path module accordingly and ensure the variable name
summaryPath still receives the new POSIX-joined value (reference symbols:
summaryPath, join, entry).
tui/interview_bootstrap.go-127-129 (1)

127-129: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate that --jd-path actually exists in headless mode

Lines 127–129 only require a non-empty value. A typo/path miss will pass validation and later run without JD context.

Suggested fix
 	if opts.ModeRubric == "default+jd" && strings.TrimSpace(opts.JDPath) == "" {
 		return "--mode-rubric 'default+jd' requires --jd-path"
 	}
+	if opts.ModeRubric == "default+jd" {
+		info, err := os.Stat(opts.JDPath)
+		if err != nil || info.IsDir() {
+			return "--jd-path must point to an existing file"
+		}
+	}
 	return ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tui/interview_bootstrap.go` around lines 127 - 129, The validation in the
condition checking opts.ModeRubric and opts.JDPath only ensures that --jd-path
is not empty, but it should also verify that the specified path actually exists
to prevent runtime errors in headless mode. Update the code to check if the file
at opts.JDPath exists using appropriate file existence checking methods (e.g.,
os.Stat in Go) and return an error if the file does not exist alongside the
current validation in the logic handling the mode-checking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c9c03249-2c4d-43c6-ac4c-46fe9bc90861

📥 Commits

Reviewing files that changed from the base of the PR and between aba1227 and 106d3ac.

📒 Files selected for processing (74)
  • README.md
  • docs/2026-05-09-candidate-interview-grader-plan.md
  • docs/interview-classification-rationale.md
  • docs/interview-rubric.md
  • scripts/manual-test-interview.sh
  • scripts/run-interview-bootstrap.ts
  • scripts/run-interview-cohort.ts
  • scripts/run-interview-grade.ts
  • scripts/run-tests.sh
  • skills/teamhero-interview/SKILL.md
  • src/cli/index.ts
  • src/services/interview/assess/ai-observer.ts
  • src/services/interview/assess/audit-writer.ts
  • src/services/interview/assess/collectors/asciinema.ts
  • src/services/interview/assess/collectors/git-history.ts
  • src/services/interview/assess/collectors/jsonl-log.ts
  • src/services/interview/assess/collectors/transcript.ts
  • src/services/interview/assess/extractors/risk-awareness.ts
  • src/services/interview/assess/extractors/test-pass.ts
  • src/services/interview/assess/extractors/throughput.ts
  • src/services/interview/assess/extractors/verification.ts
  • src/services/interview/assess/grade-orchestrator.ts
  • src/services/interview/assess/types.ts
  • src/services/interview/bootstrap/openai-generator-client.ts
  • src/services/interview/bootstrap/orchestrator.ts
  • src/services/interview/bootstrap/project-generator.ts
  • src/services/interview/bootstrap/project-validator.ts
  • src/services/interview/bootstrap/role-config.ts
  • src/services/interview/cohort/cohort-summary.ts
  • src/services/interview/shared/events.ts
  • src/services/interview/shared/rubric.ts
  • teamhero-interview-kit/.claude/CLAUDE.md
  • teamhero-interview-kit/.claude/settings.json
  • teamhero-interview-kit/INTERVIEW_RULES.md
  • teamhero-interview-kit/PRIVACY_RELEASE.md
  • teamhero-interview-kit/RUBRIC_OVERVIEW.md
  • teamhero-interview-kit/end.sh
  • teamhero-interview-kit/lib/privacy-gate.sh
  • teamhero-interview-kit/start.sh
  • tests/unit/cli/interview.spec.ts
  • tests/unit/services/interview/assess/ai-observer.spec.ts
  • tests/unit/services/interview/assess/audit-writer.spec.ts
  • tests/unit/services/interview/assess/collectors.spec.ts
  • tests/unit/services/interview/assess/extractors.spec.ts
  • tests/unit/services/interview/assess/grade-orchestrator.spec.ts
  • tests/unit/services/interview/bootstrap/orchestrator.spec.ts
  • tests/unit/services/interview/bootstrap/project-generator.spec.ts
  • tests/unit/services/interview/bootstrap/project-validator.spec.ts
  • tests/unit/services/interview/bootstrap/role-config.spec.ts
  • tests/unit/services/interview/cohort/cohort-summary.spec.ts
  • tests/unit/services/interview/cohort/skill.spec.ts
  • tests/unit/services/interview/kit/kit-smoke.spec.ts
  • tests/unit/services/interview/kit/privacy-gate.spec.ts
  • tests/unit/services/interview/shared/events.spec.ts
  • tests/unit/services/interview/shared/rubric.spec.ts
  • tui/interview.go
  • tui/interview_bootstrap.go
  • tui/interview_bootstrap_form.go
  • tui/interview_bootstrap_test.go
  • tui/interview_bootstrap_wizard.go
  • tui/interview_bootstrap_wizard_test.go
  • tui/interview_cohort.go
  • tui/interview_cohort_test.go
  • tui/interview_grade.go
  • tui/interview_grade_test.go
  • tui/interview_preview.go
  • tui/interview_preview_test.go
  • tui/interview_progress.go
  • tui/interview_progress_test.go
  • tui/interview_styles.go
  • tui/interview_styles_test.go
  • tui/interview_test.go
  • tui/main.go
  • tui/manual_test_script_test.go

Comment thread src/services/interview/review/review-orchestrator.ts Outdated
Comment thread src/services/interview/bootstrap/openai-generator-client.ts
Comment thread src/services/interview/bootstrap/project-generator.ts
Comment thread src/services/interview/bootstrap/project-generator.ts
asabaylus and others added 3 commits May 10, 2026 22:38
Enables auto-review for PRs targeting main, which covers PRs opened
from claude/* branches. CodeRabbit's schema has no source-branch
filter, so this is the closest we can get to "auto-review Claude PRs."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'grade' verb implied scoring against a rubric, which conflicted with
the project's observations-not-scores ethics floor. Renamed the CLI verb,
file/directory names, type names, docs, and tests to use 'review'
instead. Includes:

- CLI: teamhero interview grade → teamhero interview review
- Go TUI: tui/interview_grade*.go → tui/interview_review*.go, types
  GradeOptions/GradeRunner → ReviewOptions/ReviewRunner
- TS service: src/services/interview/assess/ → review/, orchestrator
  renamed; GradeResult/GradeInput/GradeOutcome → Review*
- Subprocess: scripts/run-interview-grade.ts → run-interview-review.ts
- Tests, SKILL.md, README, manual test script, planning doc

Also adds a defensive guard to the TS CLI runner: unknown subcommands
now print an error and exit 1 instead of silently falling through to
commander's top-level help. That silent fallthrough is what made a stale
dist look like a missing-command bug when 'teamhero interview' first
shipped here.

No backward-compat shims — pre-deployment, no external callers.

Closes teamhero-scripts-td7, teamhero-scripts-zva.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "skips all roadmap audit contexts when TEAMHERO_ROADMAP_AUDIT_ENABLED=0"
test failed under non-isolated `bun test` because other specs (ai-service,
individual-summarizer, report-service) install a `mock.module("src/lib/env.js")`
stub whose `getEnv` ignores `process.env`. `mock.module()` registrations leak
across files, so when this file ran after one of them, setting
`process.env.TEAMHERO_ROADMAP_AUDIT_ENABLED = "0"` had no effect on the gate.

Re-pin `getEnv` to a real implementation at the top of this spec so it works
regardless of which file ran before it. CI (per-file isolated runner) was
already green; this just fixes the fast `bun test` workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

♻️ Duplicate comments (1)
src/services/interview/review/review-orchestrator.ts (1)

45-47: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Use execFileSync/arg array to remove command-injection risk.

Line 45 builds a shell command with interpolated repoUrl/destDir; this allows shell-breaking payloads. Use argument-array execution instead.

Suggested fix
-import { execSync } from "node:child_process";
+import { execFileSync } from "node:child_process";
...
 const defaultCloner: Cloner = (repoUrl, destDir) => {
-	execSync(`git clone --depth=50 '${repoUrl}' '${destDir}'`, {
+	execFileSync("git", ["clone", "--depth=50", repoUrl, destDir], {
 		stdio: "inherit",
 	});
 };
#!/bin/bash
# Verify unsafe shell-interpolated exec usage in the reviewed file.
rg -nP 'execSync\s*\(\s*`git clone' src/services/interview/review/review-orchestrator.ts
rg -nP '\bexecSync\s*\(' src/services/interview/review/review-orchestrator.ts
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/review/review-orchestrator.ts` around lines 45 - 47,
The git clone call in review-orchestrator.ts uses execSync with a
shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}'
'${destDir}'`, ...)) which is vulnerable to command injection; replace it with a
non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing
["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit"
semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir
as separate args, and adjust any error handling around the exec in the
function/method that contains this execSync invocation.
🧹 Nitpick comments (1)
tests/unit/cli/unknown-subcommand.spec.ts (1)

75-100: ⚡ Quick win

Known-subcommand guard test is slightly environment-dependent.

Line 90 drives run(..., sub, "--help") into TUI passthrough, so this “guard-stage” unit test can depend on local binary/runtime state. Consider isolating guard logic from TUI execution in this test path to keep it fully deterministic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cli/unknown-subcommand.spec.ts` around lines 75 - 100, The test is
hitting TUI passthrough when calling run([... sub, "--help"]) making it
environment-dependent; modify the test to isolate the guard by stubbing out the
TUI execution in the deps returned by makeDeps (e.g., set deps.spawnTui to a
no-op or a function that throws a deterministic sentinel) so run() exercises
only the guard logic and never invokes the real TUI, keep the existing
errorSpy/exitSpy handling and assertions around run and unknown-subcommand
messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/2026-05-09-candidate-interview-reviewer-plan.md`:
- Line 273: The markdown contains unannotated fenced code blocks (``` ) that
trigger MD040; update each fenced block to include a language identifier (e.g.,
```bash, ```text, ```markdown) so tooling can correctly lint and highlight them.
Locate every triple-backtick fence in the document (all occurrences of ``` ) and
replace the opening fence with an appropriate language token consistent across
similar examples (use bash for shell commands, text for plain output, markdown
for nested markdown examples, etc.), and ensure the same convention is applied
to the other occurrences noted in the review.
- Line 757: Update the stale module path in the implementation progress table:
replace the incorrect `src/services/interview/{bootstrap,assess,cohort,shared}/`
reference with the correct module path that uses `review` (e.g.,
`src/services/interview/{bootstrap,assess,cohort,review}/`) so it matches the
rest of the document and the shipped flow; ensure the `teamhero interview` CLI
subcommand line and any adjacent references use the `review` segment
consistently.

In `@scripts/manual-test-interview.sh`:
- Around line 83-86: The review invocation using "$BIN" interview review
currently omits the analysis mode flag so it may call OpenAI; modify the command
calling "$BIN" interview review (the block with --candidate and --repo) to
include --mode-analysis human-only so the script enforces human-only analysis
and avoids any API usage/cost paths.

In `@scripts/run-interview-review.ts`:
- Around line 53-56: The CLI currently only checks that at least one of
flags.repo or flags.localRepoPath is provided, but allows both; change the check
to require exactly one (XOR). Replace the existing if-condition with a test that
fails when both are present or both absent (e.g. if ((!!flags.repo) ===
(!!flags.localRepoPath)) { ... }), and update the error message to state
"Provide exactly one of --repo <url> or --local-repo-path <dir>" before calling
process.exit(1); keep references to flags.repo and flags.localRepoPath so
reviewers can find the change.

In `@skills/teamhero-interview/SKILL.md`:
- Around line 78-86: The heading "teamhero interview review <repo-url>" is
inconsistent with the actual CLI invocation which uses flags; update the
SKILL.md heading to match the documented usage by changing it to something like
"teamhero interview review --repo <url>" (or show both forms: positional and
flag) and ensure the examples and paragraph describe the same form; edit the
heading text around the `teamhero interview review` section and any nearby
examples so they reference the `--repo <url>` flag (and other flags)
consistently with the code examples and the documented invocation.
- Around line 50-52: Add the missing fenced-code language identifiers by
changing the triple-backtick fences for the command snippets to specify "bash"
(e.g., the blocks containing "teamhero interview bootstrap", "teamhero interview
bootstrap --headless \", "teamhero interview review --candidate \"Jane Doe\"
--repo <url> \", "teamhero interview cohort --role <slug> [--order
alphabetical|chronological]" and the other similar command blocks at lines
referenced) so each code fence begins with ```bash to satisfy markdownlint
MD040; update every occurrence noted (around lines 50-52, 63-72, 81-86, 99-101)
consistently.

In `@src/cli/index.ts`:
- Around line 229-255: The nested-subcommand rejection logic was left using a
hardcoded allowlist (e.g., only "doctor"/"setup") so "teamhero report interview"
bypasses the guard; update the nested-subcommand check to reuse the shared
KNOWN_SUBCOMMANDS constant instead of the ad-hoc array: locate the guard in the
run function that inspects the first/second args and replace the hardcoded
allowed-subcommands with KNOWN_SUBCOMMANDS (so validations for
first/second-level subcommands remain consistent).

In `@src/services/interview/review/ai-observer.ts`:
- Around line 192-201: The current rejectIfScored() string-scans JSON which
false-positives when forbidden names appear inside string values; instead
parse/inspect the response structurally and recursively check object keys
against FORBIDDEN_FIELDS. Update rejectIfScored to: safely parse or coerce the
incoming response to an object (handle non-objects by returning), walk nested
objects and arrays, and if any object has a property name equal to one of
FORBIDDEN_FIELDS (use direct key comparison, not regex), throw the existing
Error message; also catch and rethrow/handle JSON parse errors consistently so
you don't silently ignore malformed input.

In `@src/services/interview/review/audit-writer.ts`:
- Around line 53-76: The yaml function is vulnerable because it interpolates
user-adjacent strings (AuditFrontmatter fields like candidate, role, tags,
recommendation, session_* and session URLs) directly into YAML, allowing invalid
frontmatter or injected keys; fix by switching from manual string interpolation
to safe YAML serialization (e.g., use a YAML library like js-yaml oryaml and
call safeDump/safeLoad for the AuditFrontmatter object) or, if you must
hand-roll it, ensure every scalar is properly quoted and escaped (escape
backslashes, quotes and newlines, emit tags as a YAML sequence not an unquoted
bracket list, and only emit optional keys when present). Update the yaml
function to construct a plain JS object from AuditFrontmatter and pass it to the
serializer (or replace each push with properly quoted/escaped values and tags
emitted as a sequence) so all fields are safely encoded.

In `@src/services/interview/review/collectors/asciinema.ts`:
- Around line 67-86: The code in the event-processing block for evt.data assumes
a single-character payload and misses cases where evt.data contains multiple
characters (e.g., "npm test\r"); update the logic in the handler around
variables buffer, lastKeyDelta, evt.delta, baseEpoch and commands so you iterate
over each character in evt.data (for example for (const ch of evt.data)) and
apply the same branches: treat '\r' and '\n' as Enter (trim buffer, push a
command with timestamp = isoFromUnix(baseEpoch + evt.delta) and
pauseSecondsBeforeEnter = Math.max(0, evt.delta - lastKeyDelta)), treat '�' and
'\b' as backspace (slice buffer and set lastKeyDelta), treat printable chars
(charCode >= 32 or '\t') as append to buffer and set lastKeyDelta; ensure buffer
is reset after pushing a command so subsequent characters are processed
correctly.

In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 14-18: The realRunner implementation currently builds a shell
command string and calls execSync (risking injection/quoting bugs); change it to
call execFileSync with the program "git" and pass the args array directly (i.e.
execFileSync("git", args, { cwd, encoding: "utf8" })) so you remove the manual
quoting/map and avoid shell parsing; update any type/return handling in the
GitRunner wrapper accordingly.

In `@src/services/interview/review/collectors/transcript.ts`:
- Around line 27-37: Validate the sessionStartIso at the start of toIsoFromHMS:
parse it (e.g., new Date(sessionStartIso)) and check
isFinite(date.getTime())/isNaN before using getTime(); if invalid, throw a clear
Error mentioning "invalid sessionStartIso" and include the raw value. Update the
same validation pattern where sessionStartIso is used later in this file (the
second occurrence around lines 51-52) so both code paths fail fast with the same
clear error message; reference function toIsoFromHMS and the other usage site to
locate and mirror the change.

In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 20-34: The spawnSync invocations in test-pass.ts that run the Go
and Bun tests (the calls to spawnSync("go", ["test", "./..."], { cwd: repoDir,
encoding: "utf8" }) and spawnSync("bun", ["test"], { cwd: repoDir, encoding:
"utf8" })) lack timeout and maxBuffer safeguards; update both calls to include
sensible timeout and maxBuffer options (e.g., timeout: 120000, maxBuffer: 10 *
1024 * 1024) so the subprocesses cannot hang or overflow stdout/stderr, keep the
rest of the options (cwd, encoding) and ensure the rest of the code that reads
r.stdout/r.stderr and computes passed/failed via countMatches remains unchanged.

In `@src/services/interview/review/extractors/verification.ts`:
- Around line 58-71: The interleaving count currently treats any adjacent
prompt→command pair in the merged array as "within 30 seconds" but doesn't
enforce the 30s boundary; update the loop that computes interleavedAfterPrompt
(which iterates merged, uses cur/prev and checks cur.type === "command" &&
prev.type === "prompt") to also compute the timestamp difference (e.g., parse
cur.timestamp and prev.timestamp to milliseconds via Date) and only increment
interleavedAfterPrompt when the difference is <= 30000 ms; keep the existing
sort by timestamp and ensure you handle timestamp parsing robustly (use
getTime() or valueOf()) so only prompt→test pairs within the 30-second window
are counted.

In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 83-85: Replace the ad-hoc date logic: remove todayIso() and any
manual string concatenation like `${sessionDate}T09:00:00Z`; instead import and
use the date-utils helpers (resolveStartISO, resolveEndISO, resolveEndEpochMs,
formatDateUTC) so timezone-safe boundaries and display formatting are
consistent; update the places referenced (the todayIso function and the
constructions around sessionDate at the other locations) to call
resolveStartISO/resolveEndISO for interval boundaries and formatDateUTC for any
display strings or ISO formatting.
- Around line 39-42: The Cloner and ReviewDependencies types are port interfaces
and must be moved into the shared core types file; remove their local
declarations (e.g., the Cloner type and the ReviewDependencies type block) and
declare them in src/core/types.ts instead, then import and use those types in
review-orchestrator.ts (and any other files that used the local definitions) so
all boundary port interfaces live in the central types module and usages
reference the imported Cloner and ReviewDependencies symbols.

In `@tests/unit/services/interview/cohort/cohort-summary.spec.ts`:
- Around line 96-109: The test currently asserts status words only; instead,
assert the actual glyph characters so the UI signal can't be removed silently:
update the test using the existing helpers (makeCandidate, loadCohort,
renderCohortSummary) to check for the exact glyphs "✅" for reviewed Alice and
"⏳" for pending Bob (e.g. expect(body).toMatch(/Alice.*✅/) and
expect(body).toMatch(/Bob.*⏳/)), while retaining the existing checks for
"Reviewed"/"Pending" and "Hire" to keep coverage of both glyph and text.

In `@tests/unit/services/interview/review/ai-observer.spec.ts`:
- Around line 123-132: The test currently only checks
OBSERVATION_RESPONSE_SCHEMA.additionalProperties at the top level; add a
recursive schema traversal (e.g., a helper named assertNoAdditionalProperties or
traverseSchema) that walks OBSERVATION_RESPONSE_SCHEMA and asserts
additionalProperties === false for every object schema node it encounters
(inspect properties, patternProperties, additionalProperties, items,
allOf/anyOf/oneOf, not, definitions/$defs, etc.), and also ensure the banned
field names ("score","weighted_total","raw_total","band","signal_count") are not
present anywhere by checking keys at each visited node; call this helper from
the "declares strict json_schema..." test to harden the assertion.

---

Duplicate comments:
In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 45-47: The git clone call in review-orchestrator.ts uses execSync
with a shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}'
'${destDir}'`, ...)) which is vulnerable to command injection; replace it with a
non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing
["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit"
semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir
as separate args, and adjust any error handling around the exec in the
function/method that contains this execSync invocation.

---

Nitpick comments:
In `@tests/unit/cli/unknown-subcommand.spec.ts`:
- Around line 75-100: The test is hitting TUI passthrough when calling run([...
sub, "--help"]) making it environment-dependent; modify the test to isolate the
guard by stubbing out the TUI execution in the deps returned by makeDeps (e.g.,
set deps.spawnTui to a no-op or a function that throws a deterministic sentinel)
so run() exercises only the guard logic and never invokes the real TUI, keep the
existing errorSpy/exitSpy handling and assertions around run and
unknown-subcommand messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f1afff1b-ff4d-4824-b702-affafb49c191

📥 Commits

Reviewing files that changed from the base of the PR and between 106d3ac and ffe507a.

📒 Files selected for processing (37)
  • .coderabbit.yaml
  • README.md
  • docs/2026-05-09-candidate-interview-reviewer-plan.md
  • scripts/manual-test-interview.sh
  • scripts/run-interview-review.ts
  • skills/teamhero-interview/SKILL.md
  • src/cli/index.ts
  • src/services/interview/bootstrap/openai-generator-client.ts
  • src/services/interview/review/ai-observer.ts
  • src/services/interview/review/audit-writer.ts
  • src/services/interview/review/collectors/asciinema.ts
  • src/services/interview/review/collectors/git-history.ts
  • src/services/interview/review/collectors/jsonl-log.ts
  • src/services/interview/review/collectors/transcript.ts
  • src/services/interview/review/extractors/risk-awareness.ts
  • src/services/interview/review/extractors/test-pass.ts
  • src/services/interview/review/extractors/throughput.ts
  • src/services/interview/review/extractors/verification.ts
  • src/services/interview/review/review-orchestrator.ts
  • src/services/interview/review/types.ts
  • tests/unit/cli/unknown-subcommand.spec.ts
  • tests/unit/services/discrepancy.spec.ts
  • tests/unit/services/interview/cohort/cohort-summary.spec.ts
  • tests/unit/services/interview/cohort/skill.spec.ts
  • tests/unit/services/interview/review/ai-observer.spec.ts
  • tests/unit/services/interview/review/audit-writer.spec.ts
  • tests/unit/services/interview/review/collectors.spec.ts
  • tests/unit/services/interview/review/extractors.spec.ts
  • tests/unit/services/interview/review/review-orchestrator.spec.ts
  • tui/interview.go
  • tui/interview_progress.go
  • tui/interview_review.go
  • tui/interview_review_test.go
  • tui/interview_styles.go
  • tui/interview_test.go
  • tui/main.go
  • tui/manual_test_script_test.go
✅ Files skipped from review due to trivial changes (3)
  • .coderabbit.yaml
  • README.md
  • tests/unit/services/interview/review/extractors.spec.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • tui/main.go
  • tests/unit/services/interview/cohort/skill.spec.ts
  • tui/interview_styles.go
  • tui/interview_progress.go

Comment thread docs/2026-05-09-candidate-interview-reviewer-plan.md
Comment thread docs/2026-05-09-candidate-interview-reviewer-plan.md
Comment thread scripts/manual-test-interview.sh
Comment thread scripts/run-interview-review.ts
Comment thread skills/teamhero-interview/SKILL.md
Comment thread src/services/interview/review/extractors/verification.ts Outdated
Comment thread src/services/interview/review/review-orchestrator.ts
Comment thread src/services/interview/review/review-orchestrator.ts
Comment thread tests/unit/services/interview/cohort/cohort-summary.spec.ts
Comment thread tests/unit/services/interview/review/ai-observer.spec.ts
Security and data-loss:
- review-orchestrator: replace shell-interpolated `git clone` with
  execFileSync so a candidate repo URL containing single quotes or
  shell metacharacters cannot inject commands.
- git-history collector: same — switch execSync(`git ${args}`) to
  execFileSync("git", args).
- project-generator.writeGenerated: refuse LLM-supplied file paths
  that are absolute or that resolve outside outputDir (path traversal
  via `../` or absolute paths in the generator response).
- project-generator.clearOutputDir: refuse to recursively remove the
  filesystem root, the user's $HOME, or any root-level path. A
  misconfigured outputDir of "/" or "$HOME" otherwise wipes user data.
- review-orchestrator: candidate_id now includes a UTC HHMMSS suffix
  so two reviews of the same candidate on the same day land in
  distinct directories instead of silently overwriting each other.

Trust-boundary validation:
- role-config.readRoleConfig: parse failures and shape mismatches now
  throw a descriptive error instead of letting malformed JSON
  propagate as untyped data into the orchestrator.
- cohort-summary.loadCohort: validate that audit.json has the
  required frontmatter shape; skip malformed records instead of
  crashing renderRow with `undefined`.
- audit-writer.yaml: quote scalar values that contain colons, hashes,
  newlines, etc. A candidate name with `:` or `#` previously corrupted
  the YAML frontmatter, breaking the audit-writer ↔ cohort-summary
  round-trip.
- shared/events.parseInterviewEvent: validate `step` / `status` /
  `message` / `progress` fields on `progress` events instead of
  trusting any object whose `type` field is "progress".
- run-interview-bootstrap: validate `--time-box` is a finite number
  and `--mode-*` flags are from the allowed enum before forwarding
  into RoleConfig; previously NaN/garbage leaked through and failed
  with confusing downstream messages.
- TUI bootstrap: when `--mode-rubric=default+jd`, also check that the
  `--jd-path` file actually exists, not just that it's non-empty.

Correctness / hang / data-loss in collectors:
- jsonl-log: gracefully return empty when interview.log is missing
  instead of throwing ENOENT and failing the whole review run.
- transcript: broaden the bare-speaker regex to accept "Speaker 1"
  style labels; fall back to the epoch if sessionStartIso parses
  to NaN instead of crashing toISOString().
- verification: actually apply the documented 30s prompt-to-test
  window using timestamp deltas, not stream adjacency.
- test-pass: cap candidate test runs with a 5-minute SIGTERM timeout
  and surface r.error (ETIMEDOUT, ENOENT, …) in the output so a hung
  candidate process can't block the grader indefinitely.
- project-validator.walk: use lstatSync and skip symlinks so an
  attacker-controlled generator output with a cyclic symlink can't
  hang the validator.

Error contracts and TUI:
- bootstrap/orchestrator: wrap generation + writeRoleConfig in
  try/catch so unexpected throws surface as a structured
  RunBootstrapResult failure instead of an unhandled rejection at
  the CLI boundary.
- run-interview-bootstrap / run-interview-review: drop dotenv
  `{override: true}` so CI-provided env vars (real OPENAI_API_KEY)
  win over anything in a local .env.
- TUI wizard dispatcher: TTY-check stdin before launching the
  interactive huh.Form (no flags + piped stdin previously hung
  forever); add a nil-Options guard before the runner call.
- TUI progress model: an "update" event now promotes the phase from
  pending → running so the UI reflects activity even when an emitter
  skips the matching "start".
- TUI bootstrap form `nonEmpty`: trim whitespace so a value of
  spaces-only is rejected rather than treated as set.

Hygiene:
- privacy-gate.sh: move `set -u` inside the direct-exec guard so
  sourcing the library doesn't leak nounset into the caller's shell.
- start.sh: write .interview-state/started-at AFTER the asciinema
  preflight succeeds, so a failed start doesn't leave a stale marker
  that end.sh would mistake for a completed session.

Test fixes:
- cohort-summary: import AuditFrontmatter from the renamed `review/`
  module (was still pointing at the deleted `assess/` path; tsc
  caught it).
- project-validator spec fixture: stop adding +5 to the expected LOC
  for failing-test files — test files are excluded from the LOC count
  by the validator's TEST_NAME_PATTERN.
- audit-writer + review-orchestrator specs: expect quoted YAML URLs
  now that the YAML emitter quotes scalars containing colons.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
src/services/interview/bootstrap/project-generator.ts (1)

95-99: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Top-level delete guard is ineffective for paths like /usr and /etc.

Line 98 checks parent === abs, which only catches filesystem roots. It does not block top-level directories, so clearOutputDir() can still recursively delete unsafe system paths if misconfigured.

🔧 Suggested fix
-import { dirname, isAbsolute, join, relative, resolve, sep } from "node:path";
+import { dirname, isAbsolute, join, parse, relative, resolve, sep } from "node:path";
...
 function assertSafeToClear(outputDir: string): void {
 	const abs = resolve(outputDir);
-	const root = resolve(abs, "/");
+	const root = parse(abs).root;
 	if (abs === root || abs === sep) {
 		throw new Error(`Refusing to clear filesystem root: ${abs}`);
 	}
 	const home = homedir();
 	if (home && abs === resolve(home)) {
 		throw new Error(`Refusing to clear home directory: ${abs}`);
 	}
-	// Refuse a top-level path like /usr, /etc, /home — anything where the
-	// path has no parent beyond the root.
-	const parent = dirname(abs);
-	if (parent === abs) {
-		throw new Error(`Refusing to clear root-level path: ${abs}`);
+	// Refuse top-level paths like /usr, /etc, /home.
+	if (dirname(abs) === root) {
+		throw new Error(`Refusing to clear top-level path: ${abs}`);
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/bootstrap/project-generator.ts` around lines 95 - 99,
The current check in clearOutputDir uses dirname(abs) and only throws when
parent === abs, which misses top-level children like "/usr" (whose parent is "/"
not equal to abs); update the guard to also detect paths directly under the
filesystem root by comparing parent to the path root: require
path.parse(abs).root and throw if parent === abs || parent ===
path.parse(abs).root; reference the clearOutputDir function and the local
variables abs and parent and use path.parse(abs).root to identify root-level
parents.
🧹 Nitpick comments (1)
src/services/interview/review/extractors/test-pass.ts (1)

14-17: ⚡ Quick win

Move TestRunner port type to src/core/types.ts.

This defines a new port-like interface locally; per repo architecture rules, keep port interfaces centralized in core types and import it here.

As per coding guidelines, "ALL port interfaces must be defined in src/core/types.ts — never create new types files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/interview/review/extractors/test-pass.ts` around lines 14 - 17,
The type alias TestRunner is a port interface that must be centralized instead
of being defined locally; remove the local export of TestRunner and instead add
the identical exported type declaration to the core types module (types.ts),
then import { TestRunner } into this file and use it in place of the removed
local type; ensure the signature stays { readonly passed: number; readonly
failed: number; readonly output: string } and that the core types module exports
it for other modules to consume.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 27-33: Move the GeneratorClient interface declaration into the
shared core types module (the central core types file) so it lives with other
port interfaces; export it from there and remove the local declaration in
project-generator.ts. Update all files that reference GeneratorClient (and
ensure GeneratedProject and RoleConfig are imported or re-exported from the core
types file) to import the interface from the shared core types module instead of
the local service file. Ensure the generate method signature and readonly
modifiers remain identical when relocating.
- Around line 109-120: The copyDir function currently uses statSync(s) which
follows symlinks; change it to use lstatSync(s) and explicitly skip entries
where lstatSync(s).isSymbolicLink() to avoid copying files outside
kitTemplateDir or entering symlink loops. In function copyDir, replace statSync
with lstatSync, add a guard like "if (st.isSymbolicLink()) continue", and keep
the existing directory handling (mkdirSync + recursive copy) and file writing
(mkdirSync(dirname(d)) + writeFileSync) for non-symlink entries.

In `@src/services/interview/review/audit-writer.ts`:
- Around line 57-68: The yamlScalar function detects values needing quoting but
only escapes backslashes and double quotes, so control characters like newline,
carriage return and tab remain raw and produce invalid YAML; update yamlScalar
to additionally replace control characters (e.g. \n, \r, \t, \b, \f, and any
other non-printable controls you expect) with their corresponding escape
sequences before wrapping in quotes (while still escaping backslashes and double
quotes), ensuring the returned double-quoted scalar contains valid YAML escape
sequences; modify the replacement logic used when building the escaped variable
in yamlScalar to run a single pass that maps those control characters to "\\n",
"\\r", "\\t", "\\b", "\\f", etc., then return the quoted string.
- Around line 70-76: yamlTag currently detects whitespace (including
newlines/tabs) but only escapes backslashes and double quotes; update yamlTag to
additionally replace control whitespace characters with proper escape sequences
before wrapping in quotes: after computing escaped = value.replace(/\\/g,
"\\\\").replace(/"/g, '\\"'), also replace newline (\n) with "\\n", carriage
return (\r) with "\\r", and tab (\t) with "\\t" (and any other control chars you
deem necessary) so the returned quoted string is fully escaped; keep the
existing regex and return logic but perform these extra replacements on the
escaped variable inside the yamlTag function.

In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 27-31: The current try/catch around runner(["log",
`--format=${FORMAT}`, "--numstat"], repoDir) swallows all errors and returns [];
change it to only convert known benign git cases (e.g., "does not have any
commits", "unknown revision", or specific exit codes from git that indicate
empty history) into [], and rethrow any other exceptions so real failures
(permissions, broken git, unexpected runtime errors) surface; specifically wrap
the call to runner in a try, inspect the thrown error's message/exit code/stderr
and return [] only for the recognized benign messages, otherwise throw the
original error so higher-level handlers (in-report placeholder / appendix / CLI
stderr) can handle it.

---

Duplicate comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 95-99: The current check in clearOutputDir uses dirname(abs) and
only throws when parent === abs, which misses top-level children like "/usr"
(whose parent is "/" not equal to abs); update the guard to also detect paths
directly under the filesystem root by comparing parent to the path root: require
path.parse(abs).root and throw if parent === abs || parent ===
path.parse(abs).root; reference the clearOutputDir function and the local
variables abs and parent and use path.parse(abs).root to identify root-level
parents.

---

Nitpick comments:
In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 14-17: The type alias TestRunner is a port interface that must be
centralized instead of being defined locally; remove the local export of
TestRunner and instead add the identical exported type declaration to the core
types module (types.ts), then import { TestRunner } into this file and use it in
place of the removed local type; ensure the signature stays { readonly passed:
number; readonly failed: number; readonly output: string } and that the core
types module exports it for other modules to consume.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 20c0ad49-c16c-47ab-bd28-dd3a3617b024

📥 Commits

Reviewing files that changed from the base of the PR and between ffe507a and 790b9e8.

📒 Files selected for processing (25)
  • scripts/run-interview-bootstrap.ts
  • scripts/run-interview-review.ts
  • src/services/interview/bootstrap/orchestrator.ts
  • src/services/interview/bootstrap/project-generator.ts
  • src/services/interview/bootstrap/project-validator.ts
  • src/services/interview/bootstrap/role-config.ts
  • src/services/interview/cohort/cohort-summary.ts
  • src/services/interview/review/audit-writer.ts
  • src/services/interview/review/collectors/git-history.ts
  • src/services/interview/review/collectors/jsonl-log.ts
  • src/services/interview/review/collectors/transcript.ts
  • src/services/interview/review/extractors/test-pass.ts
  • src/services/interview/review/extractors/verification.ts
  • src/services/interview/review/review-orchestrator.ts
  • src/services/interview/shared/events.ts
  • teamhero-interview-kit/lib/privacy-gate.sh
  • teamhero-interview-kit/start.sh
  • tests/unit/services/interview/bootstrap/project-validator.spec.ts
  • tests/unit/services/interview/review/audit-writer.spec.ts
  • tests/unit/services/interview/review/review-orchestrator.spec.ts
  • tui/interview_bootstrap.go
  • tui/interview_bootstrap_form.go
  • tui/interview_bootstrap_wizard.go
  • tui/interview_bootstrap_wizard_test.go
  • tui/interview_progress.go
🚧 Files skipped from review as they are similar to previous changes (17)
  • src/services/interview/shared/events.ts
  • src/services/interview/review/collectors/jsonl-log.ts
  • src/services/interview/review/collectors/transcript.ts
  • tests/unit/services/interview/bootstrap/project-validator.spec.ts
  • scripts/run-interview-review.ts
  • teamhero-interview-kit/lib/privacy-gate.sh
  • tests/unit/services/interview/review/review-orchestrator.spec.ts
  • tests/unit/services/interview/review/audit-writer.spec.ts
  • tui/interview_progress.go
  • tui/interview_bootstrap_wizard.go
  • src/services/interview/bootstrap/role-config.ts
  • tui/interview_bootstrap.go
  • src/services/interview/cohort/cohort-summary.ts
  • tui/interview_bootstrap_form.go
  • tui/interview_bootstrap_wizard_test.go
  • src/services/interview/bootstrap/project-validator.ts
  • src/services/interview/review/review-orchestrator.ts

Comment thread src/services/interview/bootstrap/project-generator.ts
Comment thread src/services/interview/bootstrap/project-generator.ts Outdated
Comment thread src/services/interview/review/audit-writer.ts
Comment thread src/services/interview/review/audit-writer.ts
Comment thread src/services/interview/review/collectors/git-history.ts
asabaylus and others added 2 commits May 10, 2026 23:13
Without `[args...]` on the command spec, commander rejected any
positional argument as "too many arguments" — so
`teamhero interview bootstrap`, `teamhero report --since ...`, etc.
all failed before reaching the Go TUI. The `.allowUnknownOption()`
flag only covers `--flags`, not positionals.

Declaring each command as `command [args...]` makes commander accept
variadic positionals which we then forward verbatim to the TUI
binary. No call-site changes needed; the action callbacks already
re-derive args from process.argv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 11, 2026

✅ Unit tests committed locally. Commit: cf8229bec27921480f52ef5cbb9385964b7e1eb0

- audit-writer: escape literal newlines, carriage returns, and tabs
  when emitting double-quoted YAML scalars. A candidate name with an
  embedded newline previously made the frontmatter invalid YAML even
  with the surrounding quotes.
- cli/index.ts: add "interview" to the report-action's nested-
  subcommand guard so `teamhero report interview` is rejected
  consistently with `report doctor` / `report setup`, instead of
  silently routing into the report flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts (3)

156-173: ⚡ Quick win

Chronological test does not actually prove session_date precedence.

Right now date and session_date are identical for both candidates, so sorting by date would also pass. Make them intentionally conflict so this test catches regressions.

Suggested test hardening
 			makeCandidate(dir, "carol", {
 				candidate: "Carol",
-				date: "2026-05-20",
+				date: "2026-04-01",
 				session_date: "2026-05-20",
 			});
 			makeCandidate(dir, "alice", {
 				candidate: "Alice",
-				date: "2026-05-01",
+				date: "2026-06-30",
 				session_date: "2026-05-01",
 			});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 156 - 173, The test "orders chronologically using session_date when
available" currently uses identical date and session_date values so it doesn't
prove session_date precedence; update the test data created via makeCandidate
for "carol" and "alice" so their date and session_date intentionally conflict
(e.g., set Carol.date before Alice.date but Carol.session_date after
Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must
rely on session_date to produce Alice before Carol; keep using loadCohort and
the same expectation on body.indexOf.

132-135: ⚡ Quick win

Body-wide negative date assertion is brittle.

expect(body).not.toContain("2026-05-10") can fail if that value appears elsewhere (metadata/footer) even when the Interviewed column is correct. Scope the check to Alice’s row.

Suggested assertion narrowing
 			const body = renderCohortSummary("senior-backend", loadCohort(dir));
-			expect(body).toContain("2026-06-01");
-			// The submission date should NOT appear in place of the session date
-			expect(body).not.toContain("2026-05-10");
+			const aliceRow = body.split("\n").find((line) => line.includes("Alice")) ?? "";
+			expect(aliceRow).toContain("2026-06-01");
+			expect(aliceRow).not.toContain("2026-05-10");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 132 - 135, The negative date assertion is too broad: instead of
using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's
table row by locating the row that contains "Alice" (e.g., parse the HTML in
body and select the tr or nearest container with text "Alice") and assert that
that specific row does not contain "2026-05-10"; update the test to query the
Alice row from body and replace the global not.toContain check with an assertion
against that row's text content.

207-210: ⚡ Quick win

The decimal-number assertion can create false failures.

expect(body).not.toMatch(/\d+\.\d+/) is broader than the requirement and could fail on unrelated decimals. Header assertions already validate the column contract well.

Suggested cleanup
 			expect(body).not.toMatch(/\|\s*Score\s*\|/i);
 			expect(body).not.toMatch(/\|\s*Total\s*\|/i);
 			expect(body).not.toMatch(/\|\s*Rank\s*\|/i);
-			expect(body).not.toMatch(/\d+\.\d+/); // no decimal numbers (scores)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`
around lines 207 - 210, The test contains an overly broad negative assertion
expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals;
remove that assertion from cohort-summary-edge-cases.spec.ts (the
expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a
decimal-specific check, replace it with a narrow pattern that matches decimal
values inside table cells (e.g., a cell-boundary regex like
/\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the
existing header assertions as they already validate the column contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/services/interview/bootstrap/project-generator-security.spec.ts`:
- Around line 109-125: The test expects generateProject(role({ outputDir: dir
}), clientFor(malicious)) to reject on paths containing a NUL byte, so add an
explicit NUL-byte validation in the path-resolution logic inside generateProject
(the code path that processes GeneratedProject.files -> file.path) to detect
"\0" (or char code 0) and throw a specific, named error (e.g., InvalidPathError
or a new PathValidationError) before any FS operations; then update the test to
assert that generateProject rejects with that specific error type/message
instead of a generic toThrow(), referencing generateProject, GeneratedProject,
and the file.path handling code for location.

In `@tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts`:
- Around line 199-213: The test "stores optional jdPath and customPrompt when
present" only sets and asserts jdPath but not customPrompt; update the test to
actually include a customPrompt in the RoleConfig passed to writeRoleConfig and
add an assertion that readRoleConfig(dir)?.customPrompt equals that value (or
alternatively rename the test to only mention jdPath). Locate the test block in
role-config-edge-cases.spec.ts and modify the cfg object creation (RoleConfig)
to include customPrompt and add an expect(...) assertion after reading via
readRoleConfig(dir) to verify persistence, or change the test description to
match the current assertions.

---

Nitpick comments:
In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`:
- Around line 156-173: The test "orders chronologically using session_date when
available" currently uses identical date and session_date values so it doesn't
prove session_date precedence; update the test data created via makeCandidate
for "carol" and "alice" so their date and session_date intentionally conflict
(e.g., set Carol.date before Alice.date but Carol.session_date after
Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must
rely on session_date to produce Alice before Carol; keep using loadCohort and
the same expectation on body.indexOf.
- Around line 132-135: The negative date assertion is too broad: instead of
using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's
table row by locating the row that contains "Alice" (e.g., parse the HTML in
body and select the tr or nearest container with text "Alice") and assert that
that specific row does not contain "2026-05-10"; update the test to query the
Alice row from body and replace the global not.toContain check with an assertion
against that row's text content.
- Around line 207-210: The test contains an overly broad negative assertion
expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals;
remove that assertion from cohort-summary-edge-cases.spec.ts (the
expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a
decimal-specific check, replace it with a narrow pattern that matches decimal
values inside table cells (e.g., a cell-boundary regex like
/\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the
existing header assertions as they already validate the column contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 04c43fb4-79c1-440c-a31f-06f8b7db413e

📥 Commits

Reviewing files that changed from the base of the PR and between 790b9e8 and cf8229b.

📒 Files selected for processing (7)
  • src/cli/index.ts
  • tests/unit/services/interview/bootstrap/openai-generator-client.spec.ts
  • tests/unit/services/interview/bootstrap/project-generator-security.spec.ts
  • tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts
  • tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts
  • tests/unit/services/interview/review/asciinema-edge-cases.spec.ts
  • tests/unit/services/interview/review/audit-writer-edge-cases.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/index.ts

Comment thread tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts Outdated
asabaylus and others added 4 commits May 10, 2026 23:23
CodeRabbit-authored test suite verified that LLM-supplied paths
containing "\0" can otherwise truncate path checks in some syscalls
(the classic "evil.png\0.sh" trick). Reject the byte outright in
resolveWithinRoot before any further path math.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interview bootstrap wizard previously ran as a sequential loop of
huh.NewForm(...).Run() calls. That produced a bare form on a blank
screen with no orientation — no //// TEAM HERO header, no right-side
configuration summary, no navigation hints. It didn't look like part of
the same product as the report wizard.

This commit rebuilds it as a single bubbletea program
(interviewBootstrapTeaModel) so it picks up the same shell-header +
summary-panel + hints layout as wizard.go. The data container
(bootstrapWizardModel) and per-screen validators are unchanged; only
the runner is swapped:

  runHuhBootstrapWizard → runBootstrapTeaWizard → tea.NewProgram(...).Run()

Layout:
- renderShellHeader at the top — same `//// TEAM HERO` gradient slashes
- Left ~60% column: the active step's huh form
- Right ~40% column: a bordered "Interview Bootstrap" summary panel
  listing every field. Filled values render bright, the active step's
  label is highlighted cyan, and not-yet-reached fields show "—"
- Footer hint line: `enter continue • ctrl+c quit`

Testing:
- Adds teatest + x/exp/golden as direct Go test deps (the latter was
  already transitively present)
- New layout tests in tui/interview_bootstrap_tea_test.go drive the
  model through teatest and assert the three layout invariants
  (header / summary / hints) are present at multiple wizard states
- TestInterviewBootstrap_Screenshot_WritesGolden produces three
  human-reviewable plaintext screenshots under
  tui/testdata/interview_bootstrap/ — the test compares against them
  on every run, so layout regressions show up as diffs in PRs. Run
  with TEAMHERO_UPDATE_SCREENSHOTS=1 to refresh after intentional
  layout changes
- Three pre-existing launcher-smoke tests
  (TestBootstrapWizard_HuhLauncher*Smoke) were rewired to stub the
  new program-runner seam (runBootstrapTeaProgram) instead of the old
  huhFormRun, keeping them fast and goroutine-leak-free
- TestRunInterview_BootstrapVerb_RequiresFlags now pins isStdinTTY to
  false so the test exercises the "no TTY → reject" branch
  deterministically (without the pin, an interactive `go test` session
  inherits a real TTY and the wizard hangs waiting for input)
- Broadened the package-shared ansiEscapes regex to also strip
  DEC-private mode sequences (`\x1b[?25l`, `\x1b[?2004h`) so the
  golden screenshots stay clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two adversarial review passes identified ~15 real bugs across the TUI
refactor and the recent CodeRabbit-driven service-layer fixes. This
commit lands the meaningful ones; the rest were design choices,
nitpicks, or speculative concerns.

## TUI critical fixes

- **Production wizard wiring**: runBootstrapTeaWizard previously called
  `tea.NewProgram(model, tea.WithInput(nil), tea.WithOutput(nil))` — the
  wizard would have hung the moment any user pressed a key, because
  bubbletea had no stdin. Use default stdio + WithAltScreen() like the
  report wizard does. The smoke tests didn't catch this because they
  stubbed out runBootstrapTeaProgram entirely.
- **Smoke tests were tautologies**: stubTeaProgramRunner just set
  `confirmed = true` and returned a synthetic result. The state machine
  could be broken and the tests would still pass. The stub now walks
  the model through real advance()/nextStep() transitions in-process,
  filling values that validators require, so a regression in the rubric
  branch or time-box sub-step actually fails a test.
- **Time-box "custom" branch**: previously swapped the form in place
  without advancing m.step, so highWater stayed at ibStepTimeBox and the
  summary panel claimed the user was still on the time-box screen while
  the custom-minutes input rendered. Introduced an explicit
  ibStepTimeBoxCustom step the state machine routes through.
- **Confirm "Cancel" produced silent success**: clicking Cancel set
  confirmed=false but the dispatcher returned 0 with no message,
  indistinguishable from a successful no-op. Now prints
  "Wizard cancelled at confirm" / "Wizard aborted" to stderr so users
  see why no role was generated.
- **WindowSizeMsg never reached the huh form**: on resize we updated
  width and returned, but didn't forward the message into the form
  itself. Internal field viewports were stale until the next keystroke.
  Now forward the resize message after updating width; this also makes
  huh.Confirm render its button row inline (visible in the refreshed
  confirm-step screenshot).
- **truncate() sliced bytes, not runes**: summary-panel labels
  containing multi-byte characters (Étienne, CJK, etc.) would render as
  invalid UTF-8 when truncated mid-codepoint. Switched to []rune slicing
  and rejected non-positive max explicitly.
- **Deleted ~200 lines of dead code**: stepRole through stepConfirm and
  stepConditionalRubric in interview_bootstrap_form.go were unreachable
  after the runner swap. stubHuhFormRunner in the test file was also
  unused. Both removed.

## Service-layer critical fixes

- **assertSafeToClear was too permissive**: a misconfigured
  outputDir of "/tmp", "/var", "/usr", "/home", etc. previously passed
  and clearOutputDir would recursively delete the entire system
  directory. Now reject a DANGEROUS_ROOTS set explicitly, plus
  single-segment absolute paths. Subdirectories of /tmp (mkdtemp paths)
  are still permitted, so tests work normally.
- **Path traversal via symlinks**: resolveWithinRoot did string-level
  containment math but never realpath'd the parent chain. If a previous
  generation attempt left a symlink like `subdir -> /etc`, writing
  `subdir/passwd` would land outside the workspace. Now walk intermediate
  segments with lstatSync and refuse any existing symlink in the path.
- **YAML escapes dropped most C0 controls**: audit-writer's
  escapeYamlDoubleQuoted handled `\\ " \n \r \t` but let other control
  characters (\x00–\x1F, \x7F) through verbatim. Per YAML 1.2 §5.7,
  unescaped controls in a double-quoted scalar are a parse error in
  strict parsers (js-yaml FAILSAFE, libyaml). A candidate name with a
  stray \x01 from copy-paste would silently corrupt audit.json
  round-trip. Now emit `\xHH` for any C0 control character.
- **test-pass spawnSync timeout didn't kill child group**: Node's
  timeout signals only the direct child (`go` or `bun`), leaving
  sub-binaries reparented to PID 1 and still running on the grader
  machine. Wrapped the spawn in coreutils `timeout --kill-after=10s 300s`
  which signals the whole process group; fall back to plain spawnSync
  with SIGKILL when coreutils isn't available.

## Hygiene

- **discrepancy.spec.ts afterAll(mock.restore)**: CLAUDE.md explicitly
  forbids this — mock.restore() undoes ALL mock.module() registrations,
  including the pinning I added at the top of the file. Per-file
  isolation via scripts/run-tests.sh is the project's mode; the pinning
  is defensive against bun test's non-isolated path. afterAll restore
  was actively counter-productive.
- **git-history hardcodes argv and adds `--`**: the runner already
  builds argv from constants, but added the literal "--" terminator
  before any future caller can inject user-controlled refs/pathspecs
  that begin with "-" and would be parsed as git options.
- **cohort-summary signed_off type validation + pipe escaping**:
  loadCohort now requires signed_off to be a real boolean (not a string
  "false" that JS treats as truthy). renderRow escapes `|` and newlines
  in candidate / recommendation fields so a name like "Alice | aka Bob"
  doesn't break the markdown table layout.
- **transcript regex accepts Unicode names**: switched from
  `[A-Za-z][\w .'-]*?` to `\p{L}[\p{L}\p{N} .'_-]*?` with the /u flag so
  "Étienne: foo" parses correctly. The previous regex silently dropped
  non-ASCII speaker labels.

## What I chose NOT to fix
- isStdinTTY mutable package var (no t.Parallel today; defer)
- WaitFinished timeout deferral risk (small window; defer)
- jsonl-log existsSync race (benign — try/catch already wraps the read)
- verification.ts adjacent-only prompt window (design choice; documented)
- start.sh exec failure leaves started-at (would need a wrapper script)
- CLI nested-subcommand denylist won't catch new TUI subcommands
  (need a Go-TUI-side change to reject unknown positionals)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-reported bug:
- TUI bootstrap wizard exited with "Wizard cancelled at confirm" even
  after the user clicked through all 13 screens. huh.Confirm uses the
  initial value of the bound variable to decide which button has focus,
  and our `confirmed bool` defaulted to false (zero value), so the
  wizard opened the confirm screen with focus on "Cancel". A stray
  Enter cancelled the whole run. Now pre-set `d.confirmed = true`
  before building the confirm form so the affirmative button has focus
  by default — matches typical "Are you sure? [Y/n]" UX.

CodeRabbit threads (fixed):
- scripts/run-interview-review.ts: also reject when both --repo AND
  --local-repo-path are supplied (was previously underspecified —
  required at least one, but accepted both ambiguously).
- src/services/interview/review/ai-observer.ts: rejectIfScored now
  walks the response object tree and inspects actual keys instead of
  regex-scanning the serialized JSON. An evidence excerpt containing
  literal text like `"score": 7/10` (e.g. a code comment) no longer
  false-positives.
- skills/teamhero-interview/SKILL.md: review verb heading now matches
  the actual invocation form (`--candidate <name> --repo <url>`) with
  a note that the repo can also be positional.
- tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts:
  renamed the misleading "stores optional jdPath and customPrompt"
  test to "stores optional jdPath", and added a sibling test for
  customPrompt — the original test name didn't match its assertion.
- src/services/interview/review/collectors/asciinema.ts: per-char
  iteration of evt.data so multi-character input chunks (paste, rapid
  typing) reconstruct commands correctly. CSI escape sequences
  (ESC `[` ... letter) are detected and skipped as a unit so arrow
  keys aren't accidentally buffered character-by-character.
- src/services/interview/review/collectors/git-history.ts: catch block
  now distinguishes benign "no commits yet" from genuine failures
  (binary missing, permission denied), writing the latter to stderr
  instead of fully swallowing the error.

CodeRabbit threads (not fixed, with rationale):
- ai-observer/cohort-summary test hardening suggestions are deferred —
  current assertions cover the load-bearing invariants; the suggested
  pinning would couple tests to presentation details.
- "Move port interfaces to core/types.ts" architectural suggestions
  apply only to ports consumed across module boundaries; the interview
  module's `Cloner`, `ReviewDependencies`, and `GeneratorClient` are
  internal to their namespaces.
- Markdownlint MD040 nits on planning doc and SKILL.md are cosmetic.

All 1749 TS tests pass; full Go suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
asabaylus and others added 19 commits May 13, 2026 00:09
…+ generation

User-reported bug: "Idea generation failed: context deadline exceeded
(Client.Timeout exceeded while awaiting headers)" in the bootstrap
wizard when the user picked "Suggest ideas".

Root cause: the openAIIdeaFetcher reused the package-wide
defaultHTTPClient (5s timeout), which is tuned for cheap doctor/setup
auth-probe calls. The OpenAI Responses API generating 5 ideas under a
strict json_schema routinely takes 15-45s — the client timed out before
headers came back.

Changes:
- tui/interview_bootstrap_ideas.go: give the idea fetcher its own
  http.Client with a 90s timeout sized for content generation. Tests
  inject client directly, so test paths are unaffected.
- src/services/interview/bootstrap/openai-generator-client.ts: honor
  AI_MODEL from env (matches the Go side and the rest of the TS
  codebase). Keeps gpt-5-mini as the default when nothing is set.
- scripts/run-interview-bootstrap.ts: add --model and --max-attempts
  flags so we can override per-invocation without editing code (needed
  for tuning generation quality on principal-tier roles).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts-hto)

Bootstrap was failing validation with "35 LOC; expected 400-700" on a
Principal Full Stack role whose generated project contained 447 lines of
C# + Vue + TS. Root cause: project-validator.ts hardcoded SOURCE_EXTS to
{.ts, .tsx, .js, .jsx, .go} — the team's *own* stack — so .cs files
weren't counted at all. TEST_NAME_PATTERN had the same problem: it only
recognised foo.spec.ts / foo.test.ts naming, so xUnit *Tests.cs files
weren't classified as tests, and the failing-test detection couldn't
find the [Fact(Skip = ...)] marker the generator correctly emitted.

The interview command takes role.stack as input from the hiring manager
— the validator must not assume that stack matches the CLI's own stack.

Changes:
- SOURCE_EXTS expanded to cover JS/TS family (+ .vue, .svelte), JVM
  (.java, .kt, .scala), .NET (.cs, .fs, .vb), systems (.rs, .c, .cpp,
  .h), dynamic (.py, .rb, .php), Apple (.swift), and functional niches
  (.ex, .clj, .ml).
- TEST_NAME_PATTERN extended to recognise *Tests.cs / *Specs.kt /
  foo_test.go / test_foo.py / foo_spec.rb in addition to the JS/TS
  *.spec.* / *.test.* convention.
- Skipped-test detector extended with xUnit/NUnit [Fact(Skip=)] /
  [Ignore], JUnit @disabled / @ignore, pytest / unittest @Skip, RSpec
  pending, Rust #[ignore], and a universal NotImplementedException /
  raise NotImplementedError catchall.
- Two new regression tests: one for a C# project that lands in window
  via .cs + *Tests.cs + [Fact(Skip=)], one for a Python+Vue project
  that lands via .vue + .py + @pytest.mark.skip.

Verified end-to-end: gpt-5 first-attempt success on the
principal-full-stack role generating a 447-LOC C#/.NET + Nuxt project
with idiomatic xUnit skip marker — previously failed all 3 attempts
because the validator couldn't see the files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… link

The interactive wizard already showed an OSC 8 hyperlink to the
generated repo and offered a GitHub publish form after success
(tui/interview_bootstrap_publish.go + the result screen in
interview_bootstrap_generate.go). The headless path
(`teamhero interview bootstrap --headless ...`), used by scripts and
proctors invoking the CLI directly, dropped the user at the bun
script's plain "Output: <dir>" line with no link and no publish
opportunity — so a proctor running the supported headless flow had to
copy-paste the path, then drive `gh repo create` + `git push` by hand
every time.

Wires `runInterviewBootstrap` through the same post-success UX:

- On exit==0, print `Project: <abs-path>` wrapped in an OSC 8
  file:// hyperlink (ctrl-click opens the dir in Finder/Explorer).
- If stdin is a TTY and --no-confirm is absent, invoke the existing
  offerPublishToGitHub flow. CI runs (non-TTY) and `--no-confirm`
  scripted runs stay silent — no surprise prompts.
- offerPublishToGitHub already reads GITHUB_PERSONAL_ACCESS_TOKEN
  from ~/.config/teamhero/.env (the token `teamhero setup` saves),
  prints a "run teamhero setup" tip when it's missing rather than
  nagging, and uses huh forms for repo name / owner / visibility
  (private default). The wizard and headless paths now share that
  identical flow — no behavior drift.

Tests via the existing offerPublishToGitHub and isStdinTTY var seams:

- PrintsSuccessLinkOnZeroExit  — asserts the OSC 8 envelope on stdout.
- OffersPublishWhenTTY         — TTY + no --no-confirm -> publish ran.
- SkipsPublishWhenNoConfirm    — --no-confirm -> publish skipped.
- SkipsPublishWhenNotTTY       — non-TTY -> publish skipped.
- NoPublishOnFailure           — exit!=0 -> no link, no publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interview verb picker was rendering with only "> Cancel" visible on
the first paint. The user had to press the up arrow before they could
see Bootstrap / Review / Cohort.

Root cause: the Cancel option's value was the empty string "". The
picker bound a `var verb string` (zero value: "") to huh.Select.
huh saw the bound value matched the Cancel option, marked Cancel as
the "current value" of the field, and anchored the cursor on it (the
LAST row). When the form rendered with a constrained viewport, it
scrolled so the cursor stayed visible, clipping every option above it.
Arrow-up moved the cursor and scrolled the viewport, revealing the
hidden options — which is exactly the workaround the user reported.

Fix: give Cancel a distinct sentinel value "cancel" so the bound
verb's zero value matches nothing in the list. huh then defaults the
cursor to the first option (Bootstrap) and all four rows paint on
entry. interviewVerbPicker maps the "cancel" sentinel back to the
caller's "" no-op convention before returning, so the dispatcher's
existing `if verb == ""` check still covers both ctrl-c abort and
explicit Cancel selection — no behavior change for downstream callers.

Tests:
- TestInterviewVerbOptions_NoValueMatchesZeroDefault — pins the bug
  by asserting no option has Value == "". Any future option whose
  value collides with `string`'s zero default would re-introduce the
  clipping, so the guard covers the whole list, not just Cancel.
- TestInterviewVerbOptions_IncludesCancel — guards against a refactor
  that drops the Cancel row entirely (would force users to commit to
  a verb with no way out other than ctrl-c).
- TestRunInterview_NoVerb_TTY_CancelSentinelTreatedAsNoOp — confirms
  the picker→dispatcher contract: Cancel still produces exit 0 with
  no error output, identical to ctrl-c abort.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0-min time-box

Two PRD-tracked defaults now apply automatically when their flags are
omitted, so a typical headless run boils down to --role + --stack +
--domain + --feature + --mode-* with no path or duration to remember.
Explicit flags continue to win — the original PRD's per-invocation
override capability is intact.

Output directory:
- Default: ./interviews/<role-slug> (was: required flag, no default in
  the headless path; the wizard used to seed ./roles/role).
- .gitignore now lists both `interviews/` (new default) and `roles/`
  (kept for users upgrading from the old convention so their existing
  generated content stays gitignored).
- Wizard form (interview_bootstrap_tea.go, interview_bootstrap_wizard.go)
  seeds with `./interviews/role` and auto-substitutes the typed role
  slug on the output-dir step.

Time-box:
- Default: 60 minutes (was: 90).
- "60 minutes" is now the (recommended) label in the wizard's huh
  select; "90 minutes" loses the recommendation suffix. Standard
  options remain 60 / 90 / 120 / Custom.
- bun script (scripts/run-interview-bootstrap.ts) falls back to "60"
  when --time-box is omitted, matching the Go TUI's default.

Implementation:
- New applyBootstrapDefaults(opts) in interview_bootstrap.go runs
  between flag parsing and validation. nil-safe so callers don't need
  to guard. Empty role leaves OutputDir empty so the validator's
  "missing --role" error surfaces instead of producing a misleading
  ./interviews/<empty> path.

Tests (Go):
- FillsOutputDirFromRole / DoesNotOverrideExplicitOutputDir /
  LeavesOutputDirEmptyWhenRoleMissing / FillsTimeBoxWhenMissing /
  DoesNotOverrideExplicitTimeBox / NilSafe — unit coverage for
  applyBootstrapDefaults.
- AllowsOmittedOutputDirAndTimeBox — end-to-end through
  runInterviewBootstrap with both flags omitted; asserts the
  runner sees the resolved defaults.
- TestBootstrapWizard_DefaultModelHasSensibleDefaults updated to
  expect "60" (was "90").

Snapshot golden files (4 in tui/testdata/interview_bootstrap/)
regenerated via TEAMHERO_UPDATE_SCREENSHOTS=1 to reflect the new
seeds (TimeBox: "60", OutputDir: "./interviews/senior-backend").
Layout assertions still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The confirm step rendered every collected wizard field a second time as
the huh.Confirm form's Description — a long "role=… · stack=… ·
domain=… · time-box=… · project=… · analysis=… · rubric=… · out=…"
line — even though the right-hand summary panel already lists each
field in a structured, labelled format. Result: the only thing the
user actually has to act on (Yes / Cancel) was buried under repeated
information, and the reported screenshot showed multiple lines of
config metadata pushing the buttons below the fold on shorter
terminals.

Remove the redundant description from the confirm form. The right-
hand summary panel stays untouched as the canonical pre-generation
recap. The summarizeBootstrapModel helper is kept (it's still useful
for logging and the existing unit test) — it's just no longer wired
into the confirm form.

Tests:
- TestInterviewBootstrap_ConfirmStep_OmitsVerboseSummary — drives the
  wizard to ibStepConfirm and asserts the rendered view contains
  "Ready to bootstrap?" and "Yes, generate the role" but NOT the
  summary signature tokens "role=", "stack=", "time-box=", "out=".
  Any regression that pipes summarizeBootstrapModel back into the
  Description will reintroduce those tokens and fail this assertion.
- 03-confirm-step golden snapshots (ANSI + plaintext) regenerated via
  TEAMHERO_UPDATE_SCREENSHOTS=1 to lock in the cleaner layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st pass

Real wizard runs were failing with the validator complaint:

  Expected at least 2 deep modules (>=80 lines); found 1.
  LOC out of range: 247 lines of code; expected 400-700.

Root cause: gpt-5-mini was consistently producing a thin happy-path
sketch on attempt 1 — one deep module, ~200-300 total LOC — and only
climbing into the 400-700 window on attempts 2-3 once the validator's
specific failure list was fed back through previousFailures. With a
3-attempt budget, marginal cases ran out of retries before recovery.

Two changes work together:

1. Stronger first-pass prompt (openai-generator-client.ts). The LOC
   bullet is now labeled "SIZE TARGET" with explicit framing as a
   "hard constraint", names ~525 as the target midpoint to leave
   variance margin, and provides per-module guidance ("100-150 lines
   per deep module"). It also names a typical decomposition (types
   module + service module + helpers) so the model has a concrete
   shape to aim at instead of guessing. The "Do NOT shrink to a
   minimal happy path" line addresses the observed failure mode
   directly.

2. Default attempt budget raised from 3 to 5
   (project-generator.ts). One additional retry was reliably enough
   to recover marginal cases in manual testing; bumping to 5 leaves
   safety margin without runaway cost (each attempt is one structured
   Responses API call).

Tests:
- openai-generator-client.spec.ts: new assertion that the prompt
  contains the SIZE TARGET / hard constraint / 100-150 lines per
  deep module phrasing. Any future prompt edit that loses the
  per-module sizing will re-introduce the first-attempt undersize bug
  and trip this regression guard.
- project-generator.spec.ts: existing 3-attempt assertion updated to
  reflect the 5-attempt default. The "succeeds within budget" test
  still locks in mid-budget recovery (succeeds on attempt 3 of 5);
  the "exhausts budget" test now expects attempts === 5 and is
  rewritten with a single-malformed clientReturning (clamps to last)
  so adding attempts in the future doesn't churn the fixture list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reak

Reported: in the rubric-mode step (and similar select screens) the
left vertical bar (huh.ThemeCharm's focus-border decoration) renders
correctly on the title line but is missing on wrapped Description
lines, producing a visible break in the form's bounding rail.

Root cause is upstream: huh's Charm theme doesn't extend the focused-
field left bar onto Description content that wraps via huh's internal
wrap path. We can't fully fix it from our side until the Charm
libraries (huh / lipgloss / bubbletea) ship a fix.

Pragmatic mitigation for the static-description case: shorten the
Analysis-mode and Rubric-mode Descriptions so they fit on one line at
the default formWidth (3/5 of an 80-col terminal = 48 chars) and
never enter huh's wrap path. The full detail that used to live in
the Description is moved into the option labels themselves, which
render correctly even when long.

Limitations:
- The dynamic-content case still applies: the Feature-description
  textarea and Custom-rubric-prompt textarea wrap user-typed content
  via huh's path, and the bar-break recurs there. A separate change
  upgrading huh / lipgloss / bubbletea is needed to address that
  end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported requirement: the interview kit's scaffolding (start.sh,
end.sh, INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, .claude/CLAUDE.md, lib/privacy-gate.sh, etc.)
should always end up in the generated repo, regardless of whether
the proctor picked a generated starter project (Mode A) or a
brief-only flow (Mode B).

Existing gap: `--kit-dir` was an explicit flag with no default. If
the proctor invoked headless without passing it, the bootstrap
skipped the kit copy entirely (project-generator.ts only runs
copyDir when `kitTemplateDir` is supplied). The AI-generated project
landed in the output directory but none of the proctor/candidate
guidance shipped alongside it — the recording workflow that depends
on start.sh / end.sh / PRIVACY_RELEASE.md was effectively broken.

applyBootstrapDefaults now sets KitDir to "teamhero-interview-kit"
(resolved relative to the current working directory, same lookup as
findBootstrapScript in interview_bootstrap.go) whenever the flag is
omitted. Explicit --kit-dir still wins, so proctors with a custom
kit layout can override.

Tests:
- FillsKitDirSoScaffoldingAlwaysShipsToCandidate — bare opts (only
  Role set) get KitDir = "teamhero-interview-kit" after defaults
  run.
- DoesNotOverrideExplicitKitDir — explicit "/opt/custom-kit" wins
  through the defaulting pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ug logs

The wizard had three redundant ways to describe the project: the Feature
text input, a late-stage "How should the AI prompt be supplied?" select,
and an optional "Project prompt (optional)" addendum. The user pointed
out — correctly — that the Feature description IS the project prompt,
so the late-stage steps were just asking the proctor to say the same
thing twice.

The new flow has one either/or decision right after Domain:
  "I'll write the description myself" → existing Feature text input
  "Suggest project ideas for me"      → spinner → pick-one
Both paths populate data.feature directly; the projectPrompt field,
--project-prompt flag, and corresponding OpenAI addendum block are all
gone.

Added structured run logging so future generation failures are
triageable without a rerun. The dispatcher prints one line of run
context to stderr (role, mode, stack, domain, time-box, rubric, output,
debug), and the bun subprocess emits bootstrap.start / bootstrap.ok /
bootstrap.fail records with elapsed-ms and attempt counts. The --debug
flag adds truncated feature/custom-prompt bodies and output/kit paths.

Strengthened the OpenAI prompt to push the model past its undersize
bias. The size requirements now lead the Mode A spec, name 4-5 concrete
files with line ranges, and on retry the prompt quotes the prior LOC
count and demands "roughly DOUBLE the previous output" — gpt-5-mini was
otherwise content to produce another 266-LOC sketch even after the
validator flagged it. Smoke tests show first-attempt success at 399 LOC
and three-attempt recovery at 490 LOC with no human intervention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptions

The wizard's "Project mode" picker exposed two opaque codes (A/B). Replace
with three self-describing options that match how the proctor actually
thinks about the interview shape:

  Brownfield                            — AI scaffolds a starter codebase
  Greenfield (use the stack above)      — written brief; candidate uses your stack
  Greenfield (candidate picks stack)    — written brief; candidate also picks tooling

The first two collapse to the existing mode A/B at the BootstrapOptions
boundary. The third is a new variant: mode B + a stackByCandidate flag
that flips the BRIEF.md's "Tech stack" section from a hard requirement
("REQUIRES the candidate to use Go") to a contextual hint
("candidate selects their own tech stack ... Go is referenced as the
team's internal tooling"). When stackByCandidate is set, the stack
choice itself becomes part of what's being evaluated.

Renamed "Rubric mode" to "How should AI share observations?" so the
question signals what the answer actually controls (the rubric the AI
uses when writing up the recorded interview). Pure label change — same
three options, same internal values.

Every wizard question now carries a one-line description telling the
hiring manager how their answer shapes the generated project: stack
drives the source language, domain shapes glossary vocabulary, feature
description becomes the project's central focus, etc.

Headless protocol: new --stack-by-candidate flag, valid only with
--mode-project B. Mode A + stack-by-candidate is rejected by both the
Go validator and TS role-config validator because "candidate picks the
stack" is incoherent when the AI is generating a starter codebase IN a
specific stack.

Backward compat: legacy A/B values in BootstrapWizardDefaults still
seed the wizard correctly (normalizeWizardProjectMode maps them) so
saved role-config.json files don't break.

Tests added:
- Project type taxonomy → (mode, stackByCandidate) resolver (Go)
- Legacy A/B value normalization (Go)
- Greenfield-open round-trip through bootstrapWizardOptionsFromModel (Go)
- --stack-by-candidate flag parsing (Go)
- Validator rejection of stack-by-candidate + mode A (Go + TS)
- OpenAI prompt branches: B alone vs B + stackByCandidate (TS)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment

The validator's 400-700 LOC and 2-deep-module rules were a heuristic
inherited from the initial slice — they weren't in any product spec.
They were producing real friction: perfectly serviceable 300-LOC
projects were being rejected and retried, and the AI prompt was
laden with size enforcement language that crowded out the actual
requirements.

Removed:
- MODE_A_LOC_MIN / MODE_A_LOC_MAX / DEEP_MODULE_MIN_LINES /
  MODE_A_MIN_DEEP_MODULES constants and the validator branches that
  used them.
- countLines() helper, SOURCE_EXTS set, extname import — all dead now.
- ABSOLUTE SIZE REQUIREMENTS block and the per-file budget table from
  the OpenAI Mode A prompt.
- extractLocFromFailure / extractDeepModuleCount retry-feedback
  helpers and the "CORRECTION REQUIRED — DOUBLE the previous output"
  retry directive (no failure type to extract from now).

Kept:
- README.md, GLOSSARY.md, and at-least-one-failing-test structural
  checks. These are real product requirements and shape what the
  candidate sees on minute zero.
- Polyglot test-file detection (xUnit Skip, pytest.mark.skip, t.Skip,
  etc.) — the structural check still needs it.
- Retry loop, now reverted from 5 attempts back to 3 since structural
  failures are rare and self-correcting.

Tests:
- Removed the three size-rejection tests in project-validator.spec.ts.
- Added a regression guard that pins the new contract: a tiny but
  structurally-complete project passes validation.
- Updated openai-generator-client.spec.ts: dropped the "encodes hard
  LOC band" test; added a "does NOT encode hard LOC band" regression
  guard so a future prompt edit can't silently reintroduce the rule
  without the matching validator coming back too.
- Simplified the stubModeAProject() fixture — no more LOC padding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…candidate output

The candidate-facing repo was leaking hints about the answer:

- GLOSSARY.md listed domain concepts the candidate is supposed to
  identify themselves as part of the evaluation.
- Pre-existing failing/skipped tests (e.g. describe.skip("addUser",
  ...)) gave away the API surface and function names the candidate
  should design.
- The kit-overlaid .claude/CLAUDE.md literally told the candidate's
  agent to "read GLOSSARY.md and failing tests first" — coaching
  about the structure of the work before the candidate even read it.

All three are now removed from what ships to the candidate:

- project-validator no longer requires GLOSSARY.md or a failing test.
  README.md is the only required structural file (the candidate-facing
  brief — they need that to know what they're building).
- OpenAI prompt now explicitly forbids generating test files,
  GLOSSARY.md, CLAUDE.md, or AGENTS.md under a new "DO NOT GENERATE"
  block. The README's "Getting started" section instructs the candidate
  to write their own tests.
- teamhero-interview-kit/.claude/CLAUDE.md deleted. The kit still
  ships .claude/settings.json because that's the hook config that
  writes interview.log for the AI observer (non-optional plumbing).

Verified end-to-end: a fresh smoke run produces only README.md +
source files on the AI-generated side, plus the proctor-facing kit
overlay (AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, start.sh/end.sh). No tests/, no GLOSSARY.md, no
CLAUDE.md.

Tests updated:
- project-validator.spec.ts: dropped failing-test and GLOSSARY required
  checks. Added regression guards pinning the new contract — a project
  with only a README.md must pass.
- openai-generator-client.spec.ts: added assertions that the prompt
  forbids generating test files and GLOSSARY.md. Existing CLAUDE.md/
  AGENTS.md regex updated for the new multi-line "DO NOT GENERATE"
  block shape.
- project-generator.spec.ts: simplified stubModeAProject() — no
  GLOSSARY, no sample test file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… toggle

The JD was previously tied to the "default+jd" rubric value — collected
only as a side-effect of picking that rubric, and used only by the AI
observer. Two problems with that:

1. The proctor couldn't combine a custom rubric WITH a JD. The rubric
   choice and the JD attachment are independent decisions; coupling
   them forced a false trade-off.
2. The JD was never read by the project-generation prompt. A junior
   healthtech JD should be able to nudge the AI toward an
   EHR-flavoured feature; instead it was inert until post-interview
   review.

This change splits the JD into three standalone wizard steps right
after Domain:

  "Will you provide a job description?"  (yes/no)
  ↓ if yes
  "Path to job description file"
  ↓
  "Should the JD influence the project being generated?"  (yes/no)

The rubric question simplifies to default/custom — the "default+jd"
value is retired (validator rejects it with a clear error so stale
configs surface).

Behavior changes:
- ai-observer reads the JD whenever jdPath is set, regardless of
  rubric mode. The custom rubric and the JD can now coexist; both get
  appended to the observer's instructions.
- openai-generator-client now reads the JD body and injects it as a
  "Job description context" block in the project-generation prompt
  when jdInfluencesProject is true. The model uses it to calibrate
  complexity and domain. When the flag is off (or no JD), the prompt
  is unchanged.
- New headless flag --jd-influences-project; rejected without --jd-path.
- BootstrapOptions and RoleConfig grow a JDInfluencesProject /
  jdInfluencesProject field. Validator rejects influence=true without
  a path on both sides.

Smoke verified: a JD that mentions FHIR/HL7/EHR concepts produced a
generated repo with patient-data domain types and a dashboard-summary
service. Without --jd-influences-project, the same role config
produced a generic dashboard.

PRD updated (docs/2026-05-09-candidate-interview-reviewer-plan.md) with
a new "v2 — post-shipping refinements" section that catalogs all the
changes shipped in this run (wizard rendering fixes, sensible defaults,
kit-completeness, headless UX, --debug logging, wizard simplification,
question descriptions, greenfield/brownfield taxonomy, size validator
removal, candidate-facing strip of GLOSSARY/sample-tests/CLAUDE.md, and
this JD refactor) — so the feature's evolution is visible without
spelunking git log.

Tests:
- Go: new state-machine tests for jd-provided yes/no branch,
  jd-path → jd-influences-project transition, and rubric default →
  output-dir (no longer detours via jd-path). Retired
  TestBootstrapWizard_NextState_JDRubricRoutesToFilePicker and
  TestBootstrapWizard_NextState_JDPathThenOutputDir.
- Go: validator tests for retired "default+jd" rejection, standalone
  jd-path acceptance, and jd-influences-project-without-path
  rejection.
- TS: new openai-generator-client tests covering JD injection when
  jdInfluencesProject is true AND omission when false. Retired the
  "default+jd" coupling tests in role-config.spec.ts; added pairwise
  tests for standalone jd-path and jd-influences-project validation.
- TS: ai-observer test renamed to assert JD is included regardless of
  rubric mode; new test that custom rubric + JD combine in the
  observer's instructions.
- Snapshots regenerated for the new summary panel rows ("JD attached"
  and "JD usage").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
consola's default level is 3 (info); .debug() output is hidden. When the
proctor passed --debug the headline log line printed (consola.info) but
the bootstrap.debug.feature / output-dir / kit-dir lines were swallowed,
which was exactly the troubleshooting context --debug exists to surface.

Lift consola.level to 4 inside the --debug branch so all the existing
consola.debug calls below print without re-checking the flag at each
site. One-line fix, verified against a Stratus principal-engineer JD
smoke run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raming

Two pieces of UX feedback in one commit because they intersect in the
same step ordering.

(1) The JD describes the business domain — so asking for Domain BEFORE
the JD made the proctor either type the domain twice or type something
generic that contradicted the JD. The wizard's step order is now:

  Stack → JD provided? → [JD path] → [JD influences project?] → Domain (skipped if JD) → Feature source → ...

With a JD attached, the Domain step is skipped entirely; the role
config carries an empty domain field and the OpenAI prompt renders a
"Domain: infer from the job description context below" instruction
instead of a bare "Domain: .". Validators on both Go and TS sides
accept an empty domain when jdPath is set. The bun script's
pre-validator picked up the same rule.

(2) The default rubric option label said "9 built-in dimensions" —
opaque framing for the hiring manager. The 9 dimensions are *what*
the rubric observes, not *why* it observes them. The new label and
description say what's actually being looked for: sound engineering
practices (domain-driven design, deep modules, verification
discipline) applied to AI-assisted work.

Also fixed: the analysis-mode step said "drafts post-interview notes"
— corrected to "post-interview observations" to align with the
project's observation-over-judgment ethical framing.

Tests:
- Go state-machine: new tests for Stack→JD-provided, JD-influence→
  feature-source (skipping Domain), JD-not-provided→Domain, and
  Domain→feature-source. Stale Domain→JD-provided test retired.
- Go validator: new pairwise tests for domain-optional-with-JD AND
  domain-required-without-JD.
- TS role-config validator: same pairwise coverage at the role-config
  layer (empty domain + JD ok; empty domain + no JD rejected).
- TS openai-generator-client: new tests pinning the prompt rendering
  for both branches — "Domain: <X>." with a proctor-supplied domain
  and "Domain: infer from the job description" when domain is empty.
- Summary panel: JD rows now precede Domain (mirroring the wizard
  flow); Domain row shows "(from JD)" when the proctor took the
  JD branch.
- Snapshots regenerated.

Smoke verified: a Stratus Principal Full Stack JD attached with NO
--domain produced "Agentic Work Order Planner" — a work-order review
workflow with tool-call loop and human-in-the-loop checkpoint, domain
visibly inferred from the JD without the proctor typing it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st cut 60x

Six issues from a proctor's manual run of the interactive wizard:

1) Mode B wizard run shipped only BRIEF.md + role-config.json. Cause:
   the wizard path never called applyBootstrapDefaults, so KitDir was
   empty and the bun subprocess copied zero kit templates. Fix calls
   defaults right after the wizard returns. The kit overlay
   (INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md,
   RUBRIC_OVERVIEW.md, start.sh / end.sh, .claude/settings.json) now
   ships on every run regardless of dispatch path.

2) Mode B output had no index.html for the candidate to open. The
   orchestrator now writes a minimal static index.html for Mode B
   runs that links to BRIEF.md and says "you choose the stack" — a
   landing pad without prescribing tooling. Skipped for Mode A
   (which already has source files) and skipped when the AI's own
   output happens to include an index.html (don't clobber).

3) A single Mode B run cost $1.36 (143k tokens). Two changes brought
   it to ~$0.02 (2.3k tokens) — a 60x reduction:
   - reasoning effort dropped from "medium" (default) to "low". gpt-5
     models burn 100k+ reasoning tokens on a structured-output task
     at medium; low produces equivalent scaffolds for this work.
     Override via AI_REASONING_EFFORT env var.
   - Removed the full 9-dimension rubric block from the
     project-generation prompt. The rubric is review-side context;
     the generator only needs to know to leave decision points for
     the candidate. ai-observer.ts still gets the full rubric where
     it actually scores. Saves ~600 input tokens per call.

4) TUI didn't show which LLM was being used. The generate screen
   now displays "Model: <name>" alongside the spinner so the proctor
   sees what's about to be billed before they commit. Each Responses
   API call logs an openai.usage line (input / output / reasoning /
   total tokens) so cost is visible per attempt.

5) Success-link display showed the project path from the filesystem
   root. printBootstrapSuccessLink now uses filepath.Rel(cwd, abs)
   for the label — running teamhero from ~/Documents and writing
   into ./interviews/foo shows as "interviews/foo" rather than the
   full absolute path. The OSC 8 hyperlink URL is still absolute so
   the click actually opens the right place. Falls back to absolute
   when the output is outside cwd (Rel returns ".." in that case).

6) The `teamhero interview` no-args verb picker rendered as a bare
   huh.Select — no shell header, no hints footer, breaking visual
   parity with the rest of the app. The picker now prints the
   shared "//// TEAM HERO" header above and the
   "enter continue • ctrl+c quit" hints below, matching wizard.go.

Tests:
- Go: wizard-path-applies-KitDir-default (the original bug).
- Go: relative-path display (label is cwd-relative, OSC 8 URL is
  absolute).
- TS: Mode B orchestrator writes index.html stub; Mode A doesn't;
  AI-authored index.html is preserved.
- TS: prompt no longer inlines the rubric; replacement one-liner
  pinned by content. Token-cost regression guard.

Smoke verified: a Stratus principal-engineer Mode B run with JD
influence used 2,315 total tokens (was 143,697 before this change)
and produced AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, start.sh, end.sh, lib/privacy-gate.sh,
.claude/settings.json, BRIEF.md, role-config.json, and index.html
in /tmp/stratus-v3 on attempt 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two orthogonal flags so another agent can drive the bootstrap
and consume the result programmatically:

  --json     Emit a JSON payload to stdout describing the run.
             Human-readable output (Project: link, bun subprocess
             logs, publish messages) is routed to stderr so stdout
             stays parseable. The orchestrating agent reads stdout
             and acts on the payload — schedule interviews, send
             HR notifications, etc.

  --publish  Auto-publish the generated repo to GitHub. No prompt.
             If a GitHub token is configured (via `teamhero setup`),
             the repo is created and the tree pushed. With --json,
             the resulting URL lands in payload.github.url so the
             downstream agent can put it directly in an email.

Either flag can be true independently; both can be true together.

Payload schema (versioned at "teamhero.interview.bootstrap/v1"):

  {
    "schema": "teamhero.interview.bootstrap/v1",
    "role":    { "slug", "title", "stack", "domain" },
    "project": { "mode", "stackByCandidate", "outputDir",
                 "timeBoxMinutes", "feature", "analysisMode",
                 "rubricMode", "jd": { "path", "influencesProject" } },
    "ai":      { "model" },
    "github":  null | { "url" }
  }

Field discipline:
- outputDir is always absolute so the agent doesn't have to know
  what cwd the command ran from.
- jd is null when no JD was attached; payload-bearing fields when
  one is.
- github is null unless --publish ran AND succeeded; an agent can
  json-null-check this to decide whether to email a link.
- ai.model echoes AI_MODEL env or the gpt-5-mini default for cost
  attribution.

Refactors along the way:
- Threaded os.Stdout and os.Stderr separately through main.go ->
  runInterview -> runInterviewBootstrap{,WithWizard}. They were
  collapsed to a single writer at the CLI entry point, which is
  why the existing "stdout"/"stderr" naming in dispatcher code was
  cosmetic until now — both ended up at os.Stderr.
- Split publishToGitHub() out of offerPublishToGitHub() so the new
  autoPublishToGitHub var (the --publish entry point) reuses the
  exact API + git plumbing without a confirm prompt.

TDD: written red → green for each of:
- --json flag parsing
- --publish flag parsing
- payload contents (role/project/ai/jd fields)
- stdout vs stderr routing (human output to stderr when --json)
- --publish + --json combination puts URL in payload
- --publish without --json doesn't emit a JSON envelope
- No --json preserves the existing human stdout behavior

Smoke verified end-to-end: `teamhero interview bootstrap --json
--jd-influences-project ...` produces the schema-v1 JSON on stdout,
pipes cleanly to jq, and leaves human-readable logs visible on
stderr for a watching operator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant