feat(interview): Candidate AI Code Interview tool (all 5 slices)#10
feat(interview): Candidate AI Code Interview tool (all 5 slices)#10asabaylus wants to merge 43 commits into
Conversation
Captures the design for a candidate screening kit + `teamhero interview` namespace with absolute scoring and mandatory human-in-the-loop review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the rubric foundation for the candidate AI-collaboration interview grader (slice 1). Ships a deep module exposing 9 dimensions across two groups and three evidence modes: - 6 process dimensions: upfront-design, context-engineering, critical-evaluation, verification, course-correction, risk-awareness - 3 outcome dimensions: architectural-quality, test-pass, throughput - 4 deterministic, 2 hybrid, 3 llm-judge evidence modes Public API: RUBRIC_VERSION, getRubricVersion(), getDimensions(), getDimension(id), getEvidenceMode(id). Tests cover version stamping, dimension lookup, evidence-mode classification, group split, and required-field population. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `interview` verb to the TS CLI, mirroring the report/setup/doctor pattern: forwards all args (including --help and verb names) to the Go TUI binary. Verb-level routing (bootstrap, grade, cohort) lives in the Go TUI. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Go TUI side of the interview command. Routes `interview <verb>` to per-verb stubs (bootstrap, grade, cohort) that each print "not yet implemented" and exit non-zero. Top-level usage and per-command help routing extended to cover the new subcommand. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes the wire format between the future TS interview service and the Go TUI. Defines InterviewEvent as a discriminated union (starting with `progress`), plus parse/serialize functions that round-trip cleanly, emit one event per line, and reject malformed or unknown event types. Future slices add `observation`, `measurement`, `audit`, `result`, and `error` events to the union — additive by design, discriminated on type. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Publishes the two foundational docs for the interview grader: - `interview-rubric.md` — formal rubric in observation framework. Lists all 9 dimensions across process/outcome groups, evidence modes (4 deterministic, 2 hybrid, 3 llm-judge), and per-dim observation vs. measurement output shape. No scoring levels anywhere. - `interview-classification-rationale.md` — methodology and ethics doc with binding top-section preamble covering: (1) observations- not-scores, (2) bias diversification (NOT elimination — never claim the AI is "non-biased"), (3) mandatory human-in-the-loop with manager sign-off, (4) GDPR Article 15 caveat for MVP candidate-no-audit-access default. Per-dimension methodology, interviewer-bias guard, and schema-level scoring-drift guard follow the preamble. Refs: teamhero-scripts-3sh Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le config, kit scaffold
Implements the hiring manager's bootstrap user journey end-to-end (headless mode).
Deep modules:
- `project-validator` (Mode A: CLAUDE.md / GLOSSARY.md / 2+ deep modules /
failing test / LOC 400-700; Mode B: BRIEF.md with time-box, acceptance,
deliverables sections)
- `role-config` (typed schema, validator with rubric-mode-conditional
requirements: custom -> non-empty customPrompt, default+jd -> existing
jdPath; round-tripping persistence)
- `project-generator` (injectable GeneratorClient interface; retries up
to 3x on validation failure; embeds the interview-kit template files
into bootstrap output)
- `orchestrator.runBootstrap` glues the three: validates config, runs
generation, writes role-config.json
Production wiring:
- `openai-generator-client.ts` calls the Responses API with strict
json_schema requiring `files[].{path,content}`; prompt encodes role
metadata, rubric, mode-A vs mode-B contract, and retry diagnostics
- `scripts/run-interview-bootstrap.ts` CLI entry the Go TUI spawns
- `tui/interview_bootstrap.go` parses headless flags (--role / --stack /
--domain / --feature / --time-box / --mode-{project,analysis,rubric} /
--output-dir / --jd-path / --custom-prompt / --kit-dir), validates them
with the same rules as TS, and shells out to bun running the script
Embedded interview-kit placeholder templates (Slice 3 enriches):
- start.sh / end.sh (placeholder; Slice 3 implements privacy gate + asciinema)
- INTERVIEW_RULES.md, RUBRIC_OVERVIEW.md, PRIVACY_RELEASE.md (placeholders)
- .claude/settings.json (empty hooks; Slice 3 wires)
- .claude/CLAUDE.md (minimal candidate-agent framing)
Test coverage:
- Project validator: 11 cases covering Mode A positive + each negative
failure (missing files, sprawl-only, no failing tests, LOC bounds) and
Mode B positive + missing/empty/incomplete brief
- Role config: 12 cases covering field requirements, rubric-mode
conditionals, time-box bounds, persistence round-trip, refusal to
write invalid configs
- Project generator: 5 cases covering Mode A success, retry-then-success,
exhaustion after 3 failures, kit template copy, Mode B success
- Orchestrator: 3 cases covering happy path, validation rejection without
generator invocation, kit-dir merging
- Go: 10 cases covering flag parsing, validation rejection of incomplete
/ malformed flags, runner delegation, exit-code forwarding
Notes: --headless is the only Slice 2 path. Interactive `huh` wizard
deferred. `assess_config.go` referenced in acceptance criteria does not
exist on this branch (assess maturity work has not merged); follows the
existing `doctor.go` style instead.
Refs: teamhero-scripts-7qe
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ripts + templates)
Completes the candidate's user journey: clone, sign, code, push. The
kit lives at `teamhero-interview-kit/` in this repo and is copied
verbatim into every bootstrap output by Slice 2.
Privacy gate (POSIX-compatible deep module):
- `lib/privacy-gate.sh` exposes `check_privacy_release_signed PATH`
(function) and runs the same check when invoked as a script. Detects
missing file, empty file, missing/whitespace/placeholder values under
the "## Signed" and "## Date" sections.
Shell entry points:
- `start.sh` — calls the privacy gate, refuses on unsigned with clear
instructions, verifies UserPromptSubmit + PreToolUse hooks present in
.claude/settings.json, writes .interview-state/started-at, and execs
asciinema rec terminal.cast. Honors SKIP_RECORD=1 + ASCIINEMA_BIN env
for testability.
- `end.sh` — verifies start was run, stages PRIVACY_RELEASE.md +
terminal.cast + interview.log + INTERVIEW_RULES.md, and creates a
single git commit. Honors SKIP_COMMIT=1 for testability.
Templates filled in with real content:
- `PRIVACY_RELEASE.md` — full placeholder release with REVIEW WITH LEGAL
warning, no-training clause, deletion/access/appeal rights (30-day
appeal window after hiring decision), bias acknowledgement,
what-the-AI-sees and what-it-doesn't-see sections.
- `INTERVIEW_RULES.md` — candidate-facing rules with `{{TIME_BOX}}`
placeholder, before/during/finishing sections, and WSL setup
instructions for Windows candidates (apt install + clone under ~/
not /mnt/c/).
- `RUBRIC_OVERVIEW.md` — plain-language summary of all 9 dimensions
(process × 6, outcome × 3) including the "why no scoring" framing.
- `.claude/settings.json` — UserPromptSubmit + PreToolUse hooks each
pipe a structured JSONL line to `interview.log` via jq.
- `.claude/CLAUDE.md` — candidate-agent framing covering role, what to
do with knowledge of being observed (be normal, do not coach toward
rubric), and what not to do (no silent destructive ops, no external
exfil).
Tests (17 new):
- Privacy gate (9 cases): signed, missing file, empty file, placeholder-
only, signature-no-date, date-no-signature, no path argument, missing
sections, whitespace-padded real values.
- Kit smoke (8 cases): start refuses unsigned; start passes when signed
(SKIP_RECORD); settings.json declares both hooks emitting to
interview.log; end refuses without prior start; full start→end
round-trip (SKIP modes); INTERVIEW_RULES mentions WSL; RUBRIC_OVERVIEW
mentions all 9 dims; PRIVACY_RELEASE has REVIEW WITH LEGAL +
no-training + appeal + 30-day language.
Refs: teamhero-scripts-h13
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er, audit writer)
Largest slice. Produces a complete per-candidate audit end-to-end:
inputs → events → measurements + observations → summary.md / audit.md /
audit.json / evidence/. Headless mode is wired through the Go TUI.
Evidence collectors (4 input adapters):
- `collectors/asciinema.ts` — parses v2 cast files; reconstructs typed
commands from input events with pause-before-Enter timing; handles
backspace.
- `collectors/jsonl-log.ts` — parses interview.log (kit-emitted JSONL);
separates prompts from tool-use; silently drops malformed lines.
- `collectors/transcript.ts` — supports [HH:MM:SS] Speaker: text format
(Granola/Otter/Fireflies markdown export shape) and bare Speaker:
lines; skips VTT timing lines and comment lines.
- `collectors/git-history.ts` — parses `git log` via an injectable
runner; tolerates non-git directories by returning empty list.
Deterministic extractors (4 pure functions):
- `extractors/verification.ts` — counts test/typecheck runs across
bun/npm/pnpm/yarn/go/pytest/jest/vitest/cargo/just, plus
diff/grep/cat reads, plus test-runs-immediately-after-prompt.
- `extractors/risk-awareness.ts` — pattern-matches 13 destructive
command families; reports each with pause-before-Enter timing.
- `extractors/test-pass.ts` — injectable TestRunner; default runner
picks bun vs go from project layout.
- `extractors/throughput.ts` — session start/end, elapsed, commits,
time-to-first-test, avg gap between commits.
AI observer (deep module with explicit ethical guards):
- `INTERVIEWER_BIAS_GUARD` constant — verbatim instruction told to
every observation prompt to treat interviewer commentary as
situational context, not evidence about the candidate.
- `OBSERVATION_RESPONSE_SCHEMA` — strict json_schema with
additionalProperties: false on every level; declares ONLY
dimension_id / observation / reasoning / evidence_excerpts /
caveats. No score, weighted_total, raw_total, band, signal_count.
- `rejectIfScored` — second-layer validator; throws if the parsed
response contains any forbidden field (defense in depth: schema
+ post-parse check + writer renders nothing numerical).
- `buildObserverPrompt` — pure function; supports default / custom /
default+jd rubric modes; reads JD from disk; embeds rubric for the
5 observable dims only (deterministic dims are extracted, not
observed); never includes the session_recording_url.
- `humanOnlyObservations` — returns blank fillable templates for
human-only analysis mode.
- `OpenAIObserverClient` — production client; Responses API; gpt-5-mini
default; the rejectIfScored guard runs on every response.
Audit writer (pure transformation):
- `renderSummary` / `renderAudit` — markdown rendering; same warning
banner at top of both ("THIS AUDIT IS ADVISORY ... not a score");
YAML frontmatter with Obsidian-friendly tags + candidate + role +
date + rubric_version + rubric_mode + signed_off +
session_recording_url + session_platform + session_date;
reasoning chain preserved in BOTH tiers (transparency).
- Per-dimension layout: measurements first (for deterministic / hybrid
dims), then observation + reasoning + evidence (for LLM-judge /
hybrid dims).
- `validateSignOff` — categorical recommendation (Hire / Hire with
notes / No hire) plus manager-written reasoning >=20 chars; the TUI
uses this to reject empty submissions.
Grade orchestrator:
- `gradeCandidate` — clones repo (injectable cloner) or uses
localRepoPath; reads role-config.json from the repo; runs all 4
collectors + 4 extractors; calls AI observer (or human-only path);
writes audit; copies PRIVACY_RELEASE.md, terminal.cast,
interview.log, transcript, interviewer-notes into evidence/.
Cleans up the temp clone on exit.
- `scripts/run-interview-grade.ts` — CLI entry the Go TUI spawns.
Go TUI:
- `tui/interview_grade.go` — headless flag parser (positional repo or
--repo, plus --candidate, --transcript, --interviewer-notes,
--session-recording-url, --session-platform, --session-date,
--output-dir, --local-repo-path) with validation; mandatory
ADVISORY warning banner printed before spawning the TS subprocess.
- `runInterview` dispatcher wires grade to runInterviewGrade.
Tests (50 new):
- Collectors: 10 (asciinema header+commands+backspace, interview.log
split + malformed-line tolerance, transcript timestamped + bare +
VTT, git history with stub runner, git failure tolerance).
- Extractors: 9 (verification counts + interleaving; risk-awareness
zero-detection / rm -rf with pause / force-push and reset;
test-pass with injected runner including n/a case; throughput
elapsed + commits + time-to-first-test + empty).
- AI observer: 12 (bias guard verbatim; role metadata in input;
rubric for 5 observable dims only; custom prompt mode; JD content
mode; session URL NOT in prompt; interviewer notes routed to user
input not instructions; schema has no score/total/band fields;
rejectIfScored on score / weighted_total / clean; human-only
template count and content).
- Audit writer: 15 (summary frontmatter, banner, reasoning preserved,
measurements rendered, sign-off section with Hire|notes|No hire,
session_recording_url in frontmatter; audit banner; audit reasoning
preserved; audit evidence not truncated; writeAudit produces all 4
outputs; validateSignOff rejects non-categorical; rejects empty
reasoning; rejects too-short reasoning; accepts substantive
Hire/notes/No-hire).
- Grade orchestrator: 4 (full happy path produces all outputs +
PRIVACY_RELEASE copied; missing role-config errors; human-only
skips observer; session_recording_url in frontmatter NOT in prose).
- Go: 8 (grade flag parsing positional / all-flags; validation rejects
missing candidate / missing repo / bad platform; warning banner
printed; exit code forwarded).
`just test`: 1632 pass. `go test`: all green.
Refs: teamhero-scripts-bt2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the candidate-interview-grader MVP. Hiring manager can now
roll up the cohort and orchestrate the whole flow conversationally.
Cohort summary module (deep, pure):
- `cohort-summary.ts`:
- `loadCohort(roleDir)` reads every `<candidate>/audit.json`, returns
typed records; skips malformed JSON silently.
- `renderCohortSummary` produces COHORT.md as a markdown table with
columns: Candidate | Interviewed | Sign-off | Recommendation |
Audit. NO score, total, or rank column. Default order is
alphabetical; chronological is opt-in.
- Mandatory ADVISORY warning banner at the top.
- Pending sign-offs marked ⏳; reviewed marked ✅.
- Header line reports pending vs reviewed counts.
- Friendly "no candidates yet" copy for empty roles.
- `writeCohortSummary` writes the result to `<roleDir>/COHORT.md`.
Audit frontmatter extension:
- `AuditFrontmatter.recommendation` (optional, categorical) added so the
cohort renderer can show the manager's sign-off choice without
re-parsing the markdown.
Production wiring:
- `scripts/run-interview-cohort.ts` — CLI entry the Go TUI spawns.
- `tui/interview_cohort.go` — headless flag parser (--role / --role-dir /
--order) + validation + bun subprocess runner.
- `runInterview` dispatcher routes cohort → runInterviewCohort.
Claude skill — bounded-context wrapper:
- `skills/teamhero-interview/SKILL.md` ships in-repo. Users install by
copying to `~/.claude/skills/teamhero-interview/`.
- Frontmatter `name: teamhero-interview` + descriptive trigger
description covering bootstrap / grade / cohort / hiring queries.
- Ethical framing reproduced as binding rules: observations not scores,
bias diversification (not elimination), mandatory human-in-the-loop.
- All 3 MVP verbs documented with example invocations.
- v1.5 verb stubs (list-roles, list-candidates) acknowledged with the
current workaround (ls docs/interviews/).
- Cohort orchestration workflow described: read role config → ask user
for transcript per candidate → grade each → run cohort verb → remind
manager that each audit needs its own sign-off.
- Explicit "what NOT to do" section: do not produce scores, do not
bypass the privacy gate, do not feed session_recording_url to the
observer, do not re-introduce score fields in code.
- Skill repeats explicitly: contains no business logic; all logic lives
in src/services/interview/.
Tests (24 new):
- Cohort summary: 11 (loadCohort missing-dir; reads each candidate;
skips folders without audit.json; banner present; required columns +
no Score/Total/Rank; pending vs reviewed indicators; alphabetical
default; chronological opt-in; empty-cohort message; counts in
header; writeCohortSummary produces file).
- Skill: 9 (skill exists; frontmatter has name + description; all 3
MVP verbs documented; v1.5 stubs mentioned; ethical framing —
observations-not-scores + bias-diversification + human-in-the-loop;
refusal-of-numerical-scores instruction; cohort orchestration
documented; no-business-logic statement present; session_recording_url
feeding warning present).
- Go: 4 (cohort flag parsing; validation rejects missing --role;
validation rejects bad --order; runner delegation + exit code
forwarding).
`just build-all` ✓, `just test-all` ✓ (1652 pass, 0 fail; Go all green).
Refs: teamhero-scripts-9v2
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI failures on PR #10's test-typescript job: 1. scripts/run-tests.sh line 64 had a bash arithmetic syntax error when bun emitted multi-line "N pass" / "N fail" / "N skip" output. The grep captured all matches (e.g., "3\n0"), and \$((PASS + file_pass)) errored with "syntax error in expression". Counts silently desynced (CI 1398 pass vs local 1652). Fixed by piping to tail -1 so only the summary line's match is used. 2. Function coverage was 84.49% (under 85% threshold) in CI. Added targeted tests for the uncovered seams in the new interview module: - OpenAIObserverClient (constructor, observe() happy path, missing output_text guard, rejectIfScored defense-in-depth) - summarizeEvents non-prompt branches (tool-use, command, commit, transcript-line) via buildObserverPrompt - validateGenerated re-runs validation on an existing output dir - extractTestPass falls through to realRunner on dirs without go.mod / package.json Local coverage: 88.39% → 88.79% functions (over the 85% threshold with ~4pp margin). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements the three v1.5 slices for `teamhero interview`: - **Slice 1 (teamhero-scripts-92a)**: Interactive huh.Form wizard for `teamhero interview bootstrap`. Invoking with no flags on a TTY drops into a wizard that walks role identity, time-box, project mode, analysis mode, and rubric mode — with conditional follow-ups for the custom-prompt and default+jd branches. Form-state-building is isolated in pure helpers (newBootstrapWizardModel, bootstrapWizardNextState, bootstrapWizardOptionsFromModel) so the rubric-mode branching, defaults, and option round-trip are all unit-testable without rendering a TTY. The wizard hands its BootstrapOptions to the same ValidateBootstrapOptions gate the headless path uses, so there is no parallel validation logic. Headless mode (any flags supplied) is unchanged. - **Slice 2 (teamhero-scripts-upo)**: Three TUI polish concerns: (1) shared interviewStyles module centralizing the palette so future screens stay visually aligned with the report wizard; (2) glamour- rendered audit preview (`renderInterviewAuditPreview`) that pins the ADVISORY warning banner above the source content — managers read the preview before the sign-off file; (3) phased progress display (`interviewProgressModel`) consuming the existing InterviewProgressEvent JSON-lines protocol from the bun subprocess, with five canonical phases (clone → collect-evidence → extract-measurements → observe → audit-write) and graceful failure surfacing. - **Slice 3 (teamhero-scripts-rjp)**: Manual end-to-end smoke script at `scripts/manual-test-interview.sh` walking a human operator through the wizard, headless bootstrap, grade with --mode-analysis human-only (no OpenAI spend), sign-off gating, and cohort rendering. README documents the interactive wizard as the primary path; the Claude skill at `skills/teamhero-interview/SKILL.md` now recommends the wizard on a TTY and falls back to the headless flag list for agents and scripting. Test coverage: ~35 new Go tests covering wizard state-machine branching, field validators (URL-safe slug, time-box range, JD-path existence), options round-trip, dispatcher wizard-vs-headless decision, preview pinning, progress phase transitions, manual-test- script structure. Overall Go coverage: 87.7% (above 85% threshold). TS: 1660 pass, 0 fail. No behavior change to the headless path. Closes teamhero-scripts-92a, teamhero-scripts-upo, teamhero-scripts-rjp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the candidate-interview-grader plan to reflect what has actually landed on PR #10. Replaces the "deferred until go-ahead" Next Steps section with an Implementation Progress table that tracks the 6 MVP slices (beads `teamhero-scripts-{3sh,7qe,h13,bt2,9v2}`) and the 3 v1.5 slices (beads `teamhero-scripts-{92a,upo,rjp}`) — all closed. Adds a note that the `tui/assess_*.go` reference files don't exist on `main` yet, so v1.5 aligned with `tui/wizard.go` + `tui/setup.go` instead. The "Still deferred" list keeps the pilot, `list-roles`/`list-candidates`, multi-model bias diversification, and periodic anonymized bias audit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a complete interview-review feature: CLI/TUI verbs (bootstrap, review, cohort), headless Bun scripts, TypeScript services for bootstrap and review, evidence collectors/extractors, OpenAI observer with strict schema and bias guard, audit/cohort writers with ADVISORY banner and sign-off, kit scripts/docs, and extensive tests. ChangesInterview Reviewer
Sequence Diagram(s)sequenceDiagram
participant CLI as teamhero CLI
participant TUI as Go TUI
participant Bun as Bun scripts
participant Svc as TS Services
participant OpenAI as OpenAI Responses API
participant FS as Filesystem
CLI->>TUI: run `teamhero interview review ...`
TUI->>Bun: invoke `scripts/run-interview-review.ts` (flags)
Bun->>Svc: reviewCandidate(input, deps)
Svc->>FS: clone/read repo, read evidence files
Svc->>Svc: compute measurements (extractors)
Svc->>OpenAI: observe(prompt with strict json_schema)
OpenAI-->>Svc: structured observations
Svc->>FS: write summary.md, audit.md, audit.json, evidence/
Bun-->>TUI: return artifact paths
TUI-->>CLI: render preview + ADVISORY banner
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (18)
teamhero-interview-kit/lib/privacy-gate.sh-16-16 (1)
16-16:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
set -uinside the conditional execution block to avoid mutating the caller's environment.The global
set -uat line 16 is applied whenever this file is sourced as a library, which unexpectedly enables thenounsetoption in any caller that doesn't already have it enabled. This violates the library pattern and can break scripts that rely on unset variable behavior.Move the
set -udeclaration inside the conditional check at lines 72–75 so it only applies during direct script execution:Suggested fix
-set -u - # extract_section <file> <heading> — prints the body that follows a given# When invoked directly (not sourced), run the check on $1 and exit with the # function's return code. if [ "${BASH_SOURCE[0]:-$0}" = "$0" ]; then + set -u check_privacy_release_signed "${1:-}" exit $? fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teamhero-interview-kit/lib/privacy-gate.sh` at line 16, Remove the top-level "set -u" so the library doesn't change the caller's shell options, and instead enable nounset only inside the script's main execution guard (the conditional that checks whether the file is run directly vs sourced); place "set -u" at the start of that guard block (and leave it out of any exported functions or top-level code) so nounset applies only during direct execution and not when the file is sourced.src/services/interview/bootstrap/project-validator.ts-21-29 (1)
21-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrevent recursive walk from following symlink cycles
walkusesstatSync, which follows symlinks. A cyclic symlink inside generated project contents can cause unbounded recursion.Proposed fix
-import { existsSync, readFileSync, readdirSync, statSync } from "node:fs"; +import { existsSync, lstatSync, readFileSync, readdirSync } from "node:fs"; @@ for (const entry of readdirSync(dir)) { if (entry === "node_modules" || entry === ".git") continue; const full = join(dir, entry); - const s = statSync(full); + const s = lstatSync(full); + if (s.isSymbolicLink()) continue; if (s.isDirectory()) walk(full, out); else out.push(full); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/bootstrap/project-validator.ts` around lines 21 - 29, The walk function can follow symlinks because it uses statSync, causing potential cycles; change walk(dir: string, out: string[] = []) to skip symbolic links (use lstatSync and if entry is a symlink, skip or optionally resolve safely) or add a visited set of realpaths to prevent revisiting; specifically update the logic around readdirSync + statSync to use lstatSync and either ignore entries where lstatSync(...).isSymbolicLink() or track realpath (fs.realpathSync) in a visited Set inside walk (or an additional parameter) and return early when a path is already visited to prevent unbounded recursion.src/services/interview/shared/events.ts-30-41 (1)
30-41:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd validation for
stepandstatusfields inparseInterviewEventThe function accepts
{"type":"progress"}without validating required fieldsstepandstatus, allowing malformed events to pass type checking at runtime. Themessageandprogressoptional fields are also unvalidated if present.Proposed fix
export function parseInterviewEvent(line: string): InterviewEvent | null { let value: unknown; try { value = JSON.parse(line); } catch { return null; } if (!value || typeof value !== "object") return null; - const type = (value as { type?: unknown }).type; + const obj = value as Record<string, unknown>; + const type = obj.type; if (typeof type !== "string") return null; if (!KNOWN_EVENT_TYPES.includes(type as InterviewEvent["type"])) return null; - return value as InterviewEvent; + if (type === "progress") { + const step = obj.step; + const status = obj.status; + if (typeof step !== "string") return null; + if ( + status !== "start" && + status !== "update" && + status !== "done" && + status !== "error" + ) { + return null; + } + if (obj.message !== undefined && typeof obj.message !== "string") return null; + if (obj.progress !== undefined && typeof obj.progress !== "number") return null; + return obj as InterviewProgressEvent; + } + return null; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/shared/events.ts` around lines 30 - 41, parseInterviewEvent currently accepts {"type":"progress"} without validating required fields; update parseInterviewEvent to, after confirming type is "progress", check that (value as any).step exists and is a string and (value as any).status exists and is a string and matches the allowed progress statuses (use your existing InterviewEvent status union or a small array of allowed status strings), and also validate optional fields: if message is present it must be a string, and if progress is present it must be a number; reject (return null) for any type-mismatches so that only well-formed InterviewEvent objects (for all types, but especially "progress") are returned while still using KNOWN_EVENT_TYPES and casting to InterviewEvent when validation passes.src/services/interview/assess/extractors/verification.ts-58-71 (1)
58-71:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInterleaving metric ignores the advertised 30-second window.
The code only checks prompt→test adjacency, but never validates the 30s condition described in the comment and fact label context.
Proposed fix
let interleavedAfterPrompt = 0; for (let i = 1; i < merged.length; i++) { const cur = merged[i]; const prev = merged[i - 1]; if (cur.type === "command" && prev.type === "prompt") { - interleavedAfterPrompt += 1; + const deltaMs = + new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime(); + if (Number.isFinite(deltaMs) && deltaMs >= 0 && deltaMs <= 30_000) { + interleavedAfterPrompt += 1; + } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/extractors/verification.ts` around lines 58 - 71, The interleaving count currently only checks adjacency between merged entries but ignores the intended 30-second window; update the loop that computes interleavedAfterPrompt (which iterates over merged built from prompts and testRuns) to parse timestamps to numeric time and only increment interleavedAfterPrompt when cur.type === "command" && prev.type === "prompt" && (new Date(cur.timestamp).getTime() - new Date(prev.timestamp).getTime()) <= 30000; ensure merged remains sorted by timestamp before this check.src/services/interview/bootstrap/role-config.ts-121-126 (1)
121-126:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate and harden
readRoleConfiginstead of unchecked cast.
JSON.parse(... as RoleConfig)allows malformed shape through (or throws unwrapped parse errors), which makes downstream failures harder to diagnose.Proposed fix
export function readRoleConfig(dir: string): RoleConfig | null { const path = join(dir, ROLE_CONFIG_FILENAME); if (!existsSync(path)) return null; - const body = readFileSync(path, "utf8"); - return JSON.parse(body) as RoleConfig; + const body = readFileSync(path, "utf8"); + let parsed: unknown; + try { + parsed = JSON.parse(body); + } catch (error) { + throw new Error(`Invalid ${ROLE_CONFIG_FILENAME}: malformed JSON`); + } + const result = validateRoleConfig(parsed as RoleConfig); + if (!result.ok) { + throw new Error( + `Invalid ${ROLE_CONFIG_FILENAME}: ${result.failures.join("; ")}`, + ); + } + return parsed as RoleConfig; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/bootstrap/role-config.ts` around lines 121 - 126, readRoleConfig currently does an unchecked JSON.parse and casts to RoleConfig; wrap the read+parse in a try/catch to surface parse errors and then validate the parsed object’s shape (check required properties and their types expected by RoleConfig) before returning it — if validation fails return null or throw a descriptive Error with context including ROLE_CONFIG_FILENAME and the path, so callers don’t receive a malformed RoleConfig instance; implement this inside readRoleConfig.src/services/interview/assess/collectors/transcript.ts-24-37 (1)
24-37:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTranscript parser is too brittle for common speaker labels and invalid anchor timestamps.
The bare-speaker regex can skip valid lines (e.g.,
Speaker 1: ...), and an invalidsessionStartIsocan throw ontoISOString().Proposed fix
-const BARE = /^([A-Za-z][A-Za-z .'-]+?):\s+(.+)$/; +const BARE = /^([^:\[\]#][^:]{0,79}?):\s+(.+)$/; function toIsoFromHMS(hms: string, sessionStartIso: string): string { const parts = hms.split(":").map(Number); @@ - const base = new Date(sessionStartIso).getTime(); - return new Date(base + totalSeconds * 1000).toISOString(); + const parsedBase = new Date(sessionStartIso).getTime(); + const base = Number.isFinite(parsedBase) + ? parsedBase + : new Date("1970-01-01T00:00:00.000Z").getTime(); + return new Date(base + totalSeconds * 1000).toISOString(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/collectors/transcript.ts` around lines 24 - 37, The bare-speaker regex (BARE) is too strict (misses labels like "Speaker 1") and toIsoFromHMS can throw if sessionStartIso or parsed HMS are invalid; update BARE to allow digits and optional numeric suffixes (e.g., change /^([A-Za-z][A-Za-z .'-]+?):\s+(.+)$/ to accept names with digits and leading words such as "Speaker 1"), and harden toIsoFromHMS: validate sessionStartIso by parsing new Date(sessionStartIso) and checking for NaN, validate each HMS part with Number.isFinite before computing totalSeconds, and if either the base date or HMS parts are invalid return a safe fallback (e.g., the original sessionStartIso or new Date().toISOString()) instead of calling toISOString on an invalid date; reference TIMESTAMPED and BARE for regex fixes and toIsoFromHMS for the parsing/validation changes.src/services/interview/assess/extractors/test-pass.ts-20-42 (1)
20-42:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd timeout and error handling to test runner processes to prevent grader hangs.
Both
spawnSync()calls lack timeout configuration and don't capturer.error, which can cause the grader to block indefinitely if a test process hangs or fails to spawn.Proposed fix
+const TEST_RUN_TIMEOUT_MS = 120_000; + const realRunner: TestRunner = (repoDir) => { if (existsSync(join(repoDir, "go.mod"))) { const r = spawnSync("go", ["test", "./..."], { cwd: repoDir, encoding: "utf8", + timeout: TEST_RUN_TIMEOUT_MS, + killSignal: "SIGKILL", }); return { passed: countMatches(r.stdout ?? "", /^ok\s+/gm), failed: countMatches(r.stdout ?? "", /^FAIL\s+/gm), - output: `${r.stdout ?? ""}\n${r.stderr ?? ""}`, + output: `${r.stdout ?? ""}\n${r.stderr ?? ""}${ + r.error ? `\n${r.error.message}` : "" + }`, }; } if (existsSync(join(repoDir, "package.json"))) { const r = spawnSync("bun", ["test"], { cwd: repoDir, encoding: "utf8", + timeout: TEST_RUN_TIMEOUT_MS, + killSignal: "SIGKILL", }); - const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}`; + const combined = `${r.stdout ?? ""}\n${r.stderr ?? ""}${ + r.error ? `\n${r.error.message}` : "" + }`;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/extractors/test-pass.ts` around lines 20 - 42, The spawnSync calls in src/services/interview/assess/extractors/test-pass.ts (the blocks that run "go test ./..." and "bun test") need timeout and error handling to avoid hangs: add a timeout option (e.g. 30000 ms) and a larger maxBuffer to both spawnSync invocations, check r.error and r.status (or r.signal) and include r.error/message/signal in the returned output string, and treat a timeout/spawn failure as test failures (set failed>0 or a specific flag) by returning a non-zero failed count when r.error or r.signal is present; update the returned objects for the functions that use the "r" result so both passed/failed and output reflect these error/timeout conditions.src/services/interview/assess/collectors/git-history.ts-14-18 (1)
14-18:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReplace
execSyncwithexecFileSyncto avoid shell string construction.Building a shell command string for
execSyncis unnecessary and fragile. Pass arguments directly to the git executable viaexecFileSync, which avoids shell interpretation entirely and is the standard Node.js best practice for spawning processes.Proposed fix
-import { execSync } from "node:child_process"; +import { execFileSync } from "node:child_process"; @@ const realRunner: GitRunner = (args, cwd) => - execSync(`git ${args.map((a) => `'${a.replace(/'/g, "'\\''")}'`).join(" ")}`, { + execFileSync("git", args, { cwd, encoding: "utf8", });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/collectors/git-history.ts` around lines 14 - 18, The current realRunner builds a shell command string and calls execSync, which is fragile; change realRunner to call execFileSync directly with the git executable and the args array (e.g. execFileSync('git', args, { cwd, encoding: 'utf8' })), remove the manual quoting/escaping logic, and ensure execFileSync is imported from child_process; keep the GitRunner signature (args, cwd) and preserve options like cwd and encoding.scripts/run-interview-grade.ts-7-7 (1)
7-7:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove
override: trueto prevent CI/shell-provided environment variables from being overwritten.The
override: trueoption causes dotenv to replace any existing process.env values with those from the .env file. This will clobber runtime-provided secrets and configuration (e.g., from CI systems), breaking expected behavior in non-local environments. Use the default behavior instead.Proposed fix
-loadDotenv({ override: true }); +loadDotenv();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/run-interview-grade.ts` at line 7, Call to loadDotenv({ override: true }) in scripts/run-interview-grade.ts is forcing .env values to overwrite existing environment variables; remove the override option so dotenv uses default behavior and does not clobber CI/shell-provided vars. Locate the loadDotenv invocation in that file (the loadDotenv symbol) and change it to loadDotenv() or remove the override property so existing process.env entries remain intact.src/services/interview/assess/collectors/jsonl-log.ts-28-30 (1)
28-30:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle missing
interview.logas empty evidence instead of throwing.A missing log file currently throws before parsing starts, which can fail an entire grade run for absent optional evidence.
💡 Proposed fix
export function parseInterviewLog(path: string): InterviewLogParseResult { - const body = readFileSync(path, "utf8"); + let body = ""; + try { + body = readFileSync(path, "utf8"); + } catch (err) { + if ((err as NodeJS.ErrnoException).code === "ENOENT") { + return { prompts: [], toolUses: [] }; + } + throw err; + } const lines = body.split("\n").filter((l) => l.trim().length > 0);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/collectors/jsonl-log.ts` around lines 28 - 30, The parseInterviewLog function currently calls readFileSync and throws if the file is missing; update parseInterviewLog to treat a missing interview.log as empty evidence instead of failing: check for file existence (e.g., using fs.existsSync) or wrap readFileSync in a try/catch, and if the file is absent or unreadable return an empty InterviewLogParseResult (empty lines/entries) rather than letting the exception propagate; keep the rest of parsing logic intact so parseInterviewLog and its callers receive an empty result when interview.log is not present.scripts/run-interview-bootstrap.ts-81-93 (1)
81-93:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate
timeBoxand mode flag values before constructingRoleConfig.The current parsing allows invalid values (e.g.,
--time-box abc, unknown--mode-*) to pass as typed config and fail later in less clear ways.💡 Proposed fix
function buildConfig(flags: ParsedFlags): RoleConfig | string { @@ - const timeBox = Number.parseInt(flags.timeBox ?? "90", 10); + const timeBox = Number.parseInt(flags.timeBox ?? "90", 10); + if (!Number.isFinite(timeBox) || timeBox < 30 || timeBox > 240) { + return "Invalid --time-box; expected an integer between 30 and 240"; + } + + const projectModes = new Set<ProjectMode>(["A", "B"]); + const analysisModes = new Set<AnalysisMode>(["ai-assisted", "human-only"]); + const rubricModes = new Set<RubricMode>(["default", "custom", "default+jd"]); + + if (!projectModes.has(flags.modeProject as ProjectMode)) { + return "Invalid --mode-project; expected A or B"; + } + if (!analysisModes.has(flags.modeAnalysis as AnalysisMode)) { + return "Invalid --mode-analysis; expected ai-assisted or human-only"; + } + if (!rubricModes.has(flags.modeRubric as RubricMode)) { + return "Invalid --mode-rubric; expected default, custom, or default+jd"; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/run-interview-bootstrap.ts` around lines 81 - 93, Before constructing RoleConfig, validate flags.timeBox and each mode flag: ensure timeBox (flags.timeBox) parses to a positive integer and reject non-numeric or <=0 values (set default only after validation), and verify flags.modeProject, flags.modeAnalysis, and flags.modeRubric are valid members of the corresponding enums/types (ProjectMode, AnalysisMode, RubricMode); if any validation fails, log a clear error and exit/throw rather than letting invalid values flow into the RoleConfig. Perform these checks just above where timeBox is parsed and the config object is created (referencing the timeBox variable and the RoleConfig construction), and normalize/cast only after validation so RoleConfig gets only safe, well-typed values.teamhero-interview-kit/start.sh-57-59 (1)
57-59:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWrite
.interview-state/started-atonly after recorder preflight passes.Right now, a missing
asciinemastill leaves a start marker behind. That creates a false “session started” state even though recording never began.Suggested reorder
-# Indicate we passed the gate so end.sh can verify the kit was started. -mkdir -p "$REPO_ROOT/.interview-state" -date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at" - if [ "${SKIP_RECORD:-0}" = "1" ]; then + mkdir -p "$REPO_ROOT/.interview-state" + date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at" echo "✓ Privacy gate passed. SKIP_RECORD=1 — not starting asciinema." exit 0 fi @@ if ! command -v "$ASCIINEMA_BIN" >/dev/null 2>&1; then @@ exit 1 fi +# Indicate we passed all prerequisites and are starting recording. +mkdir -p "$REPO_ROOT/.interview-state" +date -u +%FT%TZ > "$REPO_ROOT/.interview-state/started-at" + CAST_PATH="$REPO_ROOT/terminal.cast"Also applies to: 66-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teamhero-interview-kit/start.sh` around lines 57 - 59, The start marker file ".interview-state/started-at" is written before the recorder preflight completes, causing false "session started" states if asciinema or recorder setup fails; move the mkdir -p "$REPO_ROOT/.interview-state" and the date -u ... > "$REPO_ROOT/.interview-state/started-at" lines out of the current top-level start flow and into the success branch that runs after the recorder preflight/asciinema checks complete (i.e., only execute these lines when the script confirms recorder is present and recording has been started), and apply the same reorder for the duplicate block around lines 66-79 so the marker is only created on successful recorder initialization.src/services/interview/assess/collectors/asciinema.ts-68-85 (1)
68-85:⚠️ Potential issue | 🟠 Major | ⚡ Quick winParse input chunks character-by-character to avoid missed command submits.
evt.datamay contain multi-character chunks (e.g., pasted text orcmd\rin one event). The current branch logic treats the whole chunk as one token, which can fail to emit commands correctly.Suggested fix
- const data = evt.data; - if (data === "\r" || data === "\n") { - if (buffer.trim().length > 0) { - commands.push({ - type: "command", - timestamp: isoFromUnix(baseEpoch + evt.delta), - source: "terminal.cast", - command: buffer, - pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta), - }); - } - buffer = ""; - } else if (data === "�" || data === "\b") { - // Backspace - buffer = buffer.slice(0, -1); - lastKeyDelta = evt.delta; - } else if (data.charCodeAt(0) >= 32 || data === "\t") { - buffer += data; - lastKeyDelta = evt.delta; - } else { - // Control codes: arrows, escape sequences, etc. Skip silently. - } + for (const ch of evt.data) { + if (ch === "\r" || ch === "\n") { + if (buffer.trim().length > 0) { + commands.push({ + type: "command", + timestamp: isoFromUnix(baseEpoch + evt.delta), + source: "terminal.cast", + command: buffer, + pauseSecondsBeforeEnter: Math.max(0, evt.delta - lastKeyDelta), + }); + } + buffer = ""; + continue; + } + if (ch === "\u007f" || ch === "\b") { + buffer = buffer.slice(0, -1); + lastKeyDelta = evt.delta; + continue; + } + if (ch === "\t" || ch.charCodeAt(0) >= 32) { + buffer += ch; + lastKeyDelta = evt.delta; + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/collectors/asciinema.ts` around lines 68 - 85, The event handler currently treats evt.data as a single token, which misses command boundaries when evt.data contains multiple characters (pastes or "cmd\r"); update the processing block that uses data, buffer, commands, isoFromUnix, lastKeyDelta and evt.delta to iterate over each character in evt.data (e.g., for (const ch of data) { ... }) and run the existing branches per-character (newline -> push command with timestamp from isoFromUnix(baseEpoch + evt.delta), backspace -> slice buffer, printable/tab -> append to buffer and set lastKeyDelta = evt.delta), ensuring buffer resets after a submitted command so multi-char chunks are parsed correctly.src/services/interview/bootstrap/orchestrator.ts-36-49 (1)
36-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWrap generation/persist steps to preserve the
RunBootstrapResultcontract.
generateProject(...)andwriteRoleConfig(...)can throw; right now that escapes as an unhandled rejection instead of returning a structured failure. This makes bootstrap failure handling brittle at the CLI boundary.Proposed fix
- const generation = await generateProject(config, options.client, { - kitTemplateDir: options.kitTemplateDir, - maxAttempts: options.maxAttempts, - }); + let generation; + try { + generation = await generateProject(config, options.client, { + kitTemplateDir: options.kitTemplateDir, + maxAttempts: options.maxAttempts, + }); + } catch (error) { + return { + ok: false, + attempts: 0, + failures: [ + error instanceof Error ? error.message : String(error), + ], + }; + } if (!generation.ok) { return { ok: false, attempts: generation.attempts, failures: generation.failures, }; } - - writeRoleConfig(config.outputDir, config); + try { + writeRoleConfig(config.outputDir, config); + } catch (error) { + return { + ok: false, + attempts: generation.attempts, + failures: [ + error instanceof Error ? error.message : String(error), + ], + }; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/bootstrap/orchestrator.ts` around lines 36 - 49, The code calls generateProject(...) and writeRoleConfig(...) which can throw, so wrap the generation and persistence steps in a try/catch around the calls to generateProject and writeRoleConfig inside the function that returns RunBootstrapResult; on any thrown error return a RunBootstrapResult-shaped failure (ok: false) including generation.attempts/failures if available or default values and include the caught error message/context in the failures array or an error field; ensure the function still returns the existing successful shape when generation.ok is true and only writes role config inside the try block to avoid unhandled rejections (refer to generateProject, writeRoleConfig, and the RunBootstrapResult contract).tui/interview_bootstrap_wizard.go-249-251 (1)
249-251:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGate wizard launch on interactive TTY before entering interactive flow.
The dispatcher comment says wizard mode is for TTY callers, but the current condition only checks
len(args) == 0. In non-interactive contexts this can trigger unexpected wizard execution instead of deterministic headless behavior.Proposed fix
// No flags at all → interactive wizard. if len(args) == 0 { + if fi, err := os.Stdin.Stat(); err != nil || (fi.Mode()&os.ModeCharDevice) == 0 { + fmt.Fprintln(stderr, "no interactive TTY detected; pass headless flags") + return 1 + } res, err := launcher.Launch()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_bootstrap_wizard.go` around lines 249 - 251, The wizard launch is currently gated only by len(args) == 0 which can run the interactive launcher.Launch() in non-interactive contexts; change the condition to require an interactive TTY as well (e.g., check that stdin is a terminal via os.Stdin/terminal/term IsTerminal equivalents) so that the code only calls launcher.Launch() when len(args) == 0 AND stdin is a TTY; otherwise proceed with the deterministic headless path. Ensure you update the condition around launcher.Launch() and keep the existing dispatcher comment consistent.src/services/interview/cohort/cohort-summary.ts-39-43 (1)
39-43:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate
audit.jsonfrontmatter shape before adding recordsOn Line 39,
JSON.parseis cast directly to{ frontmatter: AuditFrontmatter }. A parseable but malformed file (e.g., missingcandidateorsigned_off) will be pushed and can fail later inrenderRow(Line 53+).Suggested fix
- const parsed = JSON.parse(body) as { frontmatter: AuditFrontmatter }; - records.push({ - frontmatter: parsed.frontmatter, - summaryPath: join(entry, "summary.md"), - }); + const parsed = JSON.parse(body) as { + frontmatter?: Partial<AuditFrontmatter>; + }; + const fm = parsed.frontmatter; + if ( + !fm || + typeof fm.candidate !== "string" || + typeof fm.date !== "string" || + typeof fm.signed_off !== "boolean" + ) { + continue; + } + records.push({ + frontmatter: fm as AuditFrontmatter, + summaryPath: join(entry, "summary.md"), + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/cohort/cohort-summary.ts` around lines 39 - 43, The code currently casts JSON.parse(body) to { frontmatter: AuditFrontmatter } and unconditionally pushes it into records; instead validate the parsed object has a frontmatter and required AuditFrontmatter fields (e.g., candidate, signed_off, etc.) before pushing to records to avoid later failures in renderRow; locate the parsing block around parsed = JSON.parse(body) and replace the blind cast with runtime checks that parsed?.frontmatter exists and that the required keys are present (or skip/log/throw for malformed audit.json), then push only when valid and keep summaryPath as before.src/services/interview/assess/audit-writer.ts-53-73 (1)
53-73:⚠️ Potential issue | 🟠 Major | ⚡ Quick winQuote/escape YAML scalar frontmatter values
On Lines 55–73, frontmatter fields are emitted as raw strings. Values containing
:,#, newlines, or quotes can corrupt parsing or inject unintended keys.Suggested fix
function yaml(value: AuditFrontmatter): string { + const q = (s: string): string => JSON.stringify(s); const lines: string[] = ["---"]; - lines.push(`tags: [${value.tags.join(", ")}]`); - lines.push(`candidate: ${value.candidate}`); - lines.push(`role: ${value.role}`); - lines.push(`date: ${value.date}`); - lines.push(`rubric_version: ${value.rubric_version}`); - lines.push(`rubric_mode: ${value.rubric_mode}`); + lines.push(`tags: [${value.tags.map(q).join(", ")}]`); + lines.push(`candidate: ${q(value.candidate)}`); + lines.push(`role: ${q(value.role)}`); + lines.push(`date: ${q(value.date)}`); + lines.push(`rubric_version: ${q(value.rubric_version)}`); + lines.push(`rubric_mode: ${q(value.rubric_mode)}`); lines.push(`signed_off: ${value.signed_off}`); if (value.recommendation !== undefined) { - lines.push(`recommendation: ${value.recommendation}`); + lines.push(`recommendation: ${q(value.recommendation)}`); } if (value.session_recording_url !== undefined) { - lines.push(`session_recording_url: ${value.session_recording_url}`); + lines.push(`session_recording_url: ${q(value.session_recording_url)}`); } if (value.session_platform !== undefined) { - lines.push(`session_platform: ${value.session_platform}`); + lines.push(`session_platform: ${q(value.session_platform)}`); } if (value.session_date !== undefined) { - lines.push(`session_date: ${value.session_date}`); + lines.push(`session_date: ${q(value.session_date)}`); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/audit-writer.ts` around lines 53 - 73, The yaml(value: AuditFrontmatter) function emits frontmatter scalars unescaped which can break YAML when values contain characters like :, #, quotes, or newlines; update it to sanitize/quote every scalar field (tags can remain array syntax but each tag must be quoted) by adding a small helper (e.g. escapeYamlScalar) and use it for candidate, role, date, rubric_version, rubric_mode, signed_off, recommendation, session_recording_url, session_platform, and session_date; the helper should wrap strings in double quotes and escape internal double quotes and newlines (or alternatively call a stable YAML serializer) so the generated frontmatter is always safe to parse.src/services/interview/assess/grade-orchestrator.ts-194-209 (1)
194-209:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
candidate_idcan collide and overwrite same-day runsOn Line 194,
candidate_idis date-granular (<slug>-YYYY-MM-DD). Re-grading the same candidate on the same day will target the same default output directory and overwrite prior artifacts.Suggested fix
+function nowIdSuffix(): string { + return new Date().toISOString().replace(/[:.]/g, "-"); +} ... - candidate_id: `${slug(input.candidateName)}-${todayIso()}`, + candidate_id: `${slug(input.candidateName)}-${nowIdSuffix()}`,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/grade-orchestrator.ts` around lines 194 - 209, The current candidate_id uses date granularity (slug(input.candidateName)-todayIso()) which causes same-day runs to collide; update the candidate_id generation in the result object to append a higher-resolution unique suffix (e.g., timestamp or a UUID/nanoid) so it becomes slug(input.candidateName)-todayIso()-<unique>, and keep the outputDir fallback logic (the outputDir variable that uses input.outputDir ?? join(..., result.candidate_id)) unchanged so that default directories are distinct per run; locate the candidate_id assignment and modify it to include the unique suffix (or call a helper like Date.now() or uuid()) so re-grades on the same day do not overwrite prior artifacts.
🟡 Minor comments (15)
src/cli/index.ts-192-206 (1)
192-206:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winBlock
interviewin thereportreserved-verb guard too.After adding the top-level
interviewverb here, the misuse guard inreport(Line 157) still only blocksdoctor/setup.teamhero report interviewnow slips through as report args instead of producing the helpful redirect error.Suggested fix
- const subcommands = ["doctor", "setup"]; + const subcommands = ["doctor", "setup", "interview"];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/index.ts` around lines 192 - 206, The new top-level command declared via program.command("interview") is not included in the reserved-verb check inside the report misuse guard, so "teamhero report interview" bypasses the redirect; update the report command's misuse guard to include "interview" alongside "doctor" and "setup" (i.e., add "interview" to the reserved verbs list/check used by the report handler) so calls matching "interview" are redirected the same way; locate the guard logic in the report command and adjust its reservation check to include the "interview" verb (the new interview command is created with program.command("interview") and launches spawnTui).tests/unit/services/interview/bootstrap/project-validator.spec.ts-59-61 (1)
59-61:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove test-file line adjustment from fixture LOC calculation
The fixture includes 5 test-file lines in the
currentLocestimate, but the validator only counts source files (via the!TEST_NAME_PATTERNfilter). This causes the padding calculation to undershoot by 5 lines.- const currentLoc = o.deepModuleCount * 100 + (o.withFailingTests ? 5 : 0); + const currentLoc = o.deepModuleCount * 100;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/services/interview/bootstrap/project-validator.spec.ts` around lines 59 - 61, The test fixture currently adds 5 lines for test files to the LOC estimate (in the currentLoc calculation using o.deepModuleCount * 100 + (o.withFailingTests ? 5 : 0)), but the validator filters out test files via TEST_NAME_PATTERN and only counts source files, causing an undershoot; remove the test-file adjustment so currentLoc is computed from o.deepModuleCount only (e.g., drop the +(o.withFailingTests ? 5 : 0)) so padding/remaining calculations align with the validator's source-file counting.tui/interview_progress.go-87-92 (1)
87-92:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSet phase to running on
updateevents to ensure correct visual stateWhen an
updateevent arrives before astartevent, the phase incorrectly displays as pending (○ glyph) despite progress being reported. Update the"update"case to transition from pending to running.Proposed fix
case "update": + if m.status[evt.Step] == interviewPhasePending { + m.status[evt.Step] = interviewPhaseRunning + } // keep running, just record message for tooltip-style display if evt.Message != "" { m.messages[evt.Step] = evt.Message }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_progress.go` around lines 87 - 92, The "update" event handler currently only records tooltip messages (m.messages[evt.Step]) but doesn't change the visual phase; modify the "update" case so that when an update arrives it sets m.phase to PhaseRunning (or at least transitions from PhasePending to PhaseRunning) so progress is shown as running; keep the existing message recording logic for evt.Step and evt.Message and ensure you reference the "update" case, m.phase, and the PhasePending/PhaseRunning constants when making the change.src/services/interview/assess/extractors/throughput.ts-13-13 (1)
13-13:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTest-run detection is narrower than the rest of the assessment pipeline.
TEST_RUNcurrently misses common runners (e.g.,jest,vitest,cargo test,just test), so “Time to first test run” can be silently omitted for valid sessions.Proposed fix
-const TEST_RUN = /^\s*(bun|npm|yarn|pnpm)\s+(run\s+)?test\b|^\s*go\s+test\b|^\s*pytest\b/; +const TEST_RUN = + /^\s*((bun|npm|yarn|pnpm)\s+(run\s+)?test|go\s+test|pytest|jest|vitest|cargo\s+test|just\s+test)\b/;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/assess/extractors/throughput.ts` at line 13, TEST_RUN's regex is too narrow and misses many common test runners; update the TEST_RUN constant to include additional runners such as jest, vitest, mocha, ava, tap, cypress, playwright, cargo (cargo test), and just (just test) while preserving existing matches for package-manager-prefixed invocations (bun|npm|yarn|pnpm with optional "run"). Modify the regex used in src/services/interview/assess/extractors/throughput.ts (constant TEST_RUN) to match both direct runner commands and package-manager-prefixed forms, and consider making it case-insensitive or allowing optional whitespace so commands like "npm run test", "yarn jest", "cargo test", and "just test" are all detected.teamhero-interview-kit/INTERVIEW_RULES.md-41-43 (1)
41-43:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language identifiers to fenced code blocks.
These fences trigger markdownlint MD040; adding
bashkeeps docs lint-clean and consistent.♻️ Proposed fix
- ``` + ```bash git push origin HEAD ``` ... -``` +```bash brew install asciinema...
-+bash
sudo apt install asciinema # or your distro's package manager... - ``` + ```bash sudo apt update sudo apt install asciinema git ```Also applies to: 50-52, 56-58, 68-71
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teamhero-interview-kit/INTERVIEW_RULES.md` around lines 41 - 43, Update the Markdown fenced code blocks in INTERVIEW_RULES.md to include the bash language identifier: change ``` to ```bash for the snippets containing "git push origin HEAD", "brew install asciinema", "sudo apt install asciinema", and the block with "sudo apt update / sudo apt install asciinema git" (also apply to the other occurrences you noted at lines 50-52, 56-58, 68-71) so the file passes markdownlint MD040 and renders correctly.skills/teamhero-interview/SKILL.md-50-52 (1)
50-52:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSpecify language for CLI code fences.
Adding
bashto these fenced blocks resolves markdownlint MD040 warnings.♻️ Proposed fix
-``` +```bash teamhero interview bootstrap...
-+bash
teamhero interview bootstrap --headless
--role --stack --domain --feature ""
--time-box
--mode-project A|B
--mode-analysis ai-assisted|human-only
--mode-rubric default|custom|default+jd
--output-dir
[--jd-path ] [--custom-prompt ""] [--role-title "<title>"]... -``` +```bash teamhero interview grade --candidate "Jane Doe" --repo <url> \ [--transcript <file>] [--interviewer-notes <file>] \ [--session-recording-url <url>] [--session-platform zoom|teams|meet|other] \ [--session-date YYYY-MM-DD] [--output-dir <path>]...
-+bash
teamhero interview cohort --role [--order alphabetical|chronological]Also applies to: 63-72, 81-86, 99-101
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/teamhero-interview/SKILL.md` around lines 50 - 52, Add the language tag "bash" to each CLI code fence in SKILL.md so the fenced blocks containing commands like "teamhero interview bootstrap", "teamhero interview bootstrap --headless ...", "teamhero interview grade --candidate ...", and "teamhero interview cohort --role ..." start with ```bash instead of ``` to satisfy markdownlint MD040; update every similar CLI fence (including the other occurrences mentioned) consistently.tui/interview_bootstrap_form.go-261-267 (1)
261-267:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTreat whitespace-only input as empty in
nonEmptyvalidation.Inputs like
" "currently pass validation and can propagate invalid wizard values.💡 Proposed fix
import ( "errors" "fmt" + "strings" @@ func nonEmpty(field string) func(string) error { return func(s string) error { - if s == "" { + if strings.TrimSpace(s) == "" { return fmt.Errorf("%s is required", field) } return nil } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_bootstrap_form.go` around lines 261 - 267, The nonEmpty validator currently treats whitespace-only strings as valid; update the nonEmpty function (nonEmpty(field string) func(string) error) to trim whitespace (e.g., strings.TrimSpace) from the input before checking emptiness, returning fmt.Errorf("%s is required", field) when the trimmed value is empty; add the strings import if missing so the validator rejects inputs like " ".docs/2026-05-09-candidate-interview-grader-plan.md-273-273 (1)
273-273:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language identifiers to fenced code blocks.
These unlabeled fences trigger
MD040and reduce syntax highlighting/readability in rendered docs.Suggested fix pattern
-``` +```text ... -``` +```Use
bash,markdown,typescript, ortextas appropriate per block.Also applies to: 296-296, 528-528, 541-541, 672-672, 681-681, 694-694
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/2026-05-09-candidate-interview-grader-plan.md` at line 273, Several fenced code blocks in docs/2026-05-09-candidate-interview-grader-plan.md are unlabeled and trigger MD040; edit each fenced block (lines referenced in the review: 273, 296, 528, 541, 672, 681, 694) and add an appropriate language identifier after the opening backticks (e.g., ```bash, ```markdown, ```typescript, or ```text) to enable syntax highlighting—scan the surrounding content to choose the correct language for each block and replace each opening "```" with "```<language>" while leaving the closing "```" unchanged.docs/2026-05-09-candidate-interview-grader-plan.md-533-535 (1)
533-535:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winThe list-roles/list-candidates status is internally inconsistent.
Line 533-535 labels these verbs as
v1.5, while Line 770+ says they are still deferred. Please align one source of truth so readers don’t misinterpret shipped scope.Also applies to: 770-775
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/2026-05-09-candidate-interview-grader-plan.md` around lines 533 - 535, The document shows inconsistent status for the commands teamhero interview list-roles and teamhero interview list-candidates (they're marked v1.5 in one place but listed as deferred elsewhere); pick the correct status and make both references consistent: update the v1.5 label if they are still deferred or move/remove them from the deferred section if they are shipped, editing the occurrences that mention teamhero interview list-roles and teamhero interview list-candidates so the "shipped" label and the deferred list (the later block that currently lists them) match exactly.scripts/manual-test-interview.sh-62-63 (1)
62-63:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon’t silently swallow step failures in a smoke script.
Using
|| trueat these execution points can hide real regressions and make the run look successful when core flows failed. Keep the script interactive, but surface non-zero exits explicitly.Suggested adjustment
+run_step() { + if ! "$@"; then + local code=$? + echo "⚠ Command failed with exit code $code: $*" >&2 + echo " Continue only if this failure was expected." >&2 + fi +} + -"$BIN" interview bootstrap || true +run_step "$BIN" interview bootstrap ... -"$BIN" interview bootstrap \ +run_step "$BIN" interview bootstrap \ --headless --no-confirm \ --role manual-test-role --stack TypeScript --domain Manual \ --feature "Verify the headless path still works" \ --mode-project A --mode-analysis human-only --mode-rubric default \ - --output-dir "$TMPDIR/manual-test-role" || true + --output-dir "$TMPDIR/manual-test-role" ... -"$BIN" interview grade \ +run_step "$BIN" interview grade \ --candidate "Manual Test Candidate" \ --repo "https://example.com/fake-repo" \ - --output-dir "$TMPDIR/grade-output" || true + --output-dir "$TMPDIR/grade-output" ... -"$BIN" interview cohort --role manual-test-role || true +run_step "$BIN" interview cohort --role manual-test-roleAlso applies to: 68-73, 83-87, 99-99
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/manual-test-interview.sh` around lines 62 - 63, The script is swallowing errors by appending "|| true" to key steps (e.g. the "$BIN interview bootstrap" invocation and the other occurrences at the indicated ranges); remove those "|| true" suffixes and instead check the command exit status: after invoking "$BIN interview bootstrap" (and the other similar commands), capture the exit code and if non-zero print a clear error message and either exit the script or prompt the user to continue (interactive mode) so failures are surfaced explicitly; look for occurrences of "$BIN" invocations with "|| true" and replace them with explicit exit-code handling or an interactive confirmation flow.tui/interview_bootstrap_wizard.go-262-267 (1)
262-267:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard against nil
Optionsbefore validation/run.If a launcher returns
Confirmed=truewithOptions=nil, this path can panic in validation or runner invocation. Add a defensive check and fail cleanly.Proposed fix
if !res.Confirmed { return 0 } opts := res.Options + if opts == nil { + fmt.Fprintln(stderr, "wizard returned no options") + return 1 + } if msg := ValidateBootstrapOptions(opts); msg != "" { fmt.Fprintln(stderr, msg) return 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_bootstrap_wizard.go` around lines 262 - 267, The code assumes res.Options is non-nil before calling ValidateBootstrapOptions and runner.Run; add a nil check for opts (res.Options) right after assigning opts := res.Options and before calling ValidateBootstrapOptions: if opts == nil { fmt.Fprintln(stderr, "no options provided"); return 1 }, so the function fails cleanly instead of panicking; keep using stderr/stdout and still call ValidateBootstrapOptions and runner.Run only when opts is non-nil.tui/interview_grade.go-32-39 (1)
32-39:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHandle missing flag values before assignment in
ParseGradeFlags.When a flag is followed by another flag token, the parser currently treats that token as a value and returns a misleading later error. Add a guard so the parser returns the correct “requires a value” error immediately.
Proposed fix
if i+1 >= len(args) { return nil, fmt.Sprintf("flag %s requires a value", a) } val := args[i+1] + if strings.HasPrefix(val, "--") { + return nil, fmt.Sprintf("flag %s requires a value", a) + } switch a {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_grade.go` around lines 32 - 39, In ParseGradeFlags: the current logic only checks bounds for args[i+1] but can treat a subsequent flag token as a value; update the guard inside the case handling ("--repo", "--candidate", "--transcript", "--interviewer-notes", "--session-recording-url", "--session-platform", "--session-date", "--output-dir", "--local-repo-path") to not only ensure i+1 < len(args) but also verify args[i+1] is not a flag (e.g., starts with "-"); if it is a flag, return the same "flag %s requires a value" error immediately instead of assigning val and letting later code fail. Locate this logic in ParseGradeFlags and add the extra check before setting val := args[i+1].tui/interview_cohort.go-24-37 (1)
24-37:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject flag-like tokens as missing values in
ParseCohortFlags.
ParseCohortFlagscurrently accepts another flag token as a value (for example,--role --order ...) and then fails later with a misleading error. Validate the consumed value before assigning it.Proposed fix
case "--role", "--role-dir", "--order": if i+1 >= len(args) { return nil, fmt.Sprintf("flag %s requires a value", a) } val := args[i+1] + if strings.HasPrefix(val, "--") { + return nil, fmt.Sprintf("flag %s requires a value", a) + } switch a {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_cohort.go` around lines 24 - 37, In ParseCohortFlags, the code currently assigns args[i+1] to opts.Role/RoleDir/Order even if that token is another flag (e.g. "--role --order"), causing confusing errors later; fix it by validating val before assignment: after val := args[i+1], check if strings.HasPrefix(val, "-") (or specifically "--") and if so return the same missing-value error (e.g. fmt.Sprintf("flag %s requires a value", a)); otherwise proceed to set opts.Role, opts.RoleDir or opts.Order and increment i. Ensure to import strings if needed.src/services/interview/cohort/cohort-summary.ts-42-42 (1)
42-42:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse POSIX path joining for markdown links
On Line 42,
join(entry, "summary.md")may produce\on Windows, which can render as broken/awkward links in markdown.Suggested fix
-import { join } from "node:path"; +import { join, posix } from "node:path"; ... - summaryPath: join(entry, "summary.md"), + summaryPath: posix.join(entry, "summary.md"),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/cohort/cohort-summary.ts` at line 42, The summaryPath construction uses path.join(entry, "summary.md") which will produce backslashes on Windows and break markdown links; change it to use POSIX joining instead (e.g., use path.posix.join or equivalent) when building summaryPath in cohort-summary.ts so the resulting link always contains forward slashes; update imports/use of the path module accordingly and ensure the variable name summaryPath still receives the new POSIX-joined value (reference symbols: summaryPath, join, entry).tui/interview_bootstrap.go-127-129 (1)
127-129:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate that
--jd-pathactually exists in headless modeLines 127–129 only require a non-empty value. A typo/path miss will pass validation and later run without JD context.
Suggested fix
if opts.ModeRubric == "default+jd" && strings.TrimSpace(opts.JDPath) == "" { return "--mode-rubric 'default+jd' requires --jd-path" } + if opts.ModeRubric == "default+jd" { + info, err := os.Stat(opts.JDPath) + if err != nil || info.IsDir() { + return "--jd-path must point to an existing file" + } + } return ""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tui/interview_bootstrap.go` around lines 127 - 129, The validation in the condition checking opts.ModeRubric and opts.JDPath only ensures that --jd-path is not empty, but it should also verify that the specified path actually exists to prevent runtime errors in headless mode. Update the code to check if the file at opts.JDPath exists using appropriate file existence checking methods (e.g., os.Stat in Go) and return an error if the file does not exist alongside the current validation in the logic handling the mode-checking.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c9c03249-2c4d-43c6-ac4c-46fe9bc90861
📒 Files selected for processing (74)
README.mddocs/2026-05-09-candidate-interview-grader-plan.mddocs/interview-classification-rationale.mddocs/interview-rubric.mdscripts/manual-test-interview.shscripts/run-interview-bootstrap.tsscripts/run-interview-cohort.tsscripts/run-interview-grade.tsscripts/run-tests.shskills/teamhero-interview/SKILL.mdsrc/cli/index.tssrc/services/interview/assess/ai-observer.tssrc/services/interview/assess/audit-writer.tssrc/services/interview/assess/collectors/asciinema.tssrc/services/interview/assess/collectors/git-history.tssrc/services/interview/assess/collectors/jsonl-log.tssrc/services/interview/assess/collectors/transcript.tssrc/services/interview/assess/extractors/risk-awareness.tssrc/services/interview/assess/extractors/test-pass.tssrc/services/interview/assess/extractors/throughput.tssrc/services/interview/assess/extractors/verification.tssrc/services/interview/assess/grade-orchestrator.tssrc/services/interview/assess/types.tssrc/services/interview/bootstrap/openai-generator-client.tssrc/services/interview/bootstrap/orchestrator.tssrc/services/interview/bootstrap/project-generator.tssrc/services/interview/bootstrap/project-validator.tssrc/services/interview/bootstrap/role-config.tssrc/services/interview/cohort/cohort-summary.tssrc/services/interview/shared/events.tssrc/services/interview/shared/rubric.tsteamhero-interview-kit/.claude/CLAUDE.mdteamhero-interview-kit/.claude/settings.jsonteamhero-interview-kit/INTERVIEW_RULES.mdteamhero-interview-kit/PRIVACY_RELEASE.mdteamhero-interview-kit/RUBRIC_OVERVIEW.mdteamhero-interview-kit/end.shteamhero-interview-kit/lib/privacy-gate.shteamhero-interview-kit/start.shtests/unit/cli/interview.spec.tstests/unit/services/interview/assess/ai-observer.spec.tstests/unit/services/interview/assess/audit-writer.spec.tstests/unit/services/interview/assess/collectors.spec.tstests/unit/services/interview/assess/extractors.spec.tstests/unit/services/interview/assess/grade-orchestrator.spec.tstests/unit/services/interview/bootstrap/orchestrator.spec.tstests/unit/services/interview/bootstrap/project-generator.spec.tstests/unit/services/interview/bootstrap/project-validator.spec.tstests/unit/services/interview/bootstrap/role-config.spec.tstests/unit/services/interview/cohort/cohort-summary.spec.tstests/unit/services/interview/cohort/skill.spec.tstests/unit/services/interview/kit/kit-smoke.spec.tstests/unit/services/interview/kit/privacy-gate.spec.tstests/unit/services/interview/shared/events.spec.tstests/unit/services/interview/shared/rubric.spec.tstui/interview.gotui/interview_bootstrap.gotui/interview_bootstrap_form.gotui/interview_bootstrap_test.gotui/interview_bootstrap_wizard.gotui/interview_bootstrap_wizard_test.gotui/interview_cohort.gotui/interview_cohort_test.gotui/interview_grade.gotui/interview_grade_test.gotui/interview_preview.gotui/interview_preview_test.gotui/interview_progress.gotui/interview_progress_test.gotui/interview_styles.gotui/interview_styles_test.gotui/interview_test.gotui/main.gotui/manual_test_script_test.go
Enables auto-review for PRs targeting main, which covers PRs opened from claude/* branches. CodeRabbit's schema has no source-branch filter, so this is the closest we can get to "auto-review Claude PRs." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'grade' verb implied scoring against a rubric, which conflicted with the project's observations-not-scores ethics floor. Renamed the CLI verb, file/directory names, type names, docs, and tests to use 'review' instead. Includes: - CLI: teamhero interview grade → teamhero interview review - Go TUI: tui/interview_grade*.go → tui/interview_review*.go, types GradeOptions/GradeRunner → ReviewOptions/ReviewRunner - TS service: src/services/interview/assess/ → review/, orchestrator renamed; GradeResult/GradeInput/GradeOutcome → Review* - Subprocess: scripts/run-interview-grade.ts → run-interview-review.ts - Tests, SKILL.md, README, manual test script, planning doc Also adds a defensive guard to the TS CLI runner: unknown subcommands now print an error and exit 1 instead of silently falling through to commander's top-level help. That silent fallthrough is what made a stale dist look like a missing-command bug when 'teamhero interview' first shipped here. No backward-compat shims — pre-deployment, no external callers. Closes teamhero-scripts-td7, teamhero-scripts-zva. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "skips all roadmap audit contexts when TEAMHERO_ROADMAP_AUDIT_ENABLED=0"
test failed under non-isolated `bun test` because other specs (ai-service,
individual-summarizer, report-service) install a `mock.module("src/lib/env.js")`
stub whose `getEnv` ignores `process.env`. `mock.module()` registrations leak
across files, so when this file ran after one of them, setting
`process.env.TEAMHERO_ROADMAP_AUDIT_ENABLED = "0"` had no effect on the gate.
Re-pin `getEnv` to a real implementation at the top of this spec so it works
regardless of which file ran before it. CI (per-file isolated runner) was
already green; this just fixes the fast `bun test` workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 18
♻️ Duplicate comments (1)
src/services/interview/review/review-orchestrator.ts (1)
45-47:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUse
execFileSync/arg array to remove command-injection risk.Line 45 builds a shell command with interpolated
repoUrl/destDir; this allows shell-breaking payloads. Use argument-array execution instead.Suggested fix
-import { execSync } from "node:child_process"; +import { execFileSync } from "node:child_process"; ... const defaultCloner: Cloner = (repoUrl, destDir) => { - execSync(`git clone --depth=50 '${repoUrl}' '${destDir}'`, { + execFileSync("git", ["clone", "--depth=50", repoUrl, destDir], { stdio: "inherit", }); };#!/bin/bash # Verify unsafe shell-interpolated exec usage in the reviewed file. rg -nP 'execSync\s*\(\s*`git clone' src/services/interview/review/review-orchestrator.ts rg -nP '\bexecSync\s*\(' src/services/interview/review/review-orchestrator.ts🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/review/review-orchestrator.ts` around lines 45 - 47, The git clone call in review-orchestrator.ts uses execSync with a shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}' '${destDir}'`, ...)) which is vulnerable to command injection; replace it with a non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing ["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit" semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir as separate args, and adjust any error handling around the exec in the function/method that contains this execSync invocation.
🧹 Nitpick comments (1)
tests/unit/cli/unknown-subcommand.spec.ts (1)
75-100: ⚡ Quick winKnown-subcommand guard test is slightly environment-dependent.
Line 90 drives
run(..., sub, "--help")into TUI passthrough, so this “guard-stage” unit test can depend on local binary/runtime state. Consider isolating guard logic from TUI execution in this test path to keep it fully deterministic.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/cli/unknown-subcommand.spec.ts` around lines 75 - 100, The test is hitting TUI passthrough when calling run([... sub, "--help"]) making it environment-dependent; modify the test to isolate the guard by stubbing out the TUI execution in the deps returned by makeDeps (e.g., set deps.spawnTui to a no-op or a function that throws a deterministic sentinel) so run() exercises only the guard logic and never invokes the real TUI, keep the existing errorSpy/exitSpy handling and assertions around run and unknown-subcommand messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/2026-05-09-candidate-interview-reviewer-plan.md`:
- Line 273: The markdown contains unannotated fenced code blocks (``` ) that
trigger MD040; update each fenced block to include a language identifier (e.g.,
```bash, ```text, ```markdown) so tooling can correctly lint and highlight them.
Locate every triple-backtick fence in the document (all occurrences of ``` ) and
replace the opening fence with an appropriate language token consistent across
similar examples (use bash for shell commands, text for plain output, markdown
for nested markdown examples, etc.), and ensure the same convention is applied
to the other occurrences noted in the review.
- Line 757: Update the stale module path in the implementation progress table:
replace the incorrect `src/services/interview/{bootstrap,assess,cohort,shared}/`
reference with the correct module path that uses `review` (e.g.,
`src/services/interview/{bootstrap,assess,cohort,review}/`) so it matches the
rest of the document and the shipped flow; ensure the `teamhero interview` CLI
subcommand line and any adjacent references use the `review` segment
consistently.
In `@scripts/manual-test-interview.sh`:
- Around line 83-86: The review invocation using "$BIN" interview review
currently omits the analysis mode flag so it may call OpenAI; modify the command
calling "$BIN" interview review (the block with --candidate and --repo) to
include --mode-analysis human-only so the script enforces human-only analysis
and avoids any API usage/cost paths.
In `@scripts/run-interview-review.ts`:
- Around line 53-56: The CLI currently only checks that at least one of
flags.repo or flags.localRepoPath is provided, but allows both; change the check
to require exactly one (XOR). Replace the existing if-condition with a test that
fails when both are present or both absent (e.g. if ((!!flags.repo) ===
(!!flags.localRepoPath)) { ... }), and update the error message to state
"Provide exactly one of --repo <url> or --local-repo-path <dir>" before calling
process.exit(1); keep references to flags.repo and flags.localRepoPath so
reviewers can find the change.
In `@skills/teamhero-interview/SKILL.md`:
- Around line 78-86: The heading "teamhero interview review <repo-url>" is
inconsistent with the actual CLI invocation which uses flags; update the
SKILL.md heading to match the documented usage by changing it to something like
"teamhero interview review --repo <url>" (or show both forms: positional and
flag) and ensure the examples and paragraph describe the same form; edit the
heading text around the `teamhero interview review` section and any nearby
examples so they reference the `--repo <url>` flag (and other flags)
consistently with the code examples and the documented invocation.
- Around line 50-52: Add the missing fenced-code language identifiers by
changing the triple-backtick fences for the command snippets to specify "bash"
(e.g., the blocks containing "teamhero interview bootstrap", "teamhero interview
bootstrap --headless \", "teamhero interview review --candidate \"Jane Doe\"
--repo <url> \", "teamhero interview cohort --role <slug> [--order
alphabetical|chronological]" and the other similar command blocks at lines
referenced) so each code fence begins with ```bash to satisfy markdownlint
MD040; update every occurrence noted (around lines 50-52, 63-72, 81-86, 99-101)
consistently.
In `@src/cli/index.ts`:
- Around line 229-255: The nested-subcommand rejection logic was left using a
hardcoded allowlist (e.g., only "doctor"/"setup") so "teamhero report interview"
bypasses the guard; update the nested-subcommand check to reuse the shared
KNOWN_SUBCOMMANDS constant instead of the ad-hoc array: locate the guard in the
run function that inspects the first/second args and replace the hardcoded
allowed-subcommands with KNOWN_SUBCOMMANDS (so validations for
first/second-level subcommands remain consistent).
In `@src/services/interview/review/ai-observer.ts`:
- Around line 192-201: The current rejectIfScored() string-scans JSON which
false-positives when forbidden names appear inside string values; instead
parse/inspect the response structurally and recursively check object keys
against FORBIDDEN_FIELDS. Update rejectIfScored to: safely parse or coerce the
incoming response to an object (handle non-objects by returning), walk nested
objects and arrays, and if any object has a property name equal to one of
FORBIDDEN_FIELDS (use direct key comparison, not regex), throw the existing
Error message; also catch and rethrow/handle JSON parse errors consistently so
you don't silently ignore malformed input.
In `@src/services/interview/review/audit-writer.ts`:
- Around line 53-76: The yaml function is vulnerable because it interpolates
user-adjacent strings (AuditFrontmatter fields like candidate, role, tags,
recommendation, session_* and session URLs) directly into YAML, allowing invalid
frontmatter or injected keys; fix by switching from manual string interpolation
to safe YAML serialization (e.g., use a YAML library like js-yaml oryaml and
call safeDump/safeLoad for the AuditFrontmatter object) or, if you must
hand-roll it, ensure every scalar is properly quoted and escaped (escape
backslashes, quotes and newlines, emit tags as a YAML sequence not an unquoted
bracket list, and only emit optional keys when present). Update the yaml
function to construct a plain JS object from AuditFrontmatter and pass it to the
serializer (or replace each push with properly quoted/escaped values and tags
emitted as a sequence) so all fields are safely encoded.
In `@src/services/interview/review/collectors/asciinema.ts`:
- Around line 67-86: The code in the event-processing block for evt.data assumes
a single-character payload and misses cases where evt.data contains multiple
characters (e.g., "npm test\r"); update the logic in the handler around
variables buffer, lastKeyDelta, evt.delta, baseEpoch and commands so you iterate
over each character in evt.data (for example for (const ch of evt.data)) and
apply the same branches: treat '\r' and '\n' as Enter (trim buffer, push a
command with timestamp = isoFromUnix(baseEpoch + evt.delta) and
pauseSecondsBeforeEnter = Math.max(0, evt.delta - lastKeyDelta)), treat '�' and
'\b' as backspace (slice buffer and set lastKeyDelta), treat printable chars
(charCode >= 32 or '\t') as append to buffer and set lastKeyDelta; ensure buffer
is reset after pushing a command so subsequent characters are processed
correctly.
In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 14-18: The realRunner implementation currently builds a shell
command string and calls execSync (risking injection/quoting bugs); change it to
call execFileSync with the program "git" and pass the args array directly (i.e.
execFileSync("git", args, { cwd, encoding: "utf8" })) so you remove the manual
quoting/map and avoid shell parsing; update any type/return handling in the
GitRunner wrapper accordingly.
In `@src/services/interview/review/collectors/transcript.ts`:
- Around line 27-37: Validate the sessionStartIso at the start of toIsoFromHMS:
parse it (e.g., new Date(sessionStartIso)) and check
isFinite(date.getTime())/isNaN before using getTime(); if invalid, throw a clear
Error mentioning "invalid sessionStartIso" and include the raw value. Update the
same validation pattern where sessionStartIso is used later in this file (the
second occurrence around lines 51-52) so both code paths fail fast with the same
clear error message; reference function toIsoFromHMS and the other usage site to
locate and mirror the change.
In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 20-34: The spawnSync invocations in test-pass.ts that run the Go
and Bun tests (the calls to spawnSync("go", ["test", "./..."], { cwd: repoDir,
encoding: "utf8" }) and spawnSync("bun", ["test"], { cwd: repoDir, encoding:
"utf8" })) lack timeout and maxBuffer safeguards; update both calls to include
sensible timeout and maxBuffer options (e.g., timeout: 120000, maxBuffer: 10 *
1024 * 1024) so the subprocesses cannot hang or overflow stdout/stderr, keep the
rest of the options (cwd, encoding) and ensure the rest of the code that reads
r.stdout/r.stderr and computes passed/failed via countMatches remains unchanged.
In `@src/services/interview/review/extractors/verification.ts`:
- Around line 58-71: The interleaving count currently treats any adjacent
prompt→command pair in the merged array as "within 30 seconds" but doesn't
enforce the 30s boundary; update the loop that computes interleavedAfterPrompt
(which iterates merged, uses cur/prev and checks cur.type === "command" &&
prev.type === "prompt") to also compute the timestamp difference (e.g., parse
cur.timestamp and prev.timestamp to milliseconds via Date) and only increment
interleavedAfterPrompt when the difference is <= 30000 ms; keep the existing
sort by timestamp and ensure you handle timestamp parsing robustly (use
getTime() or valueOf()) so only prompt→test pairs within the 30-second window
are counted.
In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 83-85: Replace the ad-hoc date logic: remove todayIso() and any
manual string concatenation like `${sessionDate}T09:00:00Z`; instead import and
use the date-utils helpers (resolveStartISO, resolveEndISO, resolveEndEpochMs,
formatDateUTC) so timezone-safe boundaries and display formatting are
consistent; update the places referenced (the todayIso function and the
constructions around sessionDate at the other locations) to call
resolveStartISO/resolveEndISO for interval boundaries and formatDateUTC for any
display strings or ISO formatting.
- Around line 39-42: The Cloner and ReviewDependencies types are port interfaces
and must be moved into the shared core types file; remove their local
declarations (e.g., the Cloner type and the ReviewDependencies type block) and
declare them in src/core/types.ts instead, then import and use those types in
review-orchestrator.ts (and any other files that used the local definitions) so
all boundary port interfaces live in the central types module and usages
reference the imported Cloner and ReviewDependencies symbols.
In `@tests/unit/services/interview/cohort/cohort-summary.spec.ts`:
- Around line 96-109: The test currently asserts status words only; instead,
assert the actual glyph characters so the UI signal can't be removed silently:
update the test using the existing helpers (makeCandidate, loadCohort,
renderCohortSummary) to check for the exact glyphs "✅" for reviewed Alice and
"⏳" for pending Bob (e.g. expect(body).toMatch(/Alice.*✅/) and
expect(body).toMatch(/Bob.*⏳/)), while retaining the existing checks for
"Reviewed"/"Pending" and "Hire" to keep coverage of both glyph and text.
In `@tests/unit/services/interview/review/ai-observer.spec.ts`:
- Around line 123-132: The test currently only checks
OBSERVATION_RESPONSE_SCHEMA.additionalProperties at the top level; add a
recursive schema traversal (e.g., a helper named assertNoAdditionalProperties or
traverseSchema) that walks OBSERVATION_RESPONSE_SCHEMA and asserts
additionalProperties === false for every object schema node it encounters
(inspect properties, patternProperties, additionalProperties, items,
allOf/anyOf/oneOf, not, definitions/$defs, etc.), and also ensure the banned
field names ("score","weighted_total","raw_total","band","signal_count") are not
present anywhere by checking keys at each visited node; call this helper from
the "declares strict json_schema..." test to harden the assertion.
---
Duplicate comments:
In `@src/services/interview/review/review-orchestrator.ts`:
- Around line 45-47: The git clone call in review-orchestrator.ts uses execSync
with a shell-interpolated command (execSync(`git clone --depth=50 '${repoUrl}'
'${destDir}'`, ...)) which is vulnerable to command injection; replace it with a
non-shell, argument-array style call (e.g., execFileSync or spawnSync) passing
["git", "clone", "--depth=50", repoUrl, destDir] and preserve stdio: "inherit"
semantics; ensure you remove the shell=true behavior, pass repoUrl and destDir
as separate args, and adjust any error handling around the exec in the
function/method that contains this execSync invocation.
---
Nitpick comments:
In `@tests/unit/cli/unknown-subcommand.spec.ts`:
- Around line 75-100: The test is hitting TUI passthrough when calling run([...
sub, "--help"]) making it environment-dependent; modify the test to isolate the
guard by stubbing out the TUI execution in the deps returned by makeDeps (e.g.,
set deps.spawnTui to a no-op or a function that throws a deterministic sentinel)
so run() exercises only the guard logic and never invokes the real TUI, keep the
existing errorSpy/exitSpy handling and assertions around run and
unknown-subcommand messages.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: f1afff1b-ff4d-4824-b702-affafb49c191
📒 Files selected for processing (37)
.coderabbit.yamlREADME.mddocs/2026-05-09-candidate-interview-reviewer-plan.mdscripts/manual-test-interview.shscripts/run-interview-review.tsskills/teamhero-interview/SKILL.mdsrc/cli/index.tssrc/services/interview/bootstrap/openai-generator-client.tssrc/services/interview/review/ai-observer.tssrc/services/interview/review/audit-writer.tssrc/services/interview/review/collectors/asciinema.tssrc/services/interview/review/collectors/git-history.tssrc/services/interview/review/collectors/jsonl-log.tssrc/services/interview/review/collectors/transcript.tssrc/services/interview/review/extractors/risk-awareness.tssrc/services/interview/review/extractors/test-pass.tssrc/services/interview/review/extractors/throughput.tssrc/services/interview/review/extractors/verification.tssrc/services/interview/review/review-orchestrator.tssrc/services/interview/review/types.tstests/unit/cli/unknown-subcommand.spec.tstests/unit/services/discrepancy.spec.tstests/unit/services/interview/cohort/cohort-summary.spec.tstests/unit/services/interview/cohort/skill.spec.tstests/unit/services/interview/review/ai-observer.spec.tstests/unit/services/interview/review/audit-writer.spec.tstests/unit/services/interview/review/collectors.spec.tstests/unit/services/interview/review/extractors.spec.tstests/unit/services/interview/review/review-orchestrator.spec.tstui/interview.gotui/interview_progress.gotui/interview_review.gotui/interview_review_test.gotui/interview_styles.gotui/interview_test.gotui/main.gotui/manual_test_script_test.go
✅ Files skipped from review due to trivial changes (3)
- .coderabbit.yaml
- README.md
- tests/unit/services/interview/review/extractors.spec.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- tui/main.go
- tests/unit/services/interview/cohort/skill.spec.ts
- tui/interview_styles.go
- tui/interview_progress.go
Security and data-loss:
- review-orchestrator: replace shell-interpolated `git clone` with
execFileSync so a candidate repo URL containing single quotes or
shell metacharacters cannot inject commands.
- git-history collector: same — switch execSync(`git ${args}`) to
execFileSync("git", args).
- project-generator.writeGenerated: refuse LLM-supplied file paths
that are absolute or that resolve outside outputDir (path traversal
via `../` or absolute paths in the generator response).
- project-generator.clearOutputDir: refuse to recursively remove the
filesystem root, the user's $HOME, or any root-level path. A
misconfigured outputDir of "/" or "$HOME" otherwise wipes user data.
- review-orchestrator: candidate_id now includes a UTC HHMMSS suffix
so two reviews of the same candidate on the same day land in
distinct directories instead of silently overwriting each other.
Trust-boundary validation:
- role-config.readRoleConfig: parse failures and shape mismatches now
throw a descriptive error instead of letting malformed JSON
propagate as untyped data into the orchestrator.
- cohort-summary.loadCohort: validate that audit.json has the
required frontmatter shape; skip malformed records instead of
crashing renderRow with `undefined`.
- audit-writer.yaml: quote scalar values that contain colons, hashes,
newlines, etc. A candidate name with `:` or `#` previously corrupted
the YAML frontmatter, breaking the audit-writer ↔ cohort-summary
round-trip.
- shared/events.parseInterviewEvent: validate `step` / `status` /
`message` / `progress` fields on `progress` events instead of
trusting any object whose `type` field is "progress".
- run-interview-bootstrap: validate `--time-box` is a finite number
and `--mode-*` flags are from the allowed enum before forwarding
into RoleConfig; previously NaN/garbage leaked through and failed
with confusing downstream messages.
- TUI bootstrap: when `--mode-rubric=default+jd`, also check that the
`--jd-path` file actually exists, not just that it's non-empty.
Correctness / hang / data-loss in collectors:
- jsonl-log: gracefully return empty when interview.log is missing
instead of throwing ENOENT and failing the whole review run.
- transcript: broaden the bare-speaker regex to accept "Speaker 1"
style labels; fall back to the epoch if sessionStartIso parses
to NaN instead of crashing toISOString().
- verification: actually apply the documented 30s prompt-to-test
window using timestamp deltas, not stream adjacency.
- test-pass: cap candidate test runs with a 5-minute SIGTERM timeout
and surface r.error (ETIMEDOUT, ENOENT, …) in the output so a hung
candidate process can't block the grader indefinitely.
- project-validator.walk: use lstatSync and skip symlinks so an
attacker-controlled generator output with a cyclic symlink can't
hang the validator.
Error contracts and TUI:
- bootstrap/orchestrator: wrap generation + writeRoleConfig in
try/catch so unexpected throws surface as a structured
RunBootstrapResult failure instead of an unhandled rejection at
the CLI boundary.
- run-interview-bootstrap / run-interview-review: drop dotenv
`{override: true}` so CI-provided env vars (real OPENAI_API_KEY)
win over anything in a local .env.
- TUI wizard dispatcher: TTY-check stdin before launching the
interactive huh.Form (no flags + piped stdin previously hung
forever); add a nil-Options guard before the runner call.
- TUI progress model: an "update" event now promotes the phase from
pending → running so the UI reflects activity even when an emitter
skips the matching "start".
- TUI bootstrap form `nonEmpty`: trim whitespace so a value of
spaces-only is rejected rather than treated as set.
Hygiene:
- privacy-gate.sh: move `set -u` inside the direct-exec guard so
sourcing the library doesn't leak nounset into the caller's shell.
- start.sh: write .interview-state/started-at AFTER the asciinema
preflight succeeds, so a failed start doesn't leave a stale marker
that end.sh would mistake for a completed session.
Test fixes:
- cohort-summary: import AuditFrontmatter from the renamed `review/`
module (was still pointing at the deleted `assess/` path; tsc
caught it).
- project-validator spec fixture: stop adding +5 to the expected LOC
for failing-test files — test files are excluded from the LOC count
by the validator's TEST_NAME_PATTERN.
- audit-writer + review-orchestrator specs: expect quoted YAML URLs
now that the YAML emitter quotes scalars containing colons.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes. |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
src/services/interview/bootstrap/project-generator.ts (1)
95-99:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winTop-level delete guard is ineffective for paths like
/usrand/etc.Line 98 checks
parent === abs, which only catches filesystem roots. It does not block top-level directories, soclearOutputDir()can still recursively delete unsafe system paths if misconfigured.🔧 Suggested fix
-import { dirname, isAbsolute, join, relative, resolve, sep } from "node:path"; +import { dirname, isAbsolute, join, parse, relative, resolve, sep } from "node:path"; ... function assertSafeToClear(outputDir: string): void { const abs = resolve(outputDir); - const root = resolve(abs, "/"); + const root = parse(abs).root; if (abs === root || abs === sep) { throw new Error(`Refusing to clear filesystem root: ${abs}`); } const home = homedir(); if (home && abs === resolve(home)) { throw new Error(`Refusing to clear home directory: ${abs}`); } - // Refuse a top-level path like /usr, /etc, /home — anything where the - // path has no parent beyond the root. - const parent = dirname(abs); - if (parent === abs) { - throw new Error(`Refusing to clear root-level path: ${abs}`); + // Refuse top-level paths like /usr, /etc, /home. + if (dirname(abs) === root) { + throw new Error(`Refusing to clear top-level path: ${abs}`); } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/bootstrap/project-generator.ts` around lines 95 - 99, The current check in clearOutputDir uses dirname(abs) and only throws when parent === abs, which misses top-level children like "/usr" (whose parent is "/" not equal to abs); update the guard to also detect paths directly under the filesystem root by comparing parent to the path root: require path.parse(abs).root and throw if parent === abs || parent === path.parse(abs).root; reference the clearOutputDir function and the local variables abs and parent and use path.parse(abs).root to identify root-level parents.
🧹 Nitpick comments (1)
src/services/interview/review/extractors/test-pass.ts (1)
14-17: ⚡ Quick winMove
TestRunnerport type tosrc/core/types.ts.This defines a new port-like interface locally; per repo architecture rules, keep port interfaces centralized in core types and import it here.
As per coding guidelines, "ALL port interfaces must be defined in
src/core/types.ts— never create new types files".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/interview/review/extractors/test-pass.ts` around lines 14 - 17, The type alias TestRunner is a port interface that must be centralized instead of being defined locally; remove the local export of TestRunner and instead add the identical exported type declaration to the core types module (types.ts), then import { TestRunner } into this file and use it in place of the removed local type; ensure the signature stays { readonly passed: number; readonly failed: number; readonly output: string } and that the core types module exports it for other modules to consume.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 27-33: Move the GeneratorClient interface declaration into the
shared core types module (the central core types file) so it lives with other
port interfaces; export it from there and remove the local declaration in
project-generator.ts. Update all files that reference GeneratorClient (and
ensure GeneratedProject and RoleConfig are imported or re-exported from the core
types file) to import the interface from the shared core types module instead of
the local service file. Ensure the generate method signature and readonly
modifiers remain identical when relocating.
- Around line 109-120: The copyDir function currently uses statSync(s) which
follows symlinks; change it to use lstatSync(s) and explicitly skip entries
where lstatSync(s).isSymbolicLink() to avoid copying files outside
kitTemplateDir or entering symlink loops. In function copyDir, replace statSync
with lstatSync, add a guard like "if (st.isSymbolicLink()) continue", and keep
the existing directory handling (mkdirSync + recursive copy) and file writing
(mkdirSync(dirname(d)) + writeFileSync) for non-symlink entries.
In `@src/services/interview/review/audit-writer.ts`:
- Around line 57-68: The yamlScalar function detects values needing quoting but
only escapes backslashes and double quotes, so control characters like newline,
carriage return and tab remain raw and produce invalid YAML; update yamlScalar
to additionally replace control characters (e.g. \n, \r, \t, \b, \f, and any
other non-printable controls you expect) with their corresponding escape
sequences before wrapping in quotes (while still escaping backslashes and double
quotes), ensuring the returned double-quoted scalar contains valid YAML escape
sequences; modify the replacement logic used when building the escaped variable
in yamlScalar to run a single pass that maps those control characters to "\\n",
"\\r", "\\t", "\\b", "\\f", etc., then return the quoted string.
- Around line 70-76: yamlTag currently detects whitespace (including
newlines/tabs) but only escapes backslashes and double quotes; update yamlTag to
additionally replace control whitespace characters with proper escape sequences
before wrapping in quotes: after computing escaped = value.replace(/\\/g,
"\\\\").replace(/"/g, '\\"'), also replace newline (\n) with "\\n", carriage
return (\r) with "\\r", and tab (\t) with "\\t" (and any other control chars you
deem necessary) so the returned quoted string is fully escaped; keep the
existing regex and return logic but perform these extra replacements on the
escaped variable inside the yamlTag function.
In `@src/services/interview/review/collectors/git-history.ts`:
- Around line 27-31: The current try/catch around runner(["log",
`--format=${FORMAT}`, "--numstat"], repoDir) swallows all errors and returns [];
change it to only convert known benign git cases (e.g., "does not have any
commits", "unknown revision", or specific exit codes from git that indicate
empty history) into [], and rethrow any other exceptions so real failures
(permissions, broken git, unexpected runtime errors) surface; specifically wrap
the call to runner in a try, inspect the thrown error's message/exit code/stderr
and return [] only for the recognized benign messages, otherwise throw the
original error so higher-level handlers (in-report placeholder / appendix / CLI
stderr) can handle it.
---
Duplicate comments:
In `@src/services/interview/bootstrap/project-generator.ts`:
- Around line 95-99: The current check in clearOutputDir uses dirname(abs) and
only throws when parent === abs, which misses top-level children like "/usr"
(whose parent is "/" not equal to abs); update the guard to also detect paths
directly under the filesystem root by comparing parent to the path root: require
path.parse(abs).root and throw if parent === abs || parent ===
path.parse(abs).root; reference the clearOutputDir function and the local
variables abs and parent and use path.parse(abs).root to identify root-level
parents.
---
Nitpick comments:
In `@src/services/interview/review/extractors/test-pass.ts`:
- Around line 14-17: The type alias TestRunner is a port interface that must be
centralized instead of being defined locally; remove the local export of
TestRunner and instead add the identical exported type declaration to the core
types module (types.ts), then import { TestRunner } into this file and use it in
place of the removed local type; ensure the signature stays { readonly passed:
number; readonly failed: number; readonly output: string } and that the core
types module exports it for other modules to consume.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 20c0ad49-c16c-47ab-bd28-dd3a3617b024
📒 Files selected for processing (25)
scripts/run-interview-bootstrap.tsscripts/run-interview-review.tssrc/services/interview/bootstrap/orchestrator.tssrc/services/interview/bootstrap/project-generator.tssrc/services/interview/bootstrap/project-validator.tssrc/services/interview/bootstrap/role-config.tssrc/services/interview/cohort/cohort-summary.tssrc/services/interview/review/audit-writer.tssrc/services/interview/review/collectors/git-history.tssrc/services/interview/review/collectors/jsonl-log.tssrc/services/interview/review/collectors/transcript.tssrc/services/interview/review/extractors/test-pass.tssrc/services/interview/review/extractors/verification.tssrc/services/interview/review/review-orchestrator.tssrc/services/interview/shared/events.tsteamhero-interview-kit/lib/privacy-gate.shteamhero-interview-kit/start.shtests/unit/services/interview/bootstrap/project-validator.spec.tstests/unit/services/interview/review/audit-writer.spec.tstests/unit/services/interview/review/review-orchestrator.spec.tstui/interview_bootstrap.gotui/interview_bootstrap_form.gotui/interview_bootstrap_wizard.gotui/interview_bootstrap_wizard_test.gotui/interview_progress.go
🚧 Files skipped from review as they are similar to previous changes (17)
- src/services/interview/shared/events.ts
- src/services/interview/review/collectors/jsonl-log.ts
- src/services/interview/review/collectors/transcript.ts
- tests/unit/services/interview/bootstrap/project-validator.spec.ts
- scripts/run-interview-review.ts
- teamhero-interview-kit/lib/privacy-gate.sh
- tests/unit/services/interview/review/review-orchestrator.spec.ts
- tests/unit/services/interview/review/audit-writer.spec.ts
- tui/interview_progress.go
- tui/interview_bootstrap_wizard.go
- src/services/interview/bootstrap/role-config.ts
- tui/interview_bootstrap.go
- src/services/interview/cohort/cohort-summary.ts
- tui/interview_bootstrap_form.go
- tui/interview_bootstrap_wizard_test.go
- src/services/interview/bootstrap/project-validator.ts
- src/services/interview/review/review-orchestrator.ts
Without `[args...]` on the command spec, commander rejected any positional argument as "too many arguments" — so `teamhero interview bootstrap`, `teamhero report --since ...`, etc. all failed before reaching the Go TUI. The `.allowUnknownOption()` flag only covers `--flags`, not positionals. Declaring each command as `command [args...]` makes commander accept variadic positionals which we then forward verbatim to the TUI binary. No call-site changes needed; the action callbacks already re-derive args from process.argv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
✅ Unit tests committed locally. Commit: |
- audit-writer: escape literal newlines, carriage returns, and tabs when emitting double-quoted YAML scalars. A candidate name with an embedded newline previously made the frontmatter invalid YAML even with the surrounding quotes. - cli/index.ts: add "interview" to the report-action's nested- subcommand guard so `teamhero report interview` is rejected consistently with `report doctor` / `report setup`, instead of silently routing into the report flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts (3)
156-173: ⚡ Quick winChronological test does not actually prove
session_dateprecedence.Right now
dateandsession_dateare identical for both candidates, so sorting bydatewould also pass. Make them intentionally conflict so this test catches regressions.Suggested test hardening
makeCandidate(dir, "carol", { candidate: "Carol", - date: "2026-05-20", + date: "2026-04-01", session_date: "2026-05-20", }); makeCandidate(dir, "alice", { candidate: "Alice", - date: "2026-05-01", + date: "2026-06-30", session_date: "2026-05-01", });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts` around lines 156 - 173, The test "orders chronologically using session_date when available" currently uses identical date and session_date values so it doesn't prove session_date precedence; update the test data created via makeCandidate for "carol" and "alice" so their date and session_date intentionally conflict (e.g., set Carol.date before Alice.date but Carol.session_date after Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must rely on session_date to produce Alice before Carol; keep using loadCohort and the same expectation on body.indexOf.
132-135: ⚡ Quick winBody-wide negative date assertion is brittle.
expect(body).not.toContain("2026-05-10")can fail if that value appears elsewhere (metadata/footer) even when the Interviewed column is correct. Scope the check to Alice’s row.Suggested assertion narrowing
const body = renderCohortSummary("senior-backend", loadCohort(dir)); - expect(body).toContain("2026-06-01"); - // The submission date should NOT appear in place of the session date - expect(body).not.toContain("2026-05-10"); + const aliceRow = body.split("\n").find((line) => line.includes("Alice")) ?? ""; + expect(aliceRow).toContain("2026-06-01"); + expect(aliceRow).not.toContain("2026-05-10");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts` around lines 132 - 135, The negative date assertion is too broad: instead of using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's table row by locating the row that contains "Alice" (e.g., parse the HTML in body and select the tr or nearest container with text "Alice") and assert that that specific row does not contain "2026-05-10"; update the test to query the Alice row from body and replace the global not.toContain check with an assertion against that row's text content.
207-210: ⚡ Quick winThe decimal-number assertion can create false failures.
expect(body).not.toMatch(/\d+\.\d+/)is broader than the requirement and could fail on unrelated decimals. Header assertions already validate the column contract well.Suggested cleanup
expect(body).not.toMatch(/\|\s*Score\s*\|/i); expect(body).not.toMatch(/\|\s*Total\s*\|/i); expect(body).not.toMatch(/\|\s*Rank\s*\|/i); - expect(body).not.toMatch(/\d+\.\d+/); // no decimal numbers (scores)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts` around lines 207 - 210, The test contains an overly broad negative assertion expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals; remove that assertion from cohort-summary-edge-cases.spec.ts (the expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a decimal-specific check, replace it with a narrow pattern that matches decimal values inside table cells (e.g., a cell-boundary regex like /\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the existing header assertions as they already validate the column contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/services/interview/bootstrap/project-generator-security.spec.ts`:
- Around line 109-125: The test expects generateProject(role({ outputDir: dir
}), clientFor(malicious)) to reject on paths containing a NUL byte, so add an
explicit NUL-byte validation in the path-resolution logic inside generateProject
(the code path that processes GeneratedProject.files -> file.path) to detect
"\0" (or char code 0) and throw a specific, named error (e.g., InvalidPathError
or a new PathValidationError) before any FS operations; then update the test to
assert that generateProject rejects with that specific error type/message
instead of a generic toThrow(), referencing generateProject, GeneratedProject,
and the file.path handling code for location.
In `@tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts`:
- Around line 199-213: The test "stores optional jdPath and customPrompt when
present" only sets and asserts jdPath but not customPrompt; update the test to
actually include a customPrompt in the RoleConfig passed to writeRoleConfig and
add an assertion that readRoleConfig(dir)?.customPrompt equals that value (or
alternatively rename the test to only mention jdPath). Locate the test block in
role-config-edge-cases.spec.ts and modify the cfg object creation (RoleConfig)
to include customPrompt and add an expect(...) assertion after reading via
readRoleConfig(dir) to verify persistence, or change the test description to
match the current assertions.
---
Nitpick comments:
In `@tests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.ts`:
- Around line 156-173: The test "orders chronologically using session_date when
available" currently uses identical date and session_date values so it doesn't
prove session_date precedence; update the test data created via makeCandidate
for "carol" and "alice" so their date and session_date intentionally conflict
(e.g., set Carol.date before Alice.date but Carol.session_date after
Alice.session_date) so renderCohortSummary(..., {order: "chronological"}) must
rely on session_date to produce Alice before Carol; keep using loadCohort and
the same expectation on body.indexOf.
- Around line 132-135: The negative date assertion is too broad: instead of
using expect(body).not.toContain("2026-05-10"), narrow the check to Alice's
table row by locating the row that contains "Alice" (e.g., parse the HTML in
body and select the tr or nearest container with text "Alice") and assert that
that specific row does not contain "2026-05-10"; update the test to query the
Alice row from body and replace the global not.toContain check with an assertion
against that row's text content.
- Around line 207-210: The test contains an overly broad negative assertion
expect(body).not.toMatch(/\d+\.\d+/) which can fail on unrelated decimals;
remove that assertion from cohort-summary-edge-cases.spec.ts (the
expect(body).not.toMatch(/\d+\.\d+/) line) or, if you want to keep a
decimal-specific check, replace it with a narrow pattern that matches decimal
values inside table cells (e.g., a cell-boundary regex like
/\|\s*\d+\.\d+\s*\|/) so only score-like decimals are asserted against; keep the
existing header assertions as they already validate the column contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 04c43fb4-79c1-440c-a31f-06f8b7db413e
📒 Files selected for processing (7)
src/cli/index.tstests/unit/services/interview/bootstrap/openai-generator-client.spec.tstests/unit/services/interview/bootstrap/project-generator-security.spec.tstests/unit/services/interview/bootstrap/role-config-edge-cases.spec.tstests/unit/services/interview/cohort/cohort-summary-edge-cases.spec.tstests/unit/services/interview/review/asciinema-edge-cases.spec.tstests/unit/services/interview/review/audit-writer-edge-cases.spec.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cli/index.ts
CodeRabbit-authored test suite verified that LLM-supplied paths containing "\0" can otherwise truncate path checks in some syscalls (the classic "evil.png\0.sh" trick). Reject the byte outright in resolveWithinRoot before any further path math. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interview bootstrap wizard previously ran as a sequential loop of huh.NewForm(...).Run() calls. That produced a bare form on a blank screen with no orientation — no //// TEAM HERO header, no right-side configuration summary, no navigation hints. It didn't look like part of the same product as the report wizard. This commit rebuilds it as a single bubbletea program (interviewBootstrapTeaModel) so it picks up the same shell-header + summary-panel + hints layout as wizard.go. The data container (bootstrapWizardModel) and per-screen validators are unchanged; only the runner is swapped: runHuhBootstrapWizard → runBootstrapTeaWizard → tea.NewProgram(...).Run() Layout: - renderShellHeader at the top — same `//// TEAM HERO` gradient slashes - Left ~60% column: the active step's huh form - Right ~40% column: a bordered "Interview Bootstrap" summary panel listing every field. Filled values render bright, the active step's label is highlighted cyan, and not-yet-reached fields show "—" - Footer hint line: `enter continue • ctrl+c quit` Testing: - Adds teatest + x/exp/golden as direct Go test deps (the latter was already transitively present) - New layout tests in tui/interview_bootstrap_tea_test.go drive the model through teatest and assert the three layout invariants (header / summary / hints) are present at multiple wizard states - TestInterviewBootstrap_Screenshot_WritesGolden produces three human-reviewable plaintext screenshots under tui/testdata/interview_bootstrap/ — the test compares against them on every run, so layout regressions show up as diffs in PRs. Run with TEAMHERO_UPDATE_SCREENSHOTS=1 to refresh after intentional layout changes - Three pre-existing launcher-smoke tests (TestBootstrapWizard_HuhLauncher*Smoke) were rewired to stub the new program-runner seam (runBootstrapTeaProgram) instead of the old huhFormRun, keeping them fast and goroutine-leak-free - TestRunInterview_BootstrapVerb_RequiresFlags now pins isStdinTTY to false so the test exercises the "no TTY → reject" branch deterministically (without the pin, an interactive `go test` session inherits a real TTY and the wizard hangs waiting for input) - Broadened the package-shared ansiEscapes regex to also strip DEC-private mode sequences (`\x1b[?25l`, `\x1b[?2004h`) so the golden screenshots stay clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two adversarial review passes identified ~15 real bugs across the TUI
refactor and the recent CodeRabbit-driven service-layer fixes. This
commit lands the meaningful ones; the rest were design choices,
nitpicks, or speculative concerns.
## TUI critical fixes
- **Production wizard wiring**: runBootstrapTeaWizard previously called
`tea.NewProgram(model, tea.WithInput(nil), tea.WithOutput(nil))` — the
wizard would have hung the moment any user pressed a key, because
bubbletea had no stdin. Use default stdio + WithAltScreen() like the
report wizard does. The smoke tests didn't catch this because they
stubbed out runBootstrapTeaProgram entirely.
- **Smoke tests were tautologies**: stubTeaProgramRunner just set
`confirmed = true` and returned a synthetic result. The state machine
could be broken and the tests would still pass. The stub now walks
the model through real advance()/nextStep() transitions in-process,
filling values that validators require, so a regression in the rubric
branch or time-box sub-step actually fails a test.
- **Time-box "custom" branch**: previously swapped the form in place
without advancing m.step, so highWater stayed at ibStepTimeBox and the
summary panel claimed the user was still on the time-box screen while
the custom-minutes input rendered. Introduced an explicit
ibStepTimeBoxCustom step the state machine routes through.
- **Confirm "Cancel" produced silent success**: clicking Cancel set
confirmed=false but the dispatcher returned 0 with no message,
indistinguishable from a successful no-op. Now prints
"Wizard cancelled at confirm" / "Wizard aborted" to stderr so users
see why no role was generated.
- **WindowSizeMsg never reached the huh form**: on resize we updated
width and returned, but didn't forward the message into the form
itself. Internal field viewports were stale until the next keystroke.
Now forward the resize message after updating width; this also makes
huh.Confirm render its button row inline (visible in the refreshed
confirm-step screenshot).
- **truncate() sliced bytes, not runes**: summary-panel labels
containing multi-byte characters (Étienne, CJK, etc.) would render as
invalid UTF-8 when truncated mid-codepoint. Switched to []rune slicing
and rejected non-positive max explicitly.
- **Deleted ~200 lines of dead code**: stepRole through stepConfirm and
stepConditionalRubric in interview_bootstrap_form.go were unreachable
after the runner swap. stubHuhFormRunner in the test file was also
unused. Both removed.
## Service-layer critical fixes
- **assertSafeToClear was too permissive**: a misconfigured
outputDir of "/tmp", "/var", "/usr", "/home", etc. previously passed
and clearOutputDir would recursively delete the entire system
directory. Now reject a DANGEROUS_ROOTS set explicitly, plus
single-segment absolute paths. Subdirectories of /tmp (mkdtemp paths)
are still permitted, so tests work normally.
- **Path traversal via symlinks**: resolveWithinRoot did string-level
containment math but never realpath'd the parent chain. If a previous
generation attempt left a symlink like `subdir -> /etc`, writing
`subdir/passwd` would land outside the workspace. Now walk intermediate
segments with lstatSync and refuse any existing symlink in the path.
- **YAML escapes dropped most C0 controls**: audit-writer's
escapeYamlDoubleQuoted handled `\\ " \n \r \t` but let other control
characters (\x00–\x1F, \x7F) through verbatim. Per YAML 1.2 §5.7,
unescaped controls in a double-quoted scalar are a parse error in
strict parsers (js-yaml FAILSAFE, libyaml). A candidate name with a
stray \x01 from copy-paste would silently corrupt audit.json
round-trip. Now emit `\xHH` for any C0 control character.
- **test-pass spawnSync timeout didn't kill child group**: Node's
timeout signals only the direct child (`go` or `bun`), leaving
sub-binaries reparented to PID 1 and still running on the grader
machine. Wrapped the spawn in coreutils `timeout --kill-after=10s 300s`
which signals the whole process group; fall back to plain spawnSync
with SIGKILL when coreutils isn't available.
## Hygiene
- **discrepancy.spec.ts afterAll(mock.restore)**: CLAUDE.md explicitly
forbids this — mock.restore() undoes ALL mock.module() registrations,
including the pinning I added at the top of the file. Per-file
isolation via scripts/run-tests.sh is the project's mode; the pinning
is defensive against bun test's non-isolated path. afterAll restore
was actively counter-productive.
- **git-history hardcodes argv and adds `--`**: the runner already
builds argv from constants, but added the literal "--" terminator
before any future caller can inject user-controlled refs/pathspecs
that begin with "-" and would be parsed as git options.
- **cohort-summary signed_off type validation + pipe escaping**:
loadCohort now requires signed_off to be a real boolean (not a string
"false" that JS treats as truthy). renderRow escapes `|` and newlines
in candidate / recommendation fields so a name like "Alice | aka Bob"
doesn't break the markdown table layout.
- **transcript regex accepts Unicode names**: switched from
`[A-Za-z][\w .'-]*?` to `\p{L}[\p{L}\p{N} .'_-]*?` with the /u flag so
"Étienne: foo" parses correctly. The previous regex silently dropped
non-ASCII speaker labels.
## What I chose NOT to fix
- isStdinTTY mutable package var (no t.Parallel today; defer)
- WaitFinished timeout deferral risk (small window; defer)
- jsonl-log existsSync race (benign — try/catch already wraps the read)
- verification.ts adjacent-only prompt window (design choice; documented)
- start.sh exec failure leaves started-at (would need a wrapper script)
- CLI nested-subcommand denylist won't catch new TUI subcommands
(need a Go-TUI-side change to reject unknown positionals)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-reported bug: - TUI bootstrap wizard exited with "Wizard cancelled at confirm" even after the user clicked through all 13 screens. huh.Confirm uses the initial value of the bound variable to decide which button has focus, and our `confirmed bool` defaulted to false (zero value), so the wizard opened the confirm screen with focus on "Cancel". A stray Enter cancelled the whole run. Now pre-set `d.confirmed = true` before building the confirm form so the affirmative button has focus by default — matches typical "Are you sure? [Y/n]" UX. CodeRabbit threads (fixed): - scripts/run-interview-review.ts: also reject when both --repo AND --local-repo-path are supplied (was previously underspecified — required at least one, but accepted both ambiguously). - src/services/interview/review/ai-observer.ts: rejectIfScored now walks the response object tree and inspects actual keys instead of regex-scanning the serialized JSON. An evidence excerpt containing literal text like `"score": 7/10` (e.g. a code comment) no longer false-positives. - skills/teamhero-interview/SKILL.md: review verb heading now matches the actual invocation form (`--candidate <name> --repo <url>`) with a note that the repo can also be positional. - tests/unit/services/interview/bootstrap/role-config-edge-cases.spec.ts: renamed the misleading "stores optional jdPath and customPrompt" test to "stores optional jdPath", and added a sibling test for customPrompt — the original test name didn't match its assertion. - src/services/interview/review/collectors/asciinema.ts: per-char iteration of evt.data so multi-character input chunks (paste, rapid typing) reconstruct commands correctly. CSI escape sequences (ESC `[` ... letter) are detected and skipped as a unit so arrow keys aren't accidentally buffered character-by-character. - src/services/interview/review/collectors/git-history.ts: catch block now distinguishes benign "no commits yet" from genuine failures (binary missing, permission denied), writing the latter to stderr instead of fully swallowing the error. CodeRabbit threads (not fixed, with rationale): - ai-observer/cohort-summary test hardening suggestions are deferred — current assertions cover the load-bearing invariants; the suggested pinning would couple tests to presentation details. - "Move port interfaces to core/types.ts" architectural suggestions apply only to ports consumed across module boundaries; the interview module's `Cloner`, `ReviewDependencies`, and `GeneratorClient` are internal to their namespaces. - Markdownlint MD040 nits on planning doc and SKILL.md are cosmetic. All 1749 TS tests pass; full Go suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ generation User-reported bug: "Idea generation failed: context deadline exceeded (Client.Timeout exceeded while awaiting headers)" in the bootstrap wizard when the user picked "Suggest ideas". Root cause: the openAIIdeaFetcher reused the package-wide defaultHTTPClient (5s timeout), which is tuned for cheap doctor/setup auth-probe calls. The OpenAI Responses API generating 5 ideas under a strict json_schema routinely takes 15-45s — the client timed out before headers came back. Changes: - tui/interview_bootstrap_ideas.go: give the idea fetcher its own http.Client with a 90s timeout sized for content generation. Tests inject client directly, so test paths are unaffected. - src/services/interview/bootstrap/openai-generator-client.ts: honor AI_MODEL from env (matches the Go side and the rest of the TS codebase). Keeps gpt-5-mini as the default when nothing is set. - scripts/run-interview-bootstrap.ts: add --model and --max-attempts flags so we can override per-invocation without editing code (needed for tuning generation quality on principal-tier roles). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pts-hto)
Bootstrap was failing validation with "35 LOC; expected 400-700" on a
Principal Full Stack role whose generated project contained 447 lines of
C# + Vue + TS. Root cause: project-validator.ts hardcoded SOURCE_EXTS to
{.ts, .tsx, .js, .jsx, .go} — the team's *own* stack — so .cs files
weren't counted at all. TEST_NAME_PATTERN had the same problem: it only
recognised foo.spec.ts / foo.test.ts naming, so xUnit *Tests.cs files
weren't classified as tests, and the failing-test detection couldn't
find the [Fact(Skip = ...)] marker the generator correctly emitted.
The interview command takes role.stack as input from the hiring manager
— the validator must not assume that stack matches the CLI's own stack.
Changes:
- SOURCE_EXTS expanded to cover JS/TS family (+ .vue, .svelte), JVM
(.java, .kt, .scala), .NET (.cs, .fs, .vb), systems (.rs, .c, .cpp,
.h), dynamic (.py, .rb, .php), Apple (.swift), and functional niches
(.ex, .clj, .ml).
- TEST_NAME_PATTERN extended to recognise *Tests.cs / *Specs.kt /
foo_test.go / test_foo.py / foo_spec.rb in addition to the JS/TS
*.spec.* / *.test.* convention.
- Skipped-test detector extended with xUnit/NUnit [Fact(Skip=)] /
[Ignore], JUnit @disabled / @ignore, pytest / unittest @Skip, RSpec
pending, Rust #[ignore], and a universal NotImplementedException /
raise NotImplementedError catchall.
- Two new regression tests: one for a C# project that lands in window
via .cs + *Tests.cs + [Fact(Skip=)], one for a Python+Vue project
that lands via .vue + .py + @pytest.mark.skip.
Verified end-to-end: gpt-5 first-attempt success on the
principal-full-stack role generating a 447-LOC C#/.NET + Nuxt project
with idiomatic xUnit skip marker — previously failed all 3 attempts
because the validator couldn't see the files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… link The interactive wizard already showed an OSC 8 hyperlink to the generated repo and offered a GitHub publish form after success (tui/interview_bootstrap_publish.go + the result screen in interview_bootstrap_generate.go). The headless path (`teamhero interview bootstrap --headless ...`), used by scripts and proctors invoking the CLI directly, dropped the user at the bun script's plain "Output: <dir>" line with no link and no publish opportunity — so a proctor running the supported headless flow had to copy-paste the path, then drive `gh repo create` + `git push` by hand every time. Wires `runInterviewBootstrap` through the same post-success UX: - On exit==0, print `Project: <abs-path>` wrapped in an OSC 8 file:// hyperlink (ctrl-click opens the dir in Finder/Explorer). - If stdin is a TTY and --no-confirm is absent, invoke the existing offerPublishToGitHub flow. CI runs (non-TTY) and `--no-confirm` scripted runs stay silent — no surprise prompts. - offerPublishToGitHub already reads GITHUB_PERSONAL_ACCESS_TOKEN from ~/.config/teamhero/.env (the token `teamhero setup` saves), prints a "run teamhero setup" tip when it's missing rather than nagging, and uses huh forms for repo name / owner / visibility (private default). The wizard and headless paths now share that identical flow — no behavior drift. Tests via the existing offerPublishToGitHub and isStdinTTY var seams: - PrintsSuccessLinkOnZeroExit — asserts the OSC 8 envelope on stdout. - OffersPublishWhenTTY — TTY + no --no-confirm -> publish ran. - SkipsPublishWhenNoConfirm — --no-confirm -> publish skipped. - SkipsPublishWhenNotTTY — non-TTY -> publish skipped. - NoPublishOnFailure — exit!=0 -> no link, no publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The interview verb picker was rendering with only "> Cancel" visible on the first paint. The user had to press the up arrow before they could see Bootstrap / Review / Cohort. Root cause: the Cancel option's value was the empty string "". The picker bound a `var verb string` (zero value: "") to huh.Select. huh saw the bound value matched the Cancel option, marked Cancel as the "current value" of the field, and anchored the cursor on it (the LAST row). When the form rendered with a constrained viewport, it scrolled so the cursor stayed visible, clipping every option above it. Arrow-up moved the cursor and scrolled the viewport, revealing the hidden options — which is exactly the workaround the user reported. Fix: give Cancel a distinct sentinel value "cancel" so the bound verb's zero value matches nothing in the list. huh then defaults the cursor to the first option (Bootstrap) and all four rows paint on entry. interviewVerbPicker maps the "cancel" sentinel back to the caller's "" no-op convention before returning, so the dispatcher's existing `if verb == ""` check still covers both ctrl-c abort and explicit Cancel selection — no behavior change for downstream callers. Tests: - TestInterviewVerbOptions_NoValueMatchesZeroDefault — pins the bug by asserting no option has Value == "". Any future option whose value collides with `string`'s zero default would re-introduce the clipping, so the guard covers the whole list, not just Cancel. - TestInterviewVerbOptions_IncludesCancel — guards against a refactor that drops the Cancel row entirely (would force users to commit to a verb with no way out other than ctrl-c). - TestRunInterview_NoVerb_TTY_CancelSentinelTreatedAsNoOp — confirms the picker→dispatcher contract: Cancel still produces exit 0 with no error output, identical to ctrl-c abort. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0-min time-box Two PRD-tracked defaults now apply automatically when their flags are omitted, so a typical headless run boils down to --role + --stack + --domain + --feature + --mode-* with no path or duration to remember. Explicit flags continue to win — the original PRD's per-invocation override capability is intact. Output directory: - Default: ./interviews/<role-slug> (was: required flag, no default in the headless path; the wizard used to seed ./roles/role). - .gitignore now lists both `interviews/` (new default) and `roles/` (kept for users upgrading from the old convention so their existing generated content stays gitignored). - Wizard form (interview_bootstrap_tea.go, interview_bootstrap_wizard.go) seeds with `./interviews/role` and auto-substitutes the typed role slug on the output-dir step. Time-box: - Default: 60 minutes (was: 90). - "60 minutes" is now the (recommended) label in the wizard's huh select; "90 minutes" loses the recommendation suffix. Standard options remain 60 / 90 / 120 / Custom. - bun script (scripts/run-interview-bootstrap.ts) falls back to "60" when --time-box is omitted, matching the Go TUI's default. Implementation: - New applyBootstrapDefaults(opts) in interview_bootstrap.go runs between flag parsing and validation. nil-safe so callers don't need to guard. Empty role leaves OutputDir empty so the validator's "missing --role" error surfaces instead of producing a misleading ./interviews/<empty> path. Tests (Go): - FillsOutputDirFromRole / DoesNotOverrideExplicitOutputDir / LeavesOutputDirEmptyWhenRoleMissing / FillsTimeBoxWhenMissing / DoesNotOverrideExplicitTimeBox / NilSafe — unit coverage for applyBootstrapDefaults. - AllowsOmittedOutputDirAndTimeBox — end-to-end through runInterviewBootstrap with both flags omitted; asserts the runner sees the resolved defaults. - TestBootstrapWizard_DefaultModelHasSensibleDefaults updated to expect "60" (was "90"). Snapshot golden files (4 in tui/testdata/interview_bootstrap/) regenerated via TEAMHERO_UPDATE_SCREENSHOTS=1 to reflect the new seeds (TimeBox: "60", OutputDir: "./interviews/senior-backend"). Layout assertions still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The confirm step rendered every collected wizard field a second time as the huh.Confirm form's Description — a long "role=… · stack=… · domain=… · time-box=… · project=… · analysis=… · rubric=… · out=…" line — even though the right-hand summary panel already lists each field in a structured, labelled format. Result: the only thing the user actually has to act on (Yes / Cancel) was buried under repeated information, and the reported screenshot showed multiple lines of config metadata pushing the buttons below the fold on shorter terminals. Remove the redundant description from the confirm form. The right- hand summary panel stays untouched as the canonical pre-generation recap. The summarizeBootstrapModel helper is kept (it's still useful for logging and the existing unit test) — it's just no longer wired into the confirm form. Tests: - TestInterviewBootstrap_ConfirmStep_OmitsVerboseSummary — drives the wizard to ibStepConfirm and asserts the rendered view contains "Ready to bootstrap?" and "Yes, generate the role" but NOT the summary signature tokens "role=", "stack=", "time-box=", "out=". Any regression that pipes summarizeBootstrapModel back into the Description will reintroduce those tokens and fail this assertion. - 03-confirm-step golden snapshots (ANSI + plaintext) regenerated via TEAMHERO_UPDATE_SCREENSHOTS=1 to lock in the cleaner layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st pass
Real wizard runs were failing with the validator complaint:
Expected at least 2 deep modules (>=80 lines); found 1.
LOC out of range: 247 lines of code; expected 400-700.
Root cause: gpt-5-mini was consistently producing a thin happy-path
sketch on attempt 1 — one deep module, ~200-300 total LOC — and only
climbing into the 400-700 window on attempts 2-3 once the validator's
specific failure list was fed back through previousFailures. With a
3-attempt budget, marginal cases ran out of retries before recovery.
Two changes work together:
1. Stronger first-pass prompt (openai-generator-client.ts). The LOC
bullet is now labeled "SIZE TARGET" with explicit framing as a
"hard constraint", names ~525 as the target midpoint to leave
variance margin, and provides per-module guidance ("100-150 lines
per deep module"). It also names a typical decomposition (types
module + service module + helpers) so the model has a concrete
shape to aim at instead of guessing. The "Do NOT shrink to a
minimal happy path" line addresses the observed failure mode
directly.
2. Default attempt budget raised from 3 to 5
(project-generator.ts). One additional retry was reliably enough
to recover marginal cases in manual testing; bumping to 5 leaves
safety margin without runaway cost (each attempt is one structured
Responses API call).
Tests:
- openai-generator-client.spec.ts: new assertion that the prompt
contains the SIZE TARGET / hard constraint / 100-150 lines per
deep module phrasing. Any future prompt edit that loses the
per-module sizing will re-introduce the first-attempt undersize bug
and trip this regression guard.
- project-generator.spec.ts: existing 3-attempt assertion updated to
reflect the 5-attempt default. The "succeeds within budget" test
still locks in mid-budget recovery (succeeds on attempt 3 of 5);
the "exhausts budget" test now expects attempts === 5 and is
rewritten with a single-malformed clientReturning (clamps to last)
so adding attempts in the future doesn't churn the fixture list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reak Reported: in the rubric-mode step (and similar select screens) the left vertical bar (huh.ThemeCharm's focus-border decoration) renders correctly on the title line but is missing on wrapped Description lines, producing a visible break in the form's bounding rail. Root cause is upstream: huh's Charm theme doesn't extend the focused- field left bar onto Description content that wraps via huh's internal wrap path. We can't fully fix it from our side until the Charm libraries (huh / lipgloss / bubbletea) ship a fix. Pragmatic mitigation for the static-description case: shorten the Analysis-mode and Rubric-mode Descriptions so they fit on one line at the default formWidth (3/5 of an 80-col terminal = 48 chars) and never enter huh's wrap path. The full detail that used to live in the Description is moved into the option labels themselves, which render correctly even when long. Limitations: - The dynamic-content case still applies: the Feature-description textarea and Custom-rubric-prompt textarea wrap user-typed content via huh's path, and the bar-break recurs there. A separate change upgrading huh / lipgloss / bubbletea is needed to address that end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reported requirement: the interview kit's scaffolding (start.sh, end.sh, INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md, RUBRIC_OVERVIEW.md, .claude/CLAUDE.md, lib/privacy-gate.sh, etc.) should always end up in the generated repo, regardless of whether the proctor picked a generated starter project (Mode A) or a brief-only flow (Mode B). Existing gap: `--kit-dir` was an explicit flag with no default. If the proctor invoked headless without passing it, the bootstrap skipped the kit copy entirely (project-generator.ts only runs copyDir when `kitTemplateDir` is supplied). The AI-generated project landed in the output directory but none of the proctor/candidate guidance shipped alongside it — the recording workflow that depends on start.sh / end.sh / PRIVACY_RELEASE.md was effectively broken. applyBootstrapDefaults now sets KitDir to "teamhero-interview-kit" (resolved relative to the current working directory, same lookup as findBootstrapScript in interview_bootstrap.go) whenever the flag is omitted. Explicit --kit-dir still wins, so proctors with a custom kit layout can override. Tests: - FillsKitDirSoScaffoldingAlwaysShipsToCandidate — bare opts (only Role set) get KitDir = "teamhero-interview-kit" after defaults run. - DoesNotOverrideExplicitKitDir — explicit "/opt/custom-kit" wins through the defaulting pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ug logs The wizard had three redundant ways to describe the project: the Feature text input, a late-stage "How should the AI prompt be supplied?" select, and an optional "Project prompt (optional)" addendum. The user pointed out — correctly — that the Feature description IS the project prompt, so the late-stage steps were just asking the proctor to say the same thing twice. The new flow has one either/or decision right after Domain: "I'll write the description myself" → existing Feature text input "Suggest project ideas for me" → spinner → pick-one Both paths populate data.feature directly; the projectPrompt field, --project-prompt flag, and corresponding OpenAI addendum block are all gone. Added structured run logging so future generation failures are triageable without a rerun. The dispatcher prints one line of run context to stderr (role, mode, stack, domain, time-box, rubric, output, debug), and the bun subprocess emits bootstrap.start / bootstrap.ok / bootstrap.fail records with elapsed-ms and attempt counts. The --debug flag adds truncated feature/custom-prompt bodies and output/kit paths. Strengthened the OpenAI prompt to push the model past its undersize bias. The size requirements now lead the Mode A spec, name 4-5 concrete files with line ranges, and on retry the prompt quotes the prior LOC count and demands "roughly DOUBLE the previous output" — gpt-5-mini was otherwise content to produce another 266-LOC sketch even after the validator flagged it. Smoke tests show first-attempt success at 399 LOC and three-attempt recovery at 490 LOC with no human intervention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptions
The wizard's "Project mode" picker exposed two opaque codes (A/B). Replace
with three self-describing options that match how the proctor actually
thinks about the interview shape:
Brownfield — AI scaffolds a starter codebase
Greenfield (use the stack above) — written brief; candidate uses your stack
Greenfield (candidate picks stack) — written brief; candidate also picks tooling
The first two collapse to the existing mode A/B at the BootstrapOptions
boundary. The third is a new variant: mode B + a stackByCandidate flag
that flips the BRIEF.md's "Tech stack" section from a hard requirement
("REQUIRES the candidate to use Go") to a contextual hint
("candidate selects their own tech stack ... Go is referenced as the
team's internal tooling"). When stackByCandidate is set, the stack
choice itself becomes part of what's being evaluated.
Renamed "Rubric mode" to "How should AI share observations?" so the
question signals what the answer actually controls (the rubric the AI
uses when writing up the recorded interview). Pure label change — same
three options, same internal values.
Every wizard question now carries a one-line description telling the
hiring manager how their answer shapes the generated project: stack
drives the source language, domain shapes glossary vocabulary, feature
description becomes the project's central focus, etc.
Headless protocol: new --stack-by-candidate flag, valid only with
--mode-project B. Mode A + stack-by-candidate is rejected by both the
Go validator and TS role-config validator because "candidate picks the
stack" is incoherent when the AI is generating a starter codebase IN a
specific stack.
Backward compat: legacy A/B values in BootstrapWizardDefaults still
seed the wizard correctly (normalizeWizardProjectMode maps them) so
saved role-config.json files don't break.
Tests added:
- Project type taxonomy → (mode, stackByCandidate) resolver (Go)
- Legacy A/B value normalization (Go)
- Greenfield-open round-trip through bootstrapWizardOptionsFromModel (Go)
- --stack-by-candidate flag parsing (Go)
- Validator rejection of stack-by-candidate + mode A (Go + TS)
- OpenAI prompt branches: B alone vs B + stackByCandidate (TS)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment The validator's 400-700 LOC and 2-deep-module rules were a heuristic inherited from the initial slice — they weren't in any product spec. They were producing real friction: perfectly serviceable 300-LOC projects were being rejected and retried, and the AI prompt was laden with size enforcement language that crowded out the actual requirements. Removed: - MODE_A_LOC_MIN / MODE_A_LOC_MAX / DEEP_MODULE_MIN_LINES / MODE_A_MIN_DEEP_MODULES constants and the validator branches that used them. - countLines() helper, SOURCE_EXTS set, extname import — all dead now. - ABSOLUTE SIZE REQUIREMENTS block and the per-file budget table from the OpenAI Mode A prompt. - extractLocFromFailure / extractDeepModuleCount retry-feedback helpers and the "CORRECTION REQUIRED — DOUBLE the previous output" retry directive (no failure type to extract from now). Kept: - README.md, GLOSSARY.md, and at-least-one-failing-test structural checks. These are real product requirements and shape what the candidate sees on minute zero. - Polyglot test-file detection (xUnit Skip, pytest.mark.skip, t.Skip, etc.) — the structural check still needs it. - Retry loop, now reverted from 5 attempts back to 3 since structural failures are rare and self-correcting. Tests: - Removed the three size-rejection tests in project-validator.spec.ts. - Added a regression guard that pins the new contract: a tiny but structurally-complete project passes validation. - Updated openai-generator-client.spec.ts: dropped the "encodes hard LOC band" test; added a "does NOT encode hard LOC band" regression guard so a future prompt edit can't silently reintroduce the rule without the matching validator coming back too. - Simplified the stubModeAProject() fixture — no more LOC padding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…candidate output
The candidate-facing repo was leaking hints about the answer:
- GLOSSARY.md listed domain concepts the candidate is supposed to
identify themselves as part of the evaluation.
- Pre-existing failing/skipped tests (e.g. describe.skip("addUser",
...)) gave away the API surface and function names the candidate
should design.
- The kit-overlaid .claude/CLAUDE.md literally told the candidate's
agent to "read GLOSSARY.md and failing tests first" — coaching
about the structure of the work before the candidate even read it.
All three are now removed from what ships to the candidate:
- project-validator no longer requires GLOSSARY.md or a failing test.
README.md is the only required structural file (the candidate-facing
brief — they need that to know what they're building).
- OpenAI prompt now explicitly forbids generating test files,
GLOSSARY.md, CLAUDE.md, or AGENTS.md under a new "DO NOT GENERATE"
block. The README's "Getting started" section instructs the candidate
to write their own tests.
- teamhero-interview-kit/.claude/CLAUDE.md deleted. The kit still
ships .claude/settings.json because that's the hook config that
writes interview.log for the AI observer (non-optional plumbing).
Verified end-to-end: a fresh smoke run produces only README.md +
source files on the AI-generated side, plus the proctor-facing kit
overlay (AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, start.sh/end.sh). No tests/, no GLOSSARY.md, no
CLAUDE.md.
Tests updated:
- project-validator.spec.ts: dropped failing-test and GLOSSARY required
checks. Added regression guards pinning the new contract — a project
with only a README.md must pass.
- openai-generator-client.spec.ts: added assertions that the prompt
forbids generating test files and GLOSSARY.md. Existing CLAUDE.md/
AGENTS.md regex updated for the new multi-line "DO NOT GENERATE"
block shape.
- project-generator.spec.ts: simplified stubModeAProject() — no
GLOSSARY, no sample test file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… toggle
The JD was previously tied to the "default+jd" rubric value — collected
only as a side-effect of picking that rubric, and used only by the AI
observer. Two problems with that:
1. The proctor couldn't combine a custom rubric WITH a JD. The rubric
choice and the JD attachment are independent decisions; coupling
them forced a false trade-off.
2. The JD was never read by the project-generation prompt. A junior
healthtech JD should be able to nudge the AI toward an
EHR-flavoured feature; instead it was inert until post-interview
review.
This change splits the JD into three standalone wizard steps right
after Domain:
"Will you provide a job description?" (yes/no)
↓ if yes
"Path to job description file"
↓
"Should the JD influence the project being generated?" (yes/no)
The rubric question simplifies to default/custom — the "default+jd"
value is retired (validator rejects it with a clear error so stale
configs surface).
Behavior changes:
- ai-observer reads the JD whenever jdPath is set, regardless of
rubric mode. The custom rubric and the JD can now coexist; both get
appended to the observer's instructions.
- openai-generator-client now reads the JD body and injects it as a
"Job description context" block in the project-generation prompt
when jdInfluencesProject is true. The model uses it to calibrate
complexity and domain. When the flag is off (or no JD), the prompt
is unchanged.
- New headless flag --jd-influences-project; rejected without --jd-path.
- BootstrapOptions and RoleConfig grow a JDInfluencesProject /
jdInfluencesProject field. Validator rejects influence=true without
a path on both sides.
Smoke verified: a JD that mentions FHIR/HL7/EHR concepts produced a
generated repo with patient-data domain types and a dashboard-summary
service. Without --jd-influences-project, the same role config
produced a generic dashboard.
PRD updated (docs/2026-05-09-candidate-interview-reviewer-plan.md) with
a new "v2 — post-shipping refinements" section that catalogs all the
changes shipped in this run (wizard rendering fixes, sensible defaults,
kit-completeness, headless UX, --debug logging, wizard simplification,
question descriptions, greenfield/brownfield taxonomy, size validator
removal, candidate-facing strip of GLOSSARY/sample-tests/CLAUDE.md, and
this JD refactor) — so the feature's evolution is visible without
spelunking git log.
Tests:
- Go: new state-machine tests for jd-provided yes/no branch,
jd-path → jd-influences-project transition, and rubric default →
output-dir (no longer detours via jd-path). Retired
TestBootstrapWizard_NextState_JDRubricRoutesToFilePicker and
TestBootstrapWizard_NextState_JDPathThenOutputDir.
- Go: validator tests for retired "default+jd" rejection, standalone
jd-path acceptance, and jd-influences-project-without-path
rejection.
- TS: new openai-generator-client tests covering JD injection when
jdInfluencesProject is true AND omission when false. Retired the
"default+jd" coupling tests in role-config.spec.ts; added pairwise
tests for standalone jd-path and jd-influences-project validation.
- TS: ai-observer test renamed to assert JD is included regardless of
rubric mode; new test that custom rubric + JD combine in the
observer's instructions.
- Snapshots regenerated for the new summary panel rows ("JD attached"
and "JD usage").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
consola's default level is 3 (info); .debug() output is hidden. When the proctor passed --debug the headline log line printed (consola.info) but the bootstrap.debug.feature / output-dir / kit-dir lines were swallowed, which was exactly the troubleshooting context --debug exists to surface. Lift consola.level to 4 inside the --debug branch so all the existing consola.debug calls below print without re-checking the flag at each site. One-line fix, verified against a Stratus principal-engineer JD smoke run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raming Two pieces of UX feedback in one commit because they intersect in the same step ordering. (1) The JD describes the business domain — so asking for Domain BEFORE the JD made the proctor either type the domain twice or type something generic that contradicted the JD. The wizard's step order is now: Stack → JD provided? → [JD path] → [JD influences project?] → Domain (skipped if JD) → Feature source → ... With a JD attached, the Domain step is skipped entirely; the role config carries an empty domain field and the OpenAI prompt renders a "Domain: infer from the job description context below" instruction instead of a bare "Domain: .". Validators on both Go and TS sides accept an empty domain when jdPath is set. The bun script's pre-validator picked up the same rule. (2) The default rubric option label said "9 built-in dimensions" — opaque framing for the hiring manager. The 9 dimensions are *what* the rubric observes, not *why* it observes them. The new label and description say what's actually being looked for: sound engineering practices (domain-driven design, deep modules, verification discipline) applied to AI-assisted work. Also fixed: the analysis-mode step said "drafts post-interview notes" — corrected to "post-interview observations" to align with the project's observation-over-judgment ethical framing. Tests: - Go state-machine: new tests for Stack→JD-provided, JD-influence→ feature-source (skipping Domain), JD-not-provided→Domain, and Domain→feature-source. Stale Domain→JD-provided test retired. - Go validator: new pairwise tests for domain-optional-with-JD AND domain-required-without-JD. - TS role-config validator: same pairwise coverage at the role-config layer (empty domain + JD ok; empty domain + no JD rejected). - TS openai-generator-client: new tests pinning the prompt rendering for both branches — "Domain: <X>." with a proctor-supplied domain and "Domain: infer from the job description" when domain is empty. - Summary panel: JD rows now precede Domain (mirroring the wizard flow); Domain row shows "(from JD)" when the proctor took the JD branch. - Snapshots regenerated. Smoke verified: a Stratus Principal Full Stack JD attached with NO --domain produced "Agentic Work Order Planner" — a work-order review workflow with tool-call loop and human-in-the-loop checkpoint, domain visibly inferred from the JD without the proctor typing it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st cut 60x
Six issues from a proctor's manual run of the interactive wizard:
1) Mode B wizard run shipped only BRIEF.md + role-config.json. Cause:
the wizard path never called applyBootstrapDefaults, so KitDir was
empty and the bun subprocess copied zero kit templates. Fix calls
defaults right after the wizard returns. The kit overlay
(INTERVIEW_RULES.md, AGENTS.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, start.sh / end.sh, .claude/settings.json) now
ships on every run regardless of dispatch path.
2) Mode B output had no index.html for the candidate to open. The
orchestrator now writes a minimal static index.html for Mode B
runs that links to BRIEF.md and says "you choose the stack" — a
landing pad without prescribing tooling. Skipped for Mode A
(which already has source files) and skipped when the AI's own
output happens to include an index.html (don't clobber).
3) A single Mode B run cost $1.36 (143k tokens). Two changes brought
it to ~$0.02 (2.3k tokens) — a 60x reduction:
- reasoning effort dropped from "medium" (default) to "low". gpt-5
models burn 100k+ reasoning tokens on a structured-output task
at medium; low produces equivalent scaffolds for this work.
Override via AI_REASONING_EFFORT env var.
- Removed the full 9-dimension rubric block from the
project-generation prompt. The rubric is review-side context;
the generator only needs to know to leave decision points for
the candidate. ai-observer.ts still gets the full rubric where
it actually scores. Saves ~600 input tokens per call.
4) TUI didn't show which LLM was being used. The generate screen
now displays "Model: <name>" alongside the spinner so the proctor
sees what's about to be billed before they commit. Each Responses
API call logs an openai.usage line (input / output / reasoning /
total tokens) so cost is visible per attempt.
5) Success-link display showed the project path from the filesystem
root. printBootstrapSuccessLink now uses filepath.Rel(cwd, abs)
for the label — running teamhero from ~/Documents and writing
into ./interviews/foo shows as "interviews/foo" rather than the
full absolute path. The OSC 8 hyperlink URL is still absolute so
the click actually opens the right place. Falls back to absolute
when the output is outside cwd (Rel returns ".." in that case).
6) The `teamhero interview` no-args verb picker rendered as a bare
huh.Select — no shell header, no hints footer, breaking visual
parity with the rest of the app. The picker now prints the
shared "//// TEAM HERO" header above and the
"enter continue • ctrl+c quit" hints below, matching wizard.go.
Tests:
- Go: wizard-path-applies-KitDir-default (the original bug).
- Go: relative-path display (label is cwd-relative, OSC 8 URL is
absolute).
- TS: Mode B orchestrator writes index.html stub; Mode A doesn't;
AI-authored index.html is preserved.
- TS: prompt no longer inlines the rubric; replacement one-liner
pinned by content. Token-cost regression guard.
Smoke verified: a Stratus principal-engineer Mode B run with JD
influence used 2,315 total tokens (was 143,697 before this change)
and produced AGENTS.md, INTERVIEW_RULES.md, PRIVACY_RELEASE.md,
RUBRIC_OVERVIEW.md, start.sh, end.sh, lib/privacy-gate.sh,
.claude/settings.json, BRIEF.md, role-config.json, and index.html
in /tmp/stratus-v3 on attempt 1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two orthogonal flags so another agent can drive the bootstrap
and consume the result programmatically:
--json Emit a JSON payload to stdout describing the run.
Human-readable output (Project: link, bun subprocess
logs, publish messages) is routed to stderr so stdout
stays parseable. The orchestrating agent reads stdout
and acts on the payload — schedule interviews, send
HR notifications, etc.
--publish Auto-publish the generated repo to GitHub. No prompt.
If a GitHub token is configured (via `teamhero setup`),
the repo is created and the tree pushed. With --json,
the resulting URL lands in payload.github.url so the
downstream agent can put it directly in an email.
Either flag can be true independently; both can be true together.
Payload schema (versioned at "teamhero.interview.bootstrap/v1"):
{
"schema": "teamhero.interview.bootstrap/v1",
"role": { "slug", "title", "stack", "domain" },
"project": { "mode", "stackByCandidate", "outputDir",
"timeBoxMinutes", "feature", "analysisMode",
"rubricMode", "jd": { "path", "influencesProject" } },
"ai": { "model" },
"github": null | { "url" }
}
Field discipline:
- outputDir is always absolute so the agent doesn't have to know
what cwd the command ran from.
- jd is null when no JD was attached; payload-bearing fields when
one is.
- github is null unless --publish ran AND succeeded; an agent can
json-null-check this to decide whether to email a link.
- ai.model echoes AI_MODEL env or the gpt-5-mini default for cost
attribution.
Refactors along the way:
- Threaded os.Stdout and os.Stderr separately through main.go ->
runInterview -> runInterviewBootstrap{,WithWizard}. They were
collapsed to a single writer at the CLI entry point, which is
why the existing "stdout"/"stderr" naming in dispatcher code was
cosmetic until now — both ended up at os.Stderr.
- Split publishToGitHub() out of offerPublishToGitHub() so the new
autoPublishToGitHub var (the --publish entry point) reuses the
exact API + git plumbing without a confirm prompt.
TDD: written red → green for each of:
- --json flag parsing
- --publish flag parsing
- payload contents (role/project/ai/jd fields)
- stdout vs stderr routing (human output to stderr when --json)
- --publish + --json combination puts URL in payload
- --publish without --json doesn't emit a JSON envelope
- No --json preserves the existing human stdout behavior
Smoke verified end-to-end: `teamhero interview bootstrap --json
--jd-influences-project ...` produces the schema-v1 JSON on stdout,
pipes cleanly to jq, and leaves human-readable logs visible on
stderr for a watching operator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
teamhero interviewis a candidate-facing AI-coding take-home experience, not a grading tool. The feature treats the interview as a project the candidate works on with whatever AI tooling they prefer, in a time-box, on their own machine — and only then, as a separate step, produces an advisory audit of how they collaborated with the AI.The framing matters. The product is the candidate experience: a fair, modern, native generative-AI coding situation. The audit is a side effect that helps the hiring manager remember and reason about the session — it is never a score, never a ranking, and never determinative.
How it works end-to-end
Hiring manager bootstraps a role.
teamhero interview bootstrap(interactive wizard, or headless flags for CI) generates a starter project tailored to the role's stack, domain, and a one-sentence feature spec. The output is a real working project with a failing test, a CLAUDE.md, a GLOSSARY.md, and 2+ deep modules — sized to fit a 60/90/120-minute window. Seedocs/interview-classification-rationale.mdfor the full pedagogical and ethical framing.Candidate receives the kit and the project. Bundled with the project is
teamhero-interview-kit/containing:INTERVIEW_RULES.md— what the candidate can and can't do (short version: use any AI tool you like).RUBRIC_OVERVIEW.md— a plain-language explanation of the 9 observation dimensions, given to the candidate up-front. Nothing is hidden from them.PRIVACY_RELEASE.md— explicit consent form: what is recorded, how it may and may not be used, and the appeal / deletion mechanism. Thestart.shrecording script will refuse to run until this is signed.Candidate works the project. They run
start.sh, which starts an asciinema recording, hands them a shell, and lets them work as they normally would. Any AI tool is fair game — Claude Code, Cursor, Copilot, Cody, plain ChatGPT-in-a-browser. The cost of any AI usage they accumulate on their personal accounts during the interview is reimbursed via a $50 gift card so that tool choice stays an honest signal of preference and not a wallet-driven compromise.Candidate finishes. They run
end.sh, which packages the recording + their commits into the repo for handoff.Hiring manager runs the audit.
teamhero interview grade --candidate "Jane" --repo <url>produces a two-tier markdown audit:summary.md— narrative observations across 9 dimensions, plus a mandatory manager sign-off section the manager must fill in with a categorical recommendation (Hire / Hire with notes / No hire) and reasoning ≥ 20 chars.audit.md— the full reasoning chain, every raw measurement, the exact bias-guard prompt used.Every audit ships with an ADVISORY banner pinned above the content — "the audit is advisory; the candidate is a person, not a score; the audit is one factor among many." The audit is never returned to the candidate by default (GDPR Art. 15 caveat is documented for the EU/UK case).
Hiring manager reviews the cohort.
teamhero interview cohort --role <slug>rolls up all candidates for a role into a rankless listing with sign-off status — no Score, no Total, no Rank columns by design.What ships in this PR
teamhero-interview-kit/— POSIX privacy gate,start.sh/end.sh,INTERVIEW_RULES.md,RUBRIC_OVERVIEW.md,PRIVACY_RELEASE.md,.claude/templateshuhwizard forbootstrap; phased progress display forgrade(clone → collect-evidence → extract-measurements → observe → audit-write); glamour-rendered audit preview with pinned ADVISORY banner; sharedinterviewStylespalettedocs/interview-rubric.md,docs/interview-classification-rationale.md,skills/teamhero-interview/SKILL.md(the conversational wrapper), README section 4Ethical commitments — preserved end-to-end in code, schema, and docs
json_schemawithadditionalProperties: false, plus a second-layerrejectIfScoredvalidator that throws if the parsed response contains any forbidden field.Manual QA Instructions
A bundled smoke script walks through the candidate-experience and hiring-manager flows without spending money on OpenAI:
The script requires a TTY (it bails on piped stdin) and uses
--mode-analysis human-onlyfor the grade step so no API key is needed.What to verify at each step
teamhero interview bootstrapwith no flags drops into the wizard. Try each rubric mode:default(skips both follow-ups),custom(reveals a text area for the prompt),default+jd(reveals a file picker; validates the path exists). Press Ctrl+C from the first screen — it must exit cleanly with no partial output.Headless bootstrap is unchanged. The script then runs
teamhero interview bootstrap --headless --role manual-test-role ... --output-dir /tmp/.... No TUI should render. Inspect the output directory:CLAUDE.mdandGLOSSARY.mdpresentteamhero-interview-kit/contents copied inteamhero interview gradewith--mode-analysis human-onlyshould show:clone → collect-evidence → extract-measurements → observe → audit-write. (In human-only mode theobservephase is a near-instant no-op.)summary.mdwith the ADVISORY banner pinned at the top, before any candidate nameSign-off gating. Open the generated
summary.md/audit.md. Confirm:audit.jsoncontains noscore/weighted_total/bandfieldsteamhero interview cohort --role <slug>should render a rankless listing — no Score, Total, or Rank columns.Optional deeper verification
Candidate-facing kit (independent of the audit flow)
To verify the candidate experience without running a full grade:
References
docs/2026-05-09-candidate-interview-grader-plan.md(now includes a full Implementation Progress table)docs/interview-rubric.mddocs/interview-classification-rationale.mdteamhero-interview-kit/INTERVIEW_RULES.mdteamhero-interview-kit/RUBRIC_OVERVIEW.mdteamhero-interview-kit/PRIVACY_RELEASE.mdskills/teamhero-interview/SKILL.md🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Scripts