feat: add health snapshot reporting + CI artifacts#3
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new “health snapshot” script to generate timestamped health reports (JS syntax checks + Python agent unit tests), wires it into npm scripts, and extends CI workflows to run/report these checks and adjust evaluation gating behavior.
Changes:
- Add
scripts/health-snapshot.jsto run Node--checkacross key JS files + Python unit tests and emitlogs/health/health-snapshot-*.mdreports with retention pruning. - Add npm scripts
health:snapshotandhealth:snapshot:quiet. - Update CI: run the health snapshot in
python-agent-testsand upload the generated markdown reports as artifacts; refineeval-gatePR vs manual gating.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/health-snapshot.js | New report generator for JS syntax + Python unit test health snapshots with pruning. |
| package.json | Adds npm scripts to run the health snapshot (normal/quiet). |
| README.md | Documents health snapshot usage and report retention behavior. |
| .github/workflows/python-agent-tests.yml | Adds Node setup + health snapshot + artifact upload; changes workflow triggers/concurrency. |
| .github/workflows/eval-gate.yml | Adds branch scoping + concurrency; splits PR baseline gate from manual strict gate. |
| if (process.env.AGENT_PYTHON && String(process.env.AGENT_PYTHON).trim()) { | ||
| candidates.push({ cmd: String(process.env.AGENT_PYTHON).trim(), argsPrefix: [] }); | ||
| } | ||
|
|
||
| const winVenvPython = path.resolve(process.cwd(), '.venv', 'Scripts', 'python.exe'); | ||
| if (fs.existsSync(winVenvPython)) { | ||
| candidates.push({ cmd: winVenvPython, argsPrefix: [] }); | ||
| } | ||
|
|
There was a problem hiding this comment.
pythonCommandCandidates() only checks a Windows venv path and then falls back to python/py -3. On Linux/macOS it’s common to have .venv/bin/python and/or only python3 available, so health:snapshot can fail locally even when a venv exists. Add POSIX venv + python3 candidates (and keep the existing Windows/AGENT_PYTHON logic).
| if (process.env.AGENT_PYTHON && String(process.env.AGENT_PYTHON).trim()) { | |
| candidates.push({ cmd: String(process.env.AGENT_PYTHON).trim(), argsPrefix: [] }); | |
| } | |
| const winVenvPython = path.resolve(process.cwd(), '.venv', 'Scripts', 'python.exe'); | |
| if (fs.existsSync(winVenvPython)) { | |
| candidates.push({ cmd: winVenvPython, argsPrefix: [] }); | |
| } | |
| const cwd = process.cwd(); | |
| if (process.env.AGENT_PYTHON && String(process.env.AGENT_PYTHON).trim()) { | |
| candidates.push({ cmd: String(process.env.AGENT_PYTHON).trim(), argsPrefix: [] }); | |
| } | |
| if (process.platform === 'win32') { | |
| const winVenvPython = path.resolve(cwd, '.venv', 'Scripts', 'python.exe'); | |
| if (fs.existsSync(winVenvPython)) { | |
| candidates.push({ cmd: winVenvPython, argsPrefix: [] }); | |
| } | |
| } else { | |
| const posixVenvPython = path.resolve(cwd, '.venv', 'bin', 'python'); | |
| if (fs.existsSync(posixVenvPython)) { | |
| candidates.push({ cmd: posixVenvPython, argsPrefix: [] }); | |
| } | |
| const posixVenvPython3 = path.resolve(cwd, '.venv', 'bin', 'python3'); | |
| if (fs.existsSync(posixVenvPython3)) { | |
| candidates.push({ cmd: posixVenvPython3, argsPrefix: [] }); | |
| } | |
| } | |
| candidates.push({ cmd: 'python3', argsPrefix: [] }); |
| child.on('close', (code) => { | ||
| resolve({ | ||
| ok: code === 0, | ||
| code: code || 0, | ||
| stdout, | ||
| stderr, | ||
| startedAt, | ||
| finishedAt: new Date(), | ||
| command: [command, ...args].join(' ') |
There was a problem hiding this comment.
In the close handler, code can be null when the process exits due to a signal. Using code: code || 0 will record an exit code of 0 even though ok is false, which makes the report misleading. Capture the (code, signal) parameters and set a non-zero code when code is null (and ideally include the signal in the recorded error output/metadata).
| child.on('close', (code) => { | |
| resolve({ | |
| ok: code === 0, | |
| code: code || 0, | |
| stdout, | |
| stderr, | |
| startedAt, | |
| finishedAt: new Date(), | |
| command: [command, ...args].join(' ') | |
| child.on('close', (code, signal) => { | |
| const finishedAt = new Date(); | |
| const exitCode = typeof code === 'number' ? code : 1; | |
| let finalStderr = stderr; | |
| if (signal) { | |
| const signalMessage = `Process terminated by signal ${signal}`; | |
| finalStderr = finalStderr ? `${finalStderr}\n${signalMessage}` : signalMessage; | |
| } | |
| resolve({ | |
| ok: exitCode === 0, | |
| code: exitCode, | |
| stdout, | |
| stderr: finalStderr, | |
| startedAt, | |
| finishedAt, | |
| command: [command, ...args].join(' '), | |
| signal: signal || null |
| for (const file of files) { | ||
| const result = await runCommand(process.execPath, ['--check', file], { streamOutput: !IS_QUIET }); | ||
| outputs.push(`$ node --check ${file}`); | ||
| if (result.stdout) outputs.push(result.stdout.trim()); | ||
| if (result.stderr) outputs.push(result.stderr.trim()); | ||
|
|
||
| if (!result.ok) { | ||
| return { | ||
| ok: false, | ||
| code: result.code, | ||
| stdout: outputs.filter(Boolean).join('\n\n'), | ||
| stderr: result.stderr || '', | ||
| startedAt, | ||
| finishedAt: new Date(), | ||
| command: 'node --check <repo-js-files>' | ||
| }; |
There was a problem hiding this comment.
runJsCheck() appends result.stderr into stdout (via outputs) and also returns it separately as stderr. Since formatSection() concatenates stdout and stderr, JS check failures will duplicate the same stderr content in the report. Return either a single combined output, or keep stdout/stderr separate (but don’t include stderr in both).
| on: | ||
| push: | ||
| paths: | ||
| - "agent/**" | ||
| - ".github/workflows/python-agent-tests.yml" | ||
| branches: | ||
| - main | ||
| pull_request: | ||
| paths: | ||
| - "agent/**" | ||
| - ".github/workflows/python-agent-tests.yml" | ||
| branches: | ||
| - main |
There was a problem hiding this comment.
This workflow no longer uses paths: filters and will run for every PR targeting main, even when the Python agent isn’t touched. If the intent is to keep this scoped to agent/** (as the README currently states), consider restoring paths: (possibly in addition to branches:) to avoid unnecessary CI time.
| npm run health:snapshot | ||
| npm run health:snapshot:quiet | ||
| ``` | ||
|
|
||
| `npm run health:snapshot` runs JS syntax checks plus Python agent unit tests and writes a timestamped report to `logs/health/`. | ||
| Use `npm run health:snapshot:quiet` for CI-friendly minimal console output while preserving full report details. | ||
| Report retention is automatic: by default the latest `20` reports are kept in `logs/health/`. Override with `HEALTH_SNAPSHOT_KEEP_LATEST`. |
There was a problem hiding this comment.
README says GitHub Actions runs the Python agent tests automatically for agent/** changes, but python-agent-tests.yml now triggers on all PRs to main (no paths: filter). Update the README statement to match the new workflow behavior, or re-scope the workflow to agent/** changes.
| npm run eval:collect | ||
| npm run eval:run | ||
| npm run eval:history | ||
| node -e "const fs=require('fs');const r=JSON.parse(fs.readFileSync('eval/report.json','utf8'));const score=Number((r.summary||{}).overallScore||0);if(score<40){console.error('[EvalBaselineGate] FAIL overallScore',score,'< 40');process.exit(1)};console.log('[EvalBaselineGate] PASS overallScore='+score);" |
There was a problem hiding this comment.
The PR baseline gate uses an inline node -e ... snippet with a hard-coded threshold (< 40) to parse eval/report.json. This duplicates logic already implemented in scripts/eval/ci-gate.js (npm run eval:gate) and makes future changes harder. Consider reusing the existing gate script and setting EVAL_GATE_MIN_SCORE=40 (and any other needed env flags) for the PR baseline case instead of embedding custom parsing here.
| node -e "const fs=require('fs');const r=JSON.parse(fs.readFileSync('eval/report.json','utf8'));const score=Number((r.summary||{}).overallScore||0);if(score<40){console.error('[EvalBaselineGate] FAIL overallScore',score,'< 40');process.exit(1)};console.log('[EvalBaselineGate] PASS overallScore='+score);" | |
| npm run eval:gate | |
| env: | |
| EVAL_GATE_MIN_SCORE: '40' |
Summary
scripts/health-snapshot.jsto run JS syntax checks + Python agent unit tests and write timestamped reports.health:snapshothealth:snapshot:quietpython-agent-testsworkflow to:logs/health/*.mdartifactsValidation
node --check scripts/health-snapshot.jscmd /c npm run health:snapshot:quietnpm run agent:test:py