diff --git a/README.md b/README.md index ead9bbb..89a5f57 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ hardcodes `robertdelanghe.dev`, `bounded.tools`, an account, or an email. ``` integrity/ verify-site · verify (sigstore) · gen-sitemanifest · gen-provenance · structure-audit · http-probe -gates/ sbom (gen + completeness) · shacl-runner · seo-gate · axe-gate (axe-core a11y) · vuln-gate (npm audit) · readability-gate · commonmark-runner · semantic (lone) +gates/ sbom (gen + completeness) · shacl-runner · seo-gate · axe-gate (axe-core a11y) · vuln-gate (npm audit) · html-validator-gate (vnu) · readability-gate · commonmark-runner · semantic (lone) gates/conformance/ conformance-report — lone's conformance() projection (Node port of jsr:@bounded-systems/lone@0.4) + a generic HTML renderer generators/ gen-cid (IPFS UnixFS) · gen-identity (did:web + VC) · openapi (static-API helper core) emitters/ reprDigest (RFC 9530) · securityTxt (RFC 9116) · webManifest · markdown-sibling headers @@ -68,6 +68,7 @@ in-process verifier). The Deno semantic runner pins its imports in | `seo-gate.mjs` | `node …/seo-gate.mjs [distDir]` | `$DIST`. Optional `$SEO_ERROR_PAGE`, `$SEO_DEPLOY_SIDECARS`. Enforces canonical/title/description uniqueness + self-consistency, robots.txt (RFC 9309), sitemap, internal links. | | `axe-gate.mjs` | `node …/axe-gate.mjs [distDir]` | `$DIST`. Optional `$AXE_PAGES` (comma list, default: every `*.html` in dist), `$AXE_TAGS` (default `wcag2a,wcag2aa,wcag21a,wcag21aa,wcag22aa`), `$AXE_IMPACT_THRESHOLD` (`minor`/`moderate`/`serious`/`critical`, default `serious`), `$AXE_RUNNER` (`playwright` (CI, needs `playwright` + `@axe-core/playwright` + `npx playwright install chromium`) \| `tezcatl` (macOS WebKit, local)), `$AXE_REPORT` (write the JSON report). Serves dist over an ephemeral origin (so assets resolve), runs **axe-core** per page, and **fails closed** on any violation at/above the threshold. 
The emitted report's `axe: { serious, critical }` envelope is exactly what `conformance-report`'s `a11y.axe-serious-critical` criterion consumes — a clean run is what lets a site honestly assert it. | | `vuln-gate.mjs` | `node …/vuln-gate.mjs [projectDir]` | `$VULN_ROOT` (lockfile lives here, default `.`). Optional `$VULN_OMIT_DEV` (`true`→production deps only, default `true`), `$VULN_THRESHOLD` (highest tolerated known critical/high, default `0`), `$VULN_REPORT` (write the JSON report). Runs **`npm audit`** and **fails closed** when the known critical/high count exceeds the threshold. The report's `vulns: { knownCriticalOrHighVulns }` envelope is what `conformance-report`'s `security.no-critical-vulns` criterion consumes. | +| `html-validator-gate.mjs` | `node …/html-validator-gate.mjs [distDir]` | `$HTML_DIST`. Optional `$HTML_PAGES` (comma list, default: every `*.html`), `$HTML_THRESHOLD` (default `0`), `$HTML_REPORT`. Runs **vnu** (the Nu Html Checker, a self-contained Java jar — needs a JRE) `--errors-only` over the built pages and **fails closed** above the threshold. The report's `htmlValidator: { errors }` envelope is what `conformance-report`'s `html.validator-clean` criterion consumes. | | `readability-gate.mjs` | `node …/readability-gate.mjs [--strict]` | **The corpus is an input** the site assembles from its copy: a JSON array of `{id,text}` or an `{id:text}` map. Optional `$READABILITY_THRESHOLDS`, `$READABILITY_MIN_WORDS`, `$READABILITY_KNOWN_ACRONYMS`. WARN-only unless `--strict`. | | `commonmark-runner.mjs` | `node …/commonmark-runner.mjs [fixtures.json]` | **The site's markdown renderer module** (export `renderMarkdown`, or set `$COMMONMARK_RENDER_EXPORT`). Default fixtures pin a safe CommonMark subset + 4 hostile-HTML escapes; a site with a different renderer supplies its own `fixtures.json`. 
| | `semantic/gate.ts` | `deno run --allow-read --allow-net …/gate.ts` | Built HTML in `$SEMANTIC_DIR` (default `dist/blog`); `$SEMANTIC_SELECTOR` (subject node, default `article`). Imports `jsr:@bounded-systems/lone`; any error-severity finding fails CI. | diff --git a/fixtures/html/bad.html b/fixtures/html/bad.html new file mode 100644 index 0000000..2539fb9 --- /dev/null +++ b/fixtures/html/bad.html @@ -0,0 +1,17 @@ + + + + + Bad fixture + + +
+

Bad fixture

+ +
    +
  • a list is not allowed as a child of span — vnu errors here
  • +
+
+
+ + diff --git a/fixtures/html/good.html b/fixtures/html/good.html new file mode 100644 index 0000000..238254e --- /dev/null +++ b/fixtures/html/good.html @@ -0,0 +1,13 @@ + + + + + Good fixture + + +
+

Good fixture

+

A conformant HTML5 page — zero Nu HTML Checker errors.

+
+ + diff --git a/gates/html-validator-gate.mjs b/gates/html-validator-gate.mjs new file mode 100644 index 0000000..7b7256c --- /dev/null +++ b/gates/html-validator-gate.mjs @@ -0,0 +1,120 @@ +#!/usr/bin/env node +// HTML-validity gate — turns "the Nu HTML Checker passed once" into a +// CONTINUOUSLY-ENFORCED member of the conformance contract. It runs vnu (the Nu +// Html Checker, the reference HTML conformance checker, as a self-contained Java +// jar — headless, no browser, no network) over a project's BUILT pages and FAILS +// CLOSED (exit 1) when the error count exceeds a configurable threshold (default 0). +// The machine-readable result is exactly the shape lone's conformance() model +// consumes for `html.validator-clean` (`{ errors }`), so a clean run lets a site +// honestly assert that criterion — and a regression turns CI red. +// +// node gates/html-validator-gate.mjs [distDir] # build gate (exit 1 over threshold) +// +// Everything is config-driven; NOTHING about any one site is hard-coded: +// argv[2] / $HTML_DIST built output dir (default: "dist") +// $HTML_PAGES comma list of page paths under dist (default: every *.html) +// $HTML_THRESHOLD highest tolerated error count (default: 0) +// $HTML_REPORT path to write the JSON report (default: none) +// +// Requires a JRE on PATH (CI: actions/setup-java; the jar ships with `vnu-jar`). +// The pure parse/evaluation functions are exported for unit testing without Java. +import { writeFile, access, readdir } from "node:fs/promises"; +import { resolve, join } from "node:path"; +import { createRequire } from "node:module"; +import { spawnSync } from "node:child_process"; + +// ── Pure core (Java-free; unit-testable) ───────────────────────────────────── + +/** Extract error-type messages from a vnu `--format json` payload (string or object). */ +export function parseVnu(payload) { + const json = typeof payload === "string" ? 
JSON.parse(payload || '{"messages":[]}') : (payload || {}); + const messages = Array.isArray(json.messages) ? json.messages : []; + return messages.filter((m) => m && m.type === "error"); +} + +/** Evaluate parsed errors against the threshold. Pure: (errors[], threshold) → report. */ +export function evaluateHtml(errors, threshold = 0) { + const count = errors.length; + return { + passed: count <= threshold, + threshold, + errors: count, + // The envelope lone's conformance() consumes for `html.validator-clean`. + htmlValidator: { errors: count }, + detail: errors.slice(0, 20).map((e) => ({ + page: (e.url || "").replace(/^file:/, ""), + line: e.lastLine, + message: e.message, + })), + }; +} + +// ── Impure runner ──────────────────────────────────────────────────────────── + +const require = createRequire(import.meta.url); + +async function walkHtml(dir, base = dir) { + const out = []; + for (const e of await readdir(dir, { withFileTypes: true })) { + const p = join(dir, e.name); + if (e.isDirectory()) out.push(...await walkHtml(p, base)); + else if (e.name.endsWith(".html")) out.push(p); + } + return out; +} + +/** Run vnu over the given files; returns the error-type messages. vnu writes its + * JSON report to stderr and exits non-zero when errors exist, so we read stderr + * regardless of exit code. */ +export function runVnu(files) { + const jar = String(require("vnu-jar")); + const res = spawnSync("java", ["-jar", jar, "--errors-only", "--format", "json", ...files], { + encoding: "utf8", + maxBuffer: 64 * 1024 * 1024, + }); + if (res.error) throw new Error(`cannot run vnu (${res.error.message}). Is a JRE on PATH?`); + return parseVnu(res.stderr || '{"messages":[]}'); +} + +/** Walk → vnu → evaluate → report. Exposed for programmatic use and the kit's test. */ +export async function runHtmlGate({ dist, pages, threshold = 0 }) { + const files = pages && pages.length + ? 
pages.map((p) => resolve(dist, p)) + : (await walkHtml(resolve(dist))).sort(); + const report = evaluateHtml(runVnu(files), threshold); + report.pages = files.length; + return report; +} + +// ── CLI ────────────────────────────────────────────────────────────────────── + +async function main() { + const dist = resolve(process.argv[2] && !process.argv[2].startsWith("--") ? process.argv[2] : process.env.HTML_DIST || "dist"); + const exists = async (p) => { try { await access(p); return true; } catch { return false; } }; + if (!(await exists(dist))) { console.error(`✗ html-validator-gate: ${dist} not found — build first.`); process.exit(2); } + + const threshold = Number.parseInt(process.env.HTML_THRESHOLD ?? "0", 10); + if (!Number.isInteger(threshold) || threshold < 0) { + console.error(`✗ html-validator-gate: $HTML_THRESHOLD must be an integer ≥ 0 (got "${process.env.HTML_THRESHOLD}")`); + process.exit(2); + } + const pages = (process.env.HTML_PAGES || "").split(",").map((s) => s.trim().replace(/^\//, "")).filter(Boolean); + + const report = await runHtmlGate({ dist, pages, threshold }); + if (process.env.HTML_REPORT) { + await writeFile(resolve(process.env.HTML_REPORT), JSON.stringify(report, null, 2) + "\n"); + } + + const line = `html-validator-gate: ${report.errors} Nu HTML Checker error(s) over ${report.pages} built page(s) · threshold ${threshold}`; + if (!report.passed) { + console.error(`✗ ${line}`); + for (const d of report.detail) console.error(` ${d.page} L${d.line}: ${d.message}`); + process.exit(1); + } + console.log(`✓ ${line}`); +} + +// Only run the CLI when invoked directly (not when imported by a test). 
+if (import.meta.url === `file://${process.argv[1]}`) { + main().catch((e) => { console.error("✗ html-validator-gate: error —", e.stack || e.message); process.exit(1); }); +} diff --git a/package-lock.json b/package-lock.json index bac76af..372c0ac 100644 --- a/package-lock.json +++ b/package-lock.json @@ -16,7 +16,8 @@ "linkedom": "^0.18.0", "n3": "^1.17.3", "rdf-validate-shacl": "^0.5.10", - "sigstore": "^5.0.0" + "sigstore": "^5.0.0", + "vnu-jar": "^26.6.24" }, "bin": { "ck-axe-gate": "gates/axe-gate.mjs", @@ -2086,6 +2087,19 @@ "integrity": "sha512-gLXi7351CoyVVQw8XE5sgpYawRKatxE7kj/xmCxXOZS1kMdtcqC0ILIqLuVEVnAUQSL/evOGG3eQ+8VgbdnstA==", "license": "MIT" }, + "node_modules/vnu-jar": { + "version": "26.6.24", + "resolved": "https://registry.npmjs.org/vnu-jar/-/vnu-jar-26.6.24.tgz", + "integrity": "sha512-8HvW+WgEdXFQ8DXZsYbS47zSi57mDil9liET/9xGJRrNpzQBlMQzdrZ1+a567hKnQIVo1kKaxJM5JmcYsPhSJw==", + "hasInstallScript": true, + "license": "MIT", + "bin": { + "vnu": "vnu-jar.js" + }, + "engines": { + "node": ">=0.10" + } + }, "node_modules/web-streams-polyfill": { "version": "3.3.3", "resolved": "https://registry.npmjs.org/web-streams-polyfill/-/web-streams-polyfill-3.3.3.tgz", diff --git a/package.json b/package.json index 4e2a374..e4c670e 100644 --- a/package.json +++ b/package.json @@ -20,6 +20,7 @@ "ck-seo-gate": "./gates/seo-gate.mjs", "ck-axe-gate": "./gates/axe-gate.mjs", "ck-vuln-gate": "./gates/vuln-gate.mjs", + "ck-html-validator-gate": "./gates/html-validator-gate.mjs", "ck-readability-gate": "./gates/readability-gate.mjs", "ck-commonmark-runner": "./gates/commonmark-runner.mjs", "ck-gen-cid": "./generators/gen-cid.mjs", @@ -36,6 +37,7 @@ "linkedom": "^0.18.0", "n3": "^1.17.3", "rdf-validate-shacl": "^0.5.10", - "sigstore": "^5.0.0" + "sigstore": "^5.0.0", + "vnu-jar": "^26.6.24" } } diff --git a/test/run.mjs b/test/run.mjs index 21ecbaa..97275be 100755 --- a/test/run.mjs +++ b/test/run.mjs @@ -351,6 +351,43 @@ await test("gates/vuln-gate: parse + evaluate, 
e2e via npm audit", async () => { } }); +// 15. html-validator-gate: pure parse/evaluate, then a best-effort e2e via real vnu. +await test("gates/html-validator-gate: parse + evaluate, e2e on fixtures", async () => { + const { parseVnu, evaluateHtml, runHtmlGate } = await import(join(KIT, "gates", "html-validator-gate.mjs")); + + // (a) pure parse over vnu --format json payloads (errors-only filtering). + const errs = parseVnu({ messages: [ + { type: "error", message: "boom", url: "file:/p.html", lastLine: 9 }, + { type: "info", subType: "warning", message: "meh" }, + { type: "error", message: "bang", url: "file:/q.html", lastLine: 3 }, + ] }); + if (errs.length !== 2) throw new Error(`expected 2 error messages (info dropped), got ${errs.length}`); + if (parseVnu('{"messages":[]}').length !== 0) throw new Error("empty payload must parse to 0"); + + // (b) pure threshold evaluation + the lone evidence envelope. + const okEval = evaluateHtml([], 0); + if (!okEval.passed || okEval.htmlValidator.errors !== 0) throw new Error("0 errors must pass with htmlValidator {0}"); + const badEval = evaluateHtml(errs, 0); + if (badEval.passed || badEval.htmlValidator.errors !== 2) throw new Error("2 errors at threshold 0 must fail"); + + // (c) best-effort e2e on the good/bad fixtures with real vnu. A missing JRE is a + // tolerated skip (the pure logic above is the deterministic assertion). 
+ const hasJava = spawnSync("java", ["-version"], { stdio: "ignore" }).status === 0; + const fixDir = join(FIX, "html"); + try { + if (!hasJava) throw new Error("no JRE on PATH"); + const bad = await runHtmlGate({ dist: fixDir, pages: ["bad.html"], threshold: 0 }); + if (bad.passed || bad.errors < 1) throw new Error("known-bad fixture must fail (≥1 vnu error)"); + const good = await runHtmlGate({ dist: fixDir, pages: ["good.html"], threshold: 0 }); + if (!good.passed || good.errors !== 0) throw new Error("known-good fixture must pass (0 vnu errors)"); + ok("gates/html-validator-gate: parse + evaluate, e2e on fixtures", + `pure logic asserted · e2e (vnu): bad=${bad.errors} error(s), good=clean`); + } catch (e) { + if (/must (pass|fail)|expected|envelope/.test(e.message)) throw e; + ok("gates/html-validator-gate: parse + evaluate, e2e on fixtures", `pure logic asserted · e2e SKIPPED (${e.message.split("\n")[0]})`); + } +}); + await rm(work, { recursive: true, force: true }); console.log(`\n${failed ? "✗" : "✓"} conformance-kit tests: ${passed} passed, ${failed} failed`); process.exit(failed ? 1 : 0);
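
Review note on `$HTML_THRESHOLD` parsing: the CLI reads the value with `Number.parseInt(…, 10)`, which accepts a leading numeric prefix. A value such as `3abc` or `1.5` would therefore be read as `3` or `1` and pass the `Number.isInteger` check instead of exiting with code 2. A minimal sketch of a stricter parse follows, assuming a hypothetical helper named `parseThreshold` that is not part of this change; the name, the `fallback` parameter, and the error type are illustrative only.

```javascript
// Hypothetical helper (not in this diff): accept only a plain, non-negative
// decimal integer and reject anything Number.parseInt would silently truncate.
function parseThreshold(raw, fallback = 0) {
  // An unset variable falls back to the default, mirroring `?? "0"` in the CLI.
  if (raw === undefined) return fallback;
  const text = String(raw).trim();
  // Digits only: rejects "", "-1", "1.5", "3abc", "1e3".
  if (!/^\d+$/.test(text)) {
    throw new RangeError(`threshold must be an integer >= 0 (got "${raw}")`);
  }
  const value = Number(text);
  // Guard against digit strings too large to represent exactly.
  if (!Number.isSafeInteger(value)) {
    throw new RangeError(`threshold is too large (got "${raw}")`);
  }
  return value;
}

// Usage sketch: the caller decides how to report the failure (the CLI exits 2).
function readThresholdOrNull(raw) {
  try {
    return parseThreshold(raw);
  } catch {
    return null;
  }
}
```

In `main()`, a helper like this could replace the `parseInt` plus `Number.isInteger` pair while keeping the existing error message and `process.exit(2)` path; whether the extra strictness is wanted is a judgment call for the maintainers.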