Skip to content

mizcausevic-dev/procurement-pulse-engine

Repository files navigation

procurement-pulse-engine

The crawl + aggregate engine behind the AI Procurement Pulse. Probes a universe of vendor domains for the eleven Kinetic Gain Protocol Suite documents and produces the quarterly issue dataset.

node src/run.mjs --issue issue-1 --concurrency 8 --timeout 6000
# → data/issue-1.json  (the publishable aggregate)
# → data/issue-1-raw.json  (per-domain ProbeResults)

Status — v0.4.0 (2026-05-28)

  • Universe: 1,408 unique domains across 49 verticals. Pre-Issue #5 expansion (2026-05-28) added 408 buyer-readable vendors across ten under-represented verticals: Customer Data Platform (25), eCommerce Platform (37), AI Coding & Developer AI (31), AI Agent Platform (36), AI Data Labeling (32), MarTech (63), Customer Service Platform (30), Document & eSignature (24), Mid-Market HR/Payroll (48), Vertical SaaS (82). Smoke-tested via Cloudflare DNS — 4 placeholders removed, 6 typos corrected (e.g. weights-and-biases.comwandb.com, mindbody.commindbodyonline.com, stack.comconstructconnect.com). Previous footprint: 1,007 / 38 verticals; 37 at Issue #1. Locked baseline snapshot at data/baseline-2026-05-28.json (1,007 domains, frozen — August Issue #5 measures the +408 expansion against this baseline so the original 1,007 stay comparable issue-to-issue). Snapshot: 1 publisher (KG), 0.10% rate, verifiedRate 1.0.
  • Quarterly cadence pre-registered. .github/workflows/quarterly-crawl.yml auto-fires Aug/Nov/Feb/May 15 at 14:00 UTC, against universe.csv at HEAD, with drift vs the locked data/issue-4-v04-full.json baseline. The August fire is Issue #5.
  • Per-spec discriminator on all 11 Suite paths. Each spec now requires its canonical *_version field (agent_card_version, incident_card_version, etc.). Closes the Gatsby/SPA-catchall false positive surfaced in Issue #4 — corporate.charter.com dropped 82/100 → 0/100; canonical publisher kineticgain.com still 100/100.
  • First verified signing posture. All 11 kineticgain.com dogfooded /.well-known/ docs are ed25519-signed against the public key at kineticgain.com/.well-known/pulse-signing.json. Engine probe reports { verified: 11, unsigned: 0, invalid: 0 }.
  • Engine tests: 15/15 pass on main.

The sections below preserve the Issue #1 baseline narrative and the journey through Issues #2–#4. Issue #5 (the first true quarterly delta, August 2026) inherits the v0.4 discriminator + signing posture by default.

What it does

  1. Reads universe.csvdomain,vertical,note, the set of vendors measured this issue.
  2. Probes each domain's eleven well-known paths in parallel (bounded concurrency) using the vendored well-known-probe core.
  3. Aggregates into headline rate, per-vertical rollups, per-spec adoption, and a leaderboard.
  4. Writes the issue dataset that procurement-pulse-landing renders.

The Issue #1 baseline finding

The first real crawl (37 domains across 9 verticals — AI Platform, EdTech, HealthTech, FinTech, Enterprise SaaS, Data, Observability, Identity, plus the Kinetic Gain reference properties) returned:

0.0% publication rate. Zero domains — including kineticgain.com's own properties — publish any Suite document at /.well-known/ yet.

This is the honest starting line. The Suite shipped in 2025; adoption begins from zero. Issue #1 is "The Zero Baseline" — the instrument is calibrated, the universe is defined, and every future issue measures the climb. (Methodology note: an empty index.json counts as published; a 200 of the wrong shape does not. The zero is real, not a probe artifact — verified against the discriminator-bearing specs too.)

Usage

npm run crawl -- --issue issue-1                  # full universe.csv
node src/run.mjs --issue issue-1 --limit 5        # first 5 domains (smoke test)
node src/run.mjs --issue issue-2 --concurrency 12 # next issue, more parallelism

# diff two issues (aggregate deltas; add --from-raw/--to-raw for per-domain movers)
npm run drift -- --from data/issue-1.json --to data/issue-2.json \
  --from-raw data/issue-1-raw.json --to-raw data/issue-2-raw.json
Flag Default Meaning
--issue issue-1 Output filename stem (data/<issue>.json)
--concurrency 8 Parallel domain probes
--timeout 6000 Per-fetch timeout (ms)
--limit 0 (all) Cap the universe (for smoke tests)

Output shape (data/<issue>.json)

{
  "issue": "issue-2",
  "generatedAt": "2026-05-24T…",
  "universe": { "total": 350, "verticals": 18 },
  "headline": { "domainsPublishingAny": 1, "publicationRate": 0.0029, "avgScore": 0.3 },
  "signatures": { "foundDocs": 11, "verifiedDocs": 0, "invalidDocs": 0, "verifiedRate": 0 },
  "byVertical": { "AI Platform": { "domains": 40, "publicationRate": 0, "avgScore": 0 },  },
  "bySpec": { "aeo": { "label": "AEO Protocol", "publishers": 1, "rate": 0.0029, "verified": 0 },  },
  "leaderboard": [ { "domain": "kineticgain.com", "score": 100, "tier": "comprehensive", "published": 11 },  ]
}

Signature verification (ed25519)

Each found document is checked for an ed25519 signature using Node's built-in crypto (no dependencies). A signed Suite document carries a top-level signature block:

"signature": {
  "algorithm": "ed25519",
  "public_key": "<base64 SPKI-DER>",
  "signing_key_url": "https://vendor.example/.well-known/keys/2026.json", // optional
  "value": "<base64 ed25519 signature>"
}

The signature covers the canonical serialization of the document with its own signature block removed (recursively key-sorted JSON, array order preserved). Each document resolves to one of three states, surfaced per-document and rolled up into signatures / bySpec[].verified:

Status Meaning
verified Signature present and verifies (tamper-evident).
unsigned No signature block.
invalid Signature present but fails verification, or malformed.

By default the embedded public_key is used (tamper-evidence). Pass --verify-key-fetch semantics (verifyKeyFetch option) to instead trust the key fetched from signing_key_url (provenance). Mirrors hash-attestation-rs.

Drift (src/drift.mjs)

driftAggregate(prev, curr) diffs two committed issue datasets (headline / per-vertical / per-spec / signature-rate deltas, and flags new / dropped verticals). driftDomains(prevRaw, currRaw) diffs the per-domain raw arrays to find movers — newly publishing, stopped, score and per-spec changes. The CLI writes data/<to>-drift.json and prints a human summary.

Issue markdown (src/summarize.mjs)

summarize.mjs renders a Pulse-issue markdown body by filling token placeholders in docs/issues/ISSUE_TEMPLATE.md with the issue's aggregate JSON and (optional) drift JSON. Output goes to docs/issues/<stem>.md, ready to paste into a GitHub Issue or publish via pulse.kineticgain.com.

node src/summarize.mjs --issue issue-2026-08 --baseline issue-4-v04-full --issue-number 5

The quarterly-crawl workflow invokes the summarizer automatically after drift, so the scheduled August / November / February / May firings produce JSON + ready-to-publish markdown in a single commit. The staged preview at docs/issues/issue-5-staged-draft.md shows what Issue #5 will look like — generated from the 2026-05-28 snapshot vs the locked v0.4 baseline as a pipeline dry-run.

Crawl etiquette

GET-only requests to public, designed-to-be-fetched /.well-known/ paths. Bounded concurrency, per-request timeout, redirect: follow. No authentication, no headless browser, no scraping of page content. The universe is published alongside each issue so the run is reproducible by anyone.

Roadmap

  • Expand the universe further toward ~1,200 domains. Issue #2 grew the lens from 37 → 350 domains across 18 verticals; the next pass deepens toward Fortune 500 + top 100 K-12 EdTech + top 50 HealthTech AI.
  • Signature verification — ed25519 over the canonical document hash (src/signature.mjs).
  • Drift trackingsrc/drift.mjs diffs each issue against the previous.
  • Scheduled GH Action that runs the crawl quarterly and commits the dataset, which triggers the landing-site rebuild.
  • Sign the dogfooded docs — sign kineticgain.com's own Suite documents so they flip from unsigned to verified.

Composes with

Repo Relationship
well-known-probe-js The probe core (vendored into src/probe.mjs)
procurement-pulse-landing Renders this engine's dataset at pulse.kineticgain.com
kinetic-gain-protocol-suite The eleven specs measured
aeo-crawler Heavyweight crawler; this is the Pulse-specific lightweight aggregator

License

MIT.

About

Crawl + aggregate engine behind the AI Procurement Pulse. Probes 895 vendor domains for 11 Kinetic Gain Protocol Suite docs. v0.4 per-spec discriminator kills SPA catchall false positives. KG dogfood verified=11/11 ed25519.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors