
Port 8 ProcessMapping workflows: ffr-management, prior-approval, effort-reporting, proposal-doc-completeness, award-compliance, proposal-budget-personnel, foa-checklist, subaward #33

Merged

ProfessorPolymorphic merged 2 commits into main from eight-additional-workflows on Apr 30, 2026


@StringTheoryDev (Collaborator)

Summary

Adds 8 new components and 8 new workflows that port the next batch of ui-insight/ProcessMapping Vandalizer workflows into the prompt-library. Each pair (component + workflow) follows the pattern established by rfa-checklist-extraction-udm in #31: a canonical single-call prompt as the harness invocation surface, a JSON-Schema 2020-12 contract, a manifest-driven Vandalizer workflow that mirrors the source ProcessMapping topology one-for-one, and a workflow-local evals scaffold.

This PR ports 8 of the 14 ProcessMapping workflows that remained after #31. Combined with the previously merged rfa-checklist-extraction, this brings prompt-library coverage to 9 of the 15 ProcessMapping workflows. The remaining 6 (multi-source compliance-personnel-verification & section2-personnel-eligibility, the classifier-branching award-modification-intake, the drafting workflow budget-justification-generator, and the wider export-to-banner-extraction & risk-domain-assessment) are deferred to future PRs because they introduce patterns this PR does not yet model.

Why

Each of these workflows already exists as a runtime configuration inside a Vandalizer instance and as a JSON description in ProcessMapping/workflows/<slug>/workflow.json. Until now they were not (a) versioned component contracts, (b) regenerable Vandalizer manifestations, or (c) catalog-discoverable entries in component_catalog.json. This PR makes all three true for the eight simplest remaining workflows in the same shape PR #31 used for rfa-checklist-extraction.

What ships, per workflow

| # | Component (<slug>-udm) | Workflow (<slug>) | Source | Topology mirrored | Schema fields |
|---|---|---|---|---|---|
| 1 | ffr-management-extraction-udm@0.1.0 | workflows/ffr-management-extraction v0.1.0 | WF-FFR-MANAGEMENT-EXTRACTION | 1 Extraction + 1 Consolidation | 5 nested buckets (submission_schedule, submission_system, required_financial_data, compliance_consequences, preparation_timeline) + 2 scalars |
| 2 | prior-approval-extraction-udm@0.1.0 | workflows/prior-approval-extraction v0.1.0 | WF-PRIOR-APPROVAL-EXTRACTION | 1 Extraction + 1 Consolidation | 3 buckets (budget_approvals, scope_timeline_approvals, approval_procedures) + rtc_waivers |
| 3 | effort-reporting-extraction-udm@0.1.0 | workflows/effort-reporting-extraction v0.1.0 | WF-EFFORT-REPORTING-EXTRACTION | 1 Extraction + 1 Consolidation | 13 fields incl. key_personnel_commitments table + 2 enums |
| 4 | proposal-document-completeness-udm@0.1.0 | workflows/proposal-document-completeness v0.1.0 | WF-PROPOSAL-DOC-COMPLETENESS | 2 parallel Extraction + 1 Consolidation/Gap-Analysis | 3 layers: as-found inventory, sponsor requirements, gap analysis (per-person + per-subawardee matrices) |
| 5 | award-compliance-extraction-udm@0.1.0 | workflows/award-compliance-extraction v0.1.0 | WF-AWARD-COMPLIANCE-EXTRACTION | 2 parallel Extraction + 1 Consolidation | 2 blocks (compliance_framework × 10 + financial_management × 10) |
| 6 | proposal-budget-personnel-extraction-udm@0.1.0 | workflows/proposal-budget-personnel-extraction v0.1.0 | WF-PROPOSAL-BUDGET-PERSONNEL-EXTRACTION | 2 parallel Extraction + 1 Consolidation | personnel listings + 4 derivable boolean triggers + budget structure |
| 7 | foa-checklist-extraction-udm@0.1.0 | workflows/foa-checklist-extraction v0.1.0 | WF-FOA-CHECKLIST-EXTRACTION | 6 parallel Extraction + 1 Consolidation | 31 fields across 8 FOA reference sections; sibling of rfa-checklist-extraction-udm |
| 8 | subaward-extraction-udm@0.1.0 | workflows/subaward-extraction v0.1.0 | WF-SUBAWARD-EXTRACTION | 6 parallel Extraction + 1 Consolidation | 6 blocks; 18 flat contact items composed into 6 {name, email, phone} objects |

Each component ships:

  • prompt.md — frontmatter (semver-locked at 0.1.0) + canonical single-call extraction prompt with explicit encoding rules
  • schema.json — JSON Schema 2020-12, with extensive UDM column bindings on leaf fields (preserved verbatim from the source ProcessMapping workflow's UDM_Column annotations)
  • README.md — overview, contract scope, runtime topology, triad integration, sibling-component relationships
  • CHANGELOG.md — initial 0.1.0 entry documenting source-workflow lineage, enum-value carryover, and UDM bindings
  • evals/README.md — planned-cases breakdown (each component lists 4–5 cases that exercise distinct structural features)

Each workflow ships:

  • manifest.yaml — declarative Vandalizer-workflow source-of-truth (each Extraction task carries an embedded SearchSet whose item titles mirror the component schema field names; enums propagated from the source workflow's Enum_Values)
  • <slug>.vandalizer.json — generated by scripts/build_vandalizer_workflows.py; never hand-edited
  • README.md, CHANGELOG.md, evals/README.md
  • evals/cases/<stub>/metadata.yaml — placeholder shell flagged validated_by: pending-sponsored-programs-review to be replaced with an authorized, de-identified case before promotion to stable

Workflow-local eval posture (per docs/contracts.md)

All 8 workflows declare evals.workflow_local: true. None is a 1:1 repackaging of the canonical component prompt — each Extraction task carries a focused per-section prompt_inline body, and each Consolidation Prompt does substantial work that emerges from the multi-task topology and cannot be covered by component-level evals alone:

  • #1 ffr: collapses flat searchset items into nested submission_schedule / submission_system / compliance_consequences objects and normalizes the platform enum.
  • #2 prior-approval: collapses flat items into nested budget_approvals / scope_timeline_approvals and normalizes the per-row approval_procedures table.
  • #3 effort: normalizes the reporting_frequency and certification_method enums and enforces the PI-mirror rule (the PI's row in key_personnel_commitments must mirror pi_committed_effort and pi_person_months exactly).
  • #4 proposal-doc-completeness: joins the as-found inventory with sponsor requirements, derives present/triggered booleans, computes per-person and per-subawardee missing lists, and ranks prioritized_missing (compliance-critical first).
  • #5 award-compliance: merges and dedupes compliance_calendar entries across both upstream fragments, normalizes the audit_requirements and record_retention enums, and enforces the CFR-01 reconciliation sum(budget_period_amounts) == total_award_amount.
  • #6 proposal-budget-personnel: derives four boolean compliance triggers (has_postdocs_or_grad_students, mentoring_plan_required, has_subawards, has_equipment_over_5k) from list lengths and combines fa_rate + fa_base into the nested fa_rate_and_base object.
  • #7 foa-checklist: assembles six fragments and enforces cross-field consistency (chronological critical_dates; expected_awards * max(award_range) <= total_funding).
  • #8 subaward: composes 18 flat contact searchset items into six structured {name, email, phone} objects, normalizes the cost_type and invoicing_frequency enums, and enforces the CFR-01 reconciliation between amount_funded, total_direct_costs, and total_indirect_costs.
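Several of these consolidation rules are plain arithmetic or equality checks over the merged JSON, so they are easy to re-check outside the model when reviewing eval output. A minimal sketch of four of them follows. It assumes hypothetical dict shapes: that budget_period_amounts is a list of numbers, that award_range is a [min, max] pair, that each key_personnel_commitments row carries 'role', 'committed_effort', and 'person_months' keys, and that the subaward relation is amount_funded == total_direct_costs + total_indirect_costs. All function names are illustrative; the workflows themselves perform these steps inside their Consolidation prompts.

```python
"""Hypothetical post-consolidation checks for four of the rules listed above.

Field names come from the PR text; dict shapes, the 'role' key, and all
function names are illustrative assumptions, not the workflows' code.
"""

MONEY_TOLERANCE = 0.005  # half a cent, to absorb float rounding


def check_award_cfr01(award: dict) -> list[str]:
    """award-compliance: sum(budget_period_amounts) == total_award_amount."""
    expected = award.get("total_award_amount")
    if expected is None:
        return ["total_award_amount is missing"]
    total = sum(award.get("budget_period_amounts", []))
    if abs(total - expected) > MONEY_TOLERANCE:
        return [f"CFR-01: budget periods sum to {total}, award total is {expected}"]
    return []


def check_subaward_cfr01(sub: dict) -> list[str]:
    """subaward: assumes amount_funded == total_direct_costs + total_indirect_costs."""
    parts = sub["total_direct_costs"] + sub["total_indirect_costs"]
    if abs(parts - sub["amount_funded"]) > MONEY_TOLERANCE:
        return [f"CFR-01: direct + indirect = {parts}, amount_funded = {sub['amount_funded']}"]
    return []


def check_pi_mirror(effort: dict) -> list[str]:
    """effort-reporting: the PI's commitment row must mirror the PI scalars exactly.

    Assumes the PI row is identified by role == "PI".
    """
    rows = effort.get("key_personnel_commitments", [])
    pi_rows = [r for r in rows if r.get("role") == "PI"]
    if len(pi_rows) != 1:
        return [f"expected exactly one PI row, found {len(pi_rows)}"]
    row, errors = pi_rows[0], []
    if row.get("committed_effort") != effort.get("pi_committed_effort"):
        errors.append("PI row committed_effort != pi_committed_effort")
    if row.get("person_months") != effort.get("pi_person_months"):
        errors.append("PI row person_months != pi_person_months")
    return errors


def check_foa_funding(foa: dict) -> list[str]:
    """foa-checklist: expected_awards * max(award_range) <= total_funding."""
    ceiling = foa["expected_awards"] * max(foa["award_range"])
    if ceiling > foa["total_funding"]:
        return [f"expected_awards * max(award_range) = {ceiling} exceeds total_funding"]
    return []
```

For example, check_award_cfr01({"budget_period_amounts": [400000, 350000, 250000], "total_award_amount": 1000000}) returns an empty list. Returning a list of messages rather than raising lets a reviewer see every inconsistency in a consolidated output in one pass.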

Each workflow ships exactly one scaffolded stub case under evals/cases/ with validated_against_version set to the workflow's MINOR.PATCH and component_versions_at_validation recording the pinned component version. These stubs satisfy the workflow_local: true requirement per docs/contracts.md and are flagged for replacement with sponsored-programs-validated cases before promotion to stable.

Build script

The kind: Extraction + validation_plan passthrough extension that landed in PR #31 already covers all 8 workflows, so the build script is unchanged in this PR. scripts/build_vandalizer_workflows.py --check confirms that all 12 manifests round-trip to identical bytes:

$ python3 scripts/build_vandalizer_workflows.py --check
All 12 workflow export(s) up to date.
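A --check mode of this kind usually regenerates every export in memory and compares it byte for byte with the committed file, which only works if serialization is deterministic. A minimal sketch, assuming the manifests have already been parsed and rendered into export dicts (the renderer is not shown and is hypothetical), and assuming sorted keys, 2-space indentation, and a trailing newline; the real scripts/build_vandalizer_workflows.py may serialize differently. Only the final status message is copied from the output above.

```python
import json
from pathlib import Path


def serialize(export: dict) -> bytes:
    """Deterministic serialization: the same dict always yields the same bytes."""
    text = json.dumps(export, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    return text.encode("utf-8")


def check_exports(expected: dict[Path, dict]) -> list[Path]:
    """Return the export paths whose committed bytes differ from a fresh render.

    `expected` maps each <slug>.vandalizer.json path to the dict that a
    (hypothetical) manifest renderer produces for it.
    """
    stale = []
    for path, export in expected.items():
        if not path.exists() or path.read_bytes() != serialize(export):
            stale.append(path)
    return stale


def report(stale: list[Path], total: int) -> str:
    """Summarize a --check run in the same wording as the script's output."""
    if stale:
        return "Out of date: " + ", ".join(str(p) for p in stale)
    return f"All {total} workflow export(s) up to date."
```

Because the check compares bytes rather than parsed JSON, any hand edit to a generated export (even whitespace) shows up as drift, which is what enforces the "never hand-edited" rule for <slug>.vandalizer.json.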

Triad integration

  • prompt-library: 8 new components + 8 new workflows, all catalog-discoverable. component_catalog.json is regenerated from manifests + overrides. component_catalog_overrides.yaml adds 8 curated entries with output_contract, triad_integration.harness_notes (describing each runtime topology and what the consolidator does), triad_integration.udm_alignment, and related_components cross-references (e.g., award-compliance-extraction-udm → ffr-management-extraction-udm and prior-approval-extraction-udm as drilldown targets; proposal-budget-personnel-extraction-udm → proposal-document-completeness-udm as producer/consumer; foa-checklist-extraction-udm ↔ rfa-checklist-extraction-udm as siblings).
  • evaluation-data-sets: none yet — every component has triad_integration.evaluation_datasets: []. Each workflow's eval-stub metadata.yaml describes the structural features and workflow features the case will exercise once a real authorized, de-identified case is selected.
  • evaluation-harness: the canonical prompt.md remains the single-call harness invocation surface; per-workflow scoring (post-consolidation JSON) is the right signal for the v0.1.0 runtimes. Component READMEs and workflow READMEs both note this distinction — campaign authors should record both signals when both are available.
  • AI4RA-UDM: scalar metadata + UDM-column leaf bindings preserved verbatim from the source ProcessMapping workflows' UDM_Column annotations. See each component's CHANGELOG.md for the full per-component binding list.

Test plan

  • python3 scripts/build_component_catalog.py → Wrote component_catalog.json (and --check confirms no drift)
  • python3 scripts/build_vandalizer_workflows.py --check → All 12 workflow export(s) up to date
  • python3 .github/scripts/lint_components.py → Linted 22 component(s), 12 workflow(s). Only the 2 pre-existing eval-version-lag warnings on nsf-award-notice-extraction-udm and rfp-extraction remain; both are unrelated to this PR.
  • python3 scripts/build_docs.py → 22 components rendered, mkdocs.yml nav spliced.
  • python3 -m mkdocs build --strict → Documentation built in 2.69 seconds, 0 warnings.
  • All 8 workflows: source-workflow Validation_Plan mirrored verbatim into the manifest's top-level validation_plan: and round-tripped into the export envelope.
  • All 8 workflows: source-workflow Enum_Values mirrored verbatim into the manifest's searchset.items[].enum_values (submission_system_platform, reporting_frequency, certification_method, review_type, audit_requirements, record_retention, cost_type, invoicing_frequency, federal_agency, fa_base).
  • Reviewer to verify: import each <slug>.vandalizer.json into a Vandalizer instance, run it against representative federal award / proposal documents, and confirm that the output JSON validates against the corresponding components/<slug>-udm/schema.json.
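Full validation against a JSON Schema 2020-12 contract is best done with a real validator (for example, the jsonschema package's Draft202012Validator). As a stdlib-only smoke test, the three failure modes the review later surfaced (missing required fields, enum drift, and numbers encoded as strings) can be sketched as below. The schema fragment, its snake_case field names other than amount_funded, and its enum values are hypothetical illustrations, not copies of components/subaward-extraction-udm/schema.json, and the checker only inspects top-level scalar properties.

```python
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

# Hypothetical fragment; field names and enum values are illustrative only.
SUBAWARD_FRAGMENT = {
    "required": ["federal_award_number", "invoicing_frequency"],
    "properties": {
        "federal_award_number": {"type": "string"},
        "invoicing_frequency": {"type": "string", "enum": ["Monthly", "Quarterly"]},
        "amount_funded": {"type": "number"},
    },
}


def smoke_check(doc: dict, schema: dict) -> list[str]:
    """Check required keys, single declared scalar types, and enum membership."""
    errors = [f"missing required field: {k}" for k in schema.get("required", []) if k not in doc]
    for key, rules in schema.get("properties", {}).items():
        if key not in doc:
            continue
        value = doc[key]
        declared = rules.get("type")
        # Only single-type declarations are checked; type lists are skipped.
        expected = JSON_TYPES.get(declared) if isinstance(declared, str) else None
        if expected is not None:
            wrong_type = not isinstance(value, expected)
            if isinstance(value, bool) and declared != "boolean":
                wrong_type = True  # bool subclasses int in Python; JSON keeps them distinct
            if wrong_type:
                errors.append(f"{key}: expected {declared}, got {type(value).__name__}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} not in {rules['enum']}")
    return errors
```

Run against {"invoicing_frequency": "Weekly", "amount_funded": "$150,000"}, this reports three problems: the missing federal_award_number, the out-of-enum frequency, and a dollar amount quoted as a string instead of a JSON number.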

🤖 Generated with Claude Code

…rt-reporting, proposal-document-completeness, award-compliance, proposal-budget-personnel, foa-checklist, subaward

Adds 8 new components and 8 new workflows that port the next batch of
ProcessMapping (`ui-insight/ProcessMapping`) workflows into the
prompt-library. Each pair (component + Vandalizer workflow) follows the
pattern established by `rfa-checklist-extraction-udm` (PR #31): a
canonical single-call prompt, a JSON-Schema 2020-12 contract, a
manifest-driven Vandalizer workflow that mirrors the source ProcessMapping
topology one-for-one, and a workflow-local evals scaffold.

This is 8 of the 14 ProcessMapping workflows still remaining after PR #31.
Combined with the previously merged `rfa-checklist-extraction`, this
brings prompt-library coverage to 9/15 ProcessMapping workflows.

Verification (all 5 scripts clean):
  - python scripts/build_component_catalog.py     -> Wrote catalog
  - python scripts/build_vandalizer_workflows.py  -> 12/12 up to date
  - python .github/scripts/lint_components.py     -> 22 components, 12
    workflows; only the 2 pre-existing eval-version-lag warnings remain
  - python scripts/build_docs.py                  -> 22 components rendered
  - python -m mkdocs build --strict               -> 0 warnings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review comment threads: workflows/award-compliance-extraction/manifest.yaml (outdated), components/effort-reporting-extraction-udm/README.md, workflows/subaward-extraction/manifest.yaml
…rovenance SHA

Three review comments from @ProfessorPolymorphic on #33:

1. Monetary number-vs-string mismatch (award-compliance, but pattern
   audited across the four components with `number`-typed schema fields).
   Component prompts and workflow manifest prompt_inline bodies told the
   extractor to quote dollar amounts verbatim as strings, but the schemas
   require JSON numbers. Updated the encoding rules in:
     - components/award-compliance-extraction-udm/prompt.md
     - components/proposal-budget-personnel-extraction-udm/prompt.md
     - components/foa-checklist-extraction-udm/prompt.md
     - components/subaward-extraction-udm/prompt.md
     - workflows/award-compliance-extraction/manifest.yaml
     - workflows/proposal-budget-personnel-extraction/manifest.yaml
     - workflows/foa-checklist-extraction/manifest.yaml
     - workflows/subaward-extraction/manifest.yaml
   Number-typed fields ($-amounts, integers) are now explicitly rendered
   as JSON numbers (no quotes, currency symbols, or thousands separators);
   string-typed fields (rates, dates, narrative) keep verbatim quotation.

2. Pinned provenance SHA. Updated the Provenance section of all 8
   component READMEs and all 8 workflow READMEs to record
   `ui-insight/ProcessMapping` commit `b7176b0c913833a205efdb5e4ba00c17ff88af0f`
   instead of floating `main`.

3. Subaward requiredness. The source workflow marks
   `Federal_Award_Number`, `Federal_Awarding_Agency`, and
   `Invoicing_Frequency` as `Is_Required: true` but the port marked them
   optional. Restored source requiredness in
   components/subaward-extraction-udm/{schema.json, prompt.md} and
   workflows/subaward-extraction/manifest.yaml searchset items.

   Audit sweep flagged the same drift in two other ports:
     - ffr-management-extraction-udm: `submission_schedule.annual_ffr_due`
       and `final_ffr_due` were nullable; source marks them required.
       Fixed schema + prompt.
     - effort-reporting-extraction-udm: `award_number`, `pi_name`,
       `project_title`, `reporting_frequency` were optional; source
       marks them required. Fixed schema + prompt.

   For required string fields where the document may not state a value,
   the prompt now instructs the LLM to use "Not specified in the
   document" rather than null, matching the source workflow's
   `Not_Found_Value`.

CHANGELOGs updated for ffr / effort / subaward components to document
the requiredness alignment.

Verification clean (no new lint or mkdocs warnings):
  - python scripts/build_component_catalog.py        -> Wrote catalog
  - python scripts/build_vandalizer_workflows.py     -> 4 rebuilt
    (--check confirms: All 12 workflow export(s) up to date)
  - python .github/scripts/lint_components.py        -> 22 components,
    12 workflows, 2 pre-existing warnings only
  - python scripts/build_docs.py                     -> regenerated
  - python -m mkdocs build --strict                  -> clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
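The two encoding rules in the commit above (dollar amounts emitted as JSON numbers, and required string fields falling back to the source workflow's Not_Found_Value instead of null) are implemented in this PR as prompt instructions to the LLM. For spot-checking or post-processing outputs, a hypothetical normalizer could look like the sketch below; it assumes US-style amounts such as "$1,250,000.00" and leaves rates, dates, and narrative strings untouched, as the commit specifies.

```python
import re

NOT_FOUND = "Not specified in the document"  # the source workflow's Not_Found_Value

# Optional "$", then either comma-grouped digits or plain digits, then optional decimals.
_MONEY = re.compile(r"^\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?$")


def to_json_number(raw: str) -> float:
    """Turn a verbatim dollar string into a JSON number (int if no decimals)."""
    match = _MONEY.match(raw.strip())
    if not match:
        raise ValueError(f"not a plain dollar amount: {raw!r}")
    whole, frac = match.group(1).replace(",", ""), match.group(2)
    return float(whole + frac) if frac else int(whole)


def fill_required_strings(record: dict, required: list[str]) -> dict:
    """Replace missing or null required string fields with the sentinel value."""
    out = dict(record)
    for key in required:
        if out.get(key) is None:
            out[key] = NOT_FOUND
    return out
```

Malformed inputs such as "12,34" raise instead of being silently coerced, which keeps the number-vs-string boundary strict in the same spirit as the schema fix.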
@ProfessorPolymorphic (Contributor) left a comment


Verified Labib’s fixes locally: schema numeric/string rules, restored source requiredness, pinned ProcessMapping provenance, regenerated catalog/workflow/docs outputs, and mkdocs strict build all pass. Good to merge.

ProfessorPolymorphic merged commit cfda45a into main on Apr 30, 2026
1 check passed

2 participants