feat(xfa): extract orphan scripts + scope design doc#10

Merged
lfstokols merged 4 commits into main from feat/xfa-orphan-scripts
May 26, 2026
@lfstokols
Contributor

Summary

  • Orphan-script extraction. Splits XFA script extraction from question
    emission so scripts on nodes pdfer doesn't surface as Questions or Sections
    (<pageArea> events, bind="none" non-AddAttachment buttons,
    event-bearing <draw>s, per-option events flattened out of an
    <exclGroup>) appear in FormSchema.Scripts instead of being silently
    dropped. Orphan scripts carry their SOM OwnerPath with an empty
    OwnerID; scripts whose owner is an emitted Question/Section get back-refs
    populated in a single post-pass.
  • Design doc. First doc under docs/design/ — captures the XFA scope
    principle ("extract structure and surface logic, don't execute logic"),
    marks orphan-script extraction as P1#1 done, and lays out the rest of the
    P1 / P2 roadmap (Elements collection, <occur> / <bind> metadata, SOM
    resolver, data-DOM cursor API, <validate><script> capture).
  • Polish from self-review. Per-option OwnerPath now keys off the
    <field> name attribute (real SOM, e.g. group.optA) rather than the
    option's <items> value (which can be arbitrary text);
    populateScriptBackRefs collapsed to a single question pass; new
    TestNestedSubformFieldBackRef locks in that the path produced by
    extractAllScripts matches the one produced by populateScriptBackRefs
    at arbitrary nesting depth; FormScript doc + design doc §1 reframed
    around the stability contract for the empty-`OwnerID` signal so the
    enumeration of orphan cases isn't read as a permanent classification.
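
A minimal sketch of how a consumer might use the empty-`OwnerID` signal described above. The `FormScript` struct here is a simplified stand-in: only `OwnerPath` and `OwnerID` are named in this PR, while the `ID` and `Event` fields, the `orphanScripts` helper, and the sample paths are assumptions made for illustration.

```go
package main

import "fmt"

// FormScript is a simplified stand-in for pdfer's type. Only OwnerPath and
// OwnerID come from the PR text; ID and Event are assumed for illustration.
type FormScript struct {
	ID        string
	Event     string
	OwnerPath string // SOM path of the owning template node
	OwnerID   string // empty when the owner is not a typed schema entity
}

// orphanScripts (hypothetical helper) returns the scripts whose owner is not
// currently surfaced as a Question or Section.
func orphanScripts(scripts []FormScript) []FormScript {
	var out []FormScript
	for _, s := range scripts {
		if s.OwnerID == "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	scripts := []FormScript{
		{ID: "s1", Event: "click", OwnerPath: "form1.page1.helpBtn"},
		{ID: "s2", Event: "exit", OwnerPath: "form1.page1.name", OwnerID: "q1"},
	}
	for _, s := range orphanScripts(scripts) {
		fmt.Printf("orphan %s (%s) at %s\n", s.ID, s.Event, s.OwnerPath)
	}
}
```

Filtering on `OwnerID == ""` rather than on a list of node kinds should keep such a consumer valid if later work surfaces more node types as typed entities.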

Test plan

  • `go test ./...` — full suite passes
  • New `TestPageAreaEventExtracted`, `TestBindNoneButtonScriptExtracted`,
    `TestDrawEventScriptExtracted`, `TestExclGroupOptionScriptsExtracted`
    assert the four orphan cases survive AND the corresponding nodes are
    still NOT emitted as Questions
  • New `TestNestedSubformFieldBackRef` asserts back-refs resolve through
    arbitrary nesting depth
  • Existing `TestQuestionScriptsIndex` / `TestSectionScriptsIndex` still
    pass with the new back-ref machinery
  • `TestScriptIDStability` still passes (script ID ordering unchanged)

lfstokols and others added 4 commits May 25, 2026 21:56
Previously, script extraction was driven by walkSubformChildren via the
per-node attachFieldScripts helper, so any script whose owner node was
suppressed by emitField / emitDraw was silently dropped. The known cases:
event-bearing <draw> elements (status indicators), bind="none"
non-AddAttachment buttons (Help Text / Show Intro triggers), <pageArea>
events, and per-option scripts on <field>s flattened into an <exclGroup>'s
Options. The b44a6f9 FormScript comment acknowledged this gap and deferred
the fix.

Split script extraction from question emission:

- New extractAllScripts walks the entire xfaNode tree post-Section walk
  and emits a FormScript for every event-bearing node, regardless of
  whether the node was emitted as a Question or Section. OwnerPath is set
  to the SOM path of the owning node; OwnerID is left empty.

- New populateScriptBackRefs indexes the resulting scripts by OwnerPath,
  fills in OwnerID and Question.Scripts / FormSection.Scripts back-refs
  whenever the owner was also emitted as a Question/Section, and leaves
  orphans with empty OwnerID.

- For an exclGroup, OptionEvents is a slice parallel to Options; it
  preserves per-option events through the flatten so they reach
  extractAllScripts.

- pageArea and exclGroup are now valid event-stack targets in the
  parseXFATemplate state machine (previously only subform).

The attachFieldScripts / appendScripts helpers and xfaNode.QuestionID
field are removed — back-refs are now populated by path lookup in a single
post-pass rather than threaded through emission.
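
The path-lookup post-pass described above can be sketched roughly as follows. This is not the code from this commit: the struct shapes, the precomputed `Path` field on `Question`, and the `populateBackRefs` name are assumptions, and Sections are left out for brevity.

```go
package main

import "fmt"

// Simplified stand-ins; the field sets are assumed, not pdfer's real types.
type FormScript struct {
	ID        string
	OwnerPath string
	OwnerID   string
}

type Question struct {
	ID      string
	Path    string   // full SOM path, assumed to be computed beforehand
	Scripts []string // IDs of scripts owned by this question
}

// populateBackRefs (hypothetical) indexes scripts by OwnerPath, then fills
// OwnerID and Question.Scripts wherever a question has the same path.
// Scripts with no matching question keep an empty OwnerID (orphans).
func populateBackRefs(scripts []FormScript, questions []Question) {
	byPath := make(map[string][]int)
	for i, s := range scripts {
		byPath[s.OwnerPath] = append(byPath[s.OwnerPath], i)
	}
	for qi := range questions {
		q := &questions[qi]
		for _, si := range byPath[q.Path] {
			scripts[si].OwnerID = q.ID
			q.Scripts = append(q.Scripts, scripts[si].ID)
		}
	}
}

func main() {
	scripts := []FormScript{
		{ID: "s1", OwnerPath: "form1.a.name"},
		{ID: "s2", OwnerPath: "form1.a.statusDraw"},
	}
	questions := []Question{{ID: "q1", Path: "form1.a.name"}}
	populateBackRefs(scripts, questions)
	fmt.Printf("%+v\n%+v\n", scripts, questions)
}
```

Because matching is by path alone, the extraction walk and the back-ref pass have to build paths identically, which is why a nesting-depth regression test is useful.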

Tests cover all four orphan cases (pageArea, bind=none button, draw with
event, exclGroup per-option) and assert that the corresponding nodes are
still NOT emitted as Questions — only their scripts are surfaced.

FormScript doc comment rewritten: orphan scripts are now first-class
(OwnerPath set, OwnerID empty) rather than missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Captures the high-level design plan for pdfer's XFA surface:

- The scope principle: extract structure and surface logic, don't
  execute logic. Schema is a projection of the template DOM, not a
  snapshot of a Form DOM. Runtime model (instance counts, presence
  toggles, calculation order) is the caller's responsibility.

- P1 roadmap: orphan-script extraction (done in 88b6989), a parallel
  Elements collection for non-question template nodes with visual
  presence or events, and <occur> / <bind> metadata on Sections and
  Questions for dynamic XFA.

- P2 drafts: SOM path parser and schema resolver, SOM-keyed data-DOM
  cursor API, and <validate><script> child-element capture.

- Explicit non-goals: script execution, merge algorithm, instance
  management, calculate dependency tracking, layout engine, and
  script-body parsing. These belong in the runtime layer.

Lives under docs/design/ — first design doc in this repo, introduces
the convention. Status header marks it as P1 committed, P2 draft, and
open for discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up polish on 88b6989 from a self-review pass.

- Per-option script OwnerPath now keys off the option <field>'s name
  attribute (e.g. "group.optA") rather than the option's data value,
  which can contain arbitrary text from <items>. Adds OptionFieldNames
  parallel slice on xfaNode, populated during the exclGroup flatten.
  The resulting path is a real, SOM-resolvable expression.

- populateScriptBackRefs collapsed to a single pass over Questions: a
  preceding section walk records each question's containing section
  path; the question pass then computes the full SOM path and calls
  assign. The previous covered-map / two-pass split is gone.

- New TestNestedSubformFieldBackRef puts a field event three subforms
  deep and asserts both OwnerID and Question.Scripts resolve through
  the nested sections — locks in that the path built by
  extractAllScripts matches the one built by populateScriptBackRefs at
  arbitrary nesting depth.

- TestExclGroupOptionScriptsExtracted updated to use field names
  (optA/optB) distinct from option values (a/b), with a negative
  assertion that the old value-keyed path does NOT appear.

- FormScript doc comment and docs/design/xfa-scope.md §1 reframed
  around the stability contract: OwnerID empty means "owner is not
  currently surfaced as a typed schema entity." That signal is stable
  in meaning even as §2 expands the set of typed entities (Elements,
  etc.) — what shrinks is the orphan set, which is the direction
  audit-style consumers want anyway. The previous enumeration of four
  orphan cases read like a permanent classification.
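
To illustrate the first point in the list above, here is a rough sketch of building per-option owner paths from field names rather than option values. The `xfaNode` fields other than `OptionFieldNames`, and the `joinSOM` and `optionOwnerPaths` helpers, are hypothetical; the real path construction in pdfer may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// xfaNode is a simplified stand-in for the flattened exclGroup node.
// OptionFieldNames is named in the commit; the rest is assumed.
type xfaNode struct {
	Name             string
	Options          []string // option values from <items>, arbitrary text
	OptionFieldNames []string // parallel: name attribute of each option <field>
}

// joinSOM (hypothetical) joins non-empty segments with dots.
func joinSOM(parts ...string) string {
	nonEmpty := make([]string, 0, len(parts))
	for _, p := range parts {
		if p != "" {
			nonEmpty = append(nonEmpty, p)
		}
	}
	return strings.Join(nonEmpty, ".")
}

// optionOwnerPaths (hypothetical) keys each option's path off the field
// name, never the option value.
func optionOwnerPaths(parent string, g xfaNode) []string {
	paths := make([]string, 0, len(g.OptionFieldNames))
	for _, name := range g.OptionFieldNames {
		paths = append(paths, joinSOM(parent, g.Name, name))
	}
	return paths
}

func main() {
	g := xfaNode{
		Name:             "group",
		Options:          []string{"a", "b"},
		OptionFieldNames: []string{"optA", "optB"},
	}
	fmt.Println(optionOwnerPaths("", g))
}
```

Keeping the values and the field names in separate parallel slices means the option values stay available for display while the paths remain resolvable names.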
lfstokols merged commit 33f0373 into main on May 26, 2026
5 checks passed
lfstokols deleted the feat/xfa-orphan-scripts branch on May 26, 2026 03:51