Merged
34 changes: 14 additions & 20 deletions GAPS.md
@@ -105,25 +105,19 @@ and `/CA` (stroke alpha) at the requested opacity and applies it via the `gs`
operator. `PageBuilder` gained an `extgstates` map and a `Build()` block that
emits `/ExtGState << ... >>` in the page resources dictionary.

### ~~XFA script parsing is approximate~~ ✅ Fixed in v1.2.0
`convertXFAEventToRule` now dispatches to a structured `parseXFAScript`
analyser that detects language (FormCalc vs JavaScript), then tries four
pattern families in order:
- **Visibility** — `$.presence = "visible"/"hidden"` → `RuleTypeVisibility` +
`ActionTypeShow`/`ActionTypeHide` with extracted target field
- **SetValue** — `$.rawValue = expr` or `xfa.resolveNode("x").rawValue = expr`
→ `RuleTypeSetValue` + `ActionTypeSetValue` with extracted expression
- **Validation** — scripts containing `return true/false` or message-box calls
→ `RuleTypeValidate` + `ActionTypeValidate`
- **Calculate** — scripts using FormCalc built-ins (`Sum`, `Avg`, `Concat`, …)
or JavaScript `return expr` → `RuleTypeCalculate` + `ActionTypeCalculate`
with the extracted expression

Conditional guards (`if (cond) then … endif` / `if (cond) { … }`) are parsed
into a `*Condition` with operator, field reference, and literal value where the
expression is a simple binary comparison. The full raw script is always
preserved in `Action.Script` for evaluation. Complex scripts that match none of
the above patterns fall back to `ActionTypeExecute` as before.
### ~~XFA script parsing is approximate~~ ✅ Superseded in v2.0.0
The structured `parseXFAScript` analyser shipped in v1.2.0 (regex-based
visibility / set-value / validate / calculate classification, populating
`Rule`/`Condition`/`Action`) was lossy and incorrect on non-trivial scripts —
faithful interpretation needs a real ES5/FormCalc AST, which is out of scope
for a PDF library. v2.0.0 removes the heuristic and the `Rule`/`Action` type
surface entirely; `FormSchema.Scripts []FormScript` now exposes verbatim
script bodies with their event activity, language (`javascript` | `formcalc`),
SOM owner path, and owner ID, leaving interpretation to the caller. See the
**Forms** section of the README for usage and the type comment on
`types.FormScript` for known gaps (scripts attached to nodes pdfer does not
surface — decorative `<draw>`, `bind="none"` non-AddAttachment buttons,
`<pageArea>` events, per-option `<field>`s collapsed into an `<exclGroup>`).
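
Because v2.0.0 leaves interpretation to the caller, a consumer will typically need to route each script to its own FormCalc or JavaScript handler. The following is a minimal sketch of that routing, not pdfer's API: `FormScript` here is a local stand-in containing only the fields named above (the `string` field types and the sample event values are assumptions), and `groupByLanguage` is a hypothetical helper.

```go
package main

import "fmt"

// FormScript is a local stand-in mirroring the fields named in the text above.
// The real types.FormScript definition may differ; field types are assumed.
type FormScript struct {
	ID        string // owner-independent script identifier (assumed string)
	Event     string // event activity, e.g. "calculate" (sample value, assumed)
	Language  string // "javascript" or "formcalc"
	OwnerPath string // SOM path of the owning node
	Body      string // verbatim script source
}

// groupByLanguage is a hypothetical helper that buckets scripts by language,
// preserving input order inside each bucket, so that every bucket can be
// handed to a caller-supplied interpreter.
func groupByLanguage(scripts []FormScript) map[string][]FormScript {
	out := make(map[string][]FormScript)
	for _, s := range scripts {
		out[s.Language] = append(out[s.Language], s)
	}
	return out
}

func main() {
	scripts := []FormScript{
		{ID: "s1", Event: "calculate", Language: "formcalc", OwnerPath: "form1.total", Body: "Sum(a, b)"},
		{ID: "s2", Event: "click", Language: "javascript", OwnerPath: "form1.btn", Body: `this.presence = "hidden";`},
	}
	for lang, group := range groupByLanguage(scripts) {
		fmt.Printf("%s: %d script(s)\n", lang, len(group))
	}
}
```

Grouping first keeps the library-facing code free of any parsing logic; what happens to each bucket afterwards is entirely the caller's choice.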

---

@@ -340,7 +334,7 @@ CJK vertical text, some Arabic/Hebrew layout engines.

---

### ~~XFA browser rendering gaps~~ ✅ Fixed in v1.4.0
### ~~XFA browser rendering gaps~~ ✅ Fixed in v2.0.0

Fourteen improvements to the XFA → `FormSchema` translation for browser rendering fidelity:

17 changes: 16 additions & 1 deletion README.md
@@ -178,6 +178,21 @@ filled, err := form.Fill(pdfBytes, pdfer.FormData{"FirstName": "Alice"}, nil, fa
out, err := pdfer.FlattenForm(filled, nil, false)
```

For XFA forms, `schema.Scripts` exposes raw `<script>` blocks verbatim:

```go
for _, s := range schema.Scripts {
	fmt.Printf("[%s] %s (%s) on %s\n%s\n", s.Event, s.Name, s.Language, s.OwnerPath, s.Body)
}
```

`FormScript.Body` is the unmodified source. pdfer does not interpret
FormCalc or JavaScript semantics — callers that need to evaluate scripts
should plug in their own parser. `Question.Scripts` and `FormSection.Scripts`
hold `FormScript.ID` references in declaration order. See
[xfa-web](https://github.com/benedoc-inc/xfa-web) for one example of an
interactive renderer built on this contract.
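
Since `Question.Scripts` holds `FormScript.ID` references rather than the scripts themselves, a caller has to resolve them against `schema.Scripts`. One way this could look is sketched below, using local stand-in types: the `string` ID type, the `Question.Name` field, and the helpers `indexScripts` and `scriptsFor` are all assumptions made for illustration, not part of pdfer.

```go
package main

import "fmt"

// FormScript and Question are local stand-ins; the real pdfer types may differ.
type FormScript struct {
	ID    string // assumed to be a string identifier
	Event string
	Body  string // unmodified script source
}

type Question struct {
	Name    string   // hypothetical field, used only for printing
	Scripts []string // FormScript.ID references, in declaration order
}

// indexScripts (hypothetical) builds an ID -> script lookup table.
func indexScripts(scripts []FormScript) map[string]FormScript {
	byID := make(map[string]FormScript, len(scripts))
	for _, s := range scripts {
		byID[s.ID] = s
	}
	return byID
}

// scriptsFor (hypothetical) resolves a question's ID references in order,
// silently skipping IDs that are not present in the index.
func scriptsFor(q Question, byID map[string]FormScript) []FormScript {
	resolved := make([]FormScript, 0, len(q.Scripts))
	for _, id := range q.Scripts {
		if s, ok := byID[id]; ok {
			resolved = append(resolved, s)
		}
	}
	return resolved
}

func main() {
	byID := indexScripts([]FormScript{
		{ID: "s1", Event: "calculate", Body: "Sum(a, b)"},
		{ID: "s2", Event: "validate", Body: "return true;"},
	})
	q := Question{Name: "total", Scripts: []string{"s2", "s1"}}
	for _, s := range scriptsFor(q, byID) {
		fmt.Printf("%s: [%s] %s\n", q.Name, s.Event, s.Body)
	}
}
```

Building the index once and resolving per question keeps lookups constant-time and preserves the declaration order carried by the reference slice.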

### Content extraction

```go
@@ -387,7 +402,7 @@ See [GAPS.md](GAPS.md) for the full history and detailed file pointers.
**Forms**
- `Form.Validate()` returns "not implemented" for XFA forms — structural extraction only.
- Calculated form fields are not re-evaluated on `Fill()`; dependent fields remain stale until opened in a viewer.
- XFA script parsing handles common patterns (visibility, set-value, validate, calculate) and falls back to `ActionTypeExecute` for scripts it cannot classify.
- XFA scripts are exposed verbatim via `FormSchema.Scripts` — pdfer does not interpret FormCalc or JavaScript. Scripts attached to nodes pdfer doesn't surface in the schema (decorative `<draw>`, `bind="none"` non-AddAttachment buttons, `<pageArea>` events, per-option fields collapsed into an `<exclGroup>`) are not extracted.

**Images / encoding**
- JPEG2000 (`JPXDecode`) and JBIG2 image streams are detected but not decoded.
2 changes: 1 addition & 1 deletion examples/extract_xfa/main.go
@@ -73,7 +73,7 @@ func main() {
} else {
fmt.Printf("\nParsed Form Structure:\n")
fmt.Printf(" Questions: %d\n", len(form.Questions))
fmt.Printf(" Rules: %d\n", len(form.Rules))
fmt.Printf(" Scripts: %d\n", len(form.Scripts))

if len(form.Questions) > 0 {
fmt.Println("\nFirst 10 questions:")
1 change: 0 additions & 1 deletion forms/acroform/parser.go
@@ -301,7 +301,6 @@ func (af *AcroForm) ToFormSchema() *types.FormSchema {
FormType: "AcroForm",
},
Questions: make([]types.Question, 0),
Rules: make([]types.Rule, 0),
}

for _, field := range af.Fields {