Add source field to extracted variables (filename vs text) #41
Open
Description
Allow regex variable extraction to run against the original filename in addition to the extracted document text. Add a source field to ExtractedVariable.
Example Config
variables:
  extracted:
    - name: account_id
      source: filename  # NEW: extract from filename instead of text
      pattern: "[0-9]{8}_.*(?P<account_id>[0-9]{4}\\.[0-9]{4}\\.[0-9]{4})"
    - name: invoice_num
      source: text  # default behavior
      pattern: "INV-(?P<invoice_num>\\d+)"
Implementation
Schema change:
- File: crates/paporg/src/config/schema.rs
- Add source: Option<VariableSource> to ExtractedVariable (line ~65)
- Add enum: enum VariableSource { Text, Filename }
- Default: Text
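The schema change above could look roughly like the following. This is a minimal sketch, not the actual contents of schema.rs: the struct shown is a trimmed-down stand-in for ExtractedVariable with only the fields visible in the example config, and effective_source() is a hypothetical helper. If the real schema is deserialized with serde, the enum would presumably also need serde derives and a lowercase renaming so that `filename` and `text` parse; that is an assumption and is left out here to keep the sketch dependency-free.

```rust
/// Which input a regex pattern is matched against.
/// Sketch only: the real enum in schema.rs may carry additional derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VariableSource {
    /// Match against the extracted document text (current behavior).
    #[default]
    Text,
    /// Match against the original filename.
    Filename,
}

/// Trimmed-down stand-in for the real `ExtractedVariable`.
/// Only `name`, `pattern`, and the new `source` field are shown.
#[derive(Debug, Clone)]
pub struct ExtractedVariable {
    pub name: String,
    pub pattern: String,
    /// `None` means the `source` key was omitted in the config.
    pub source: Option<VariableSource>,
}

impl ExtractedVariable {
    /// Hypothetical helper: resolve an omitted `source` to the default (`Text`).
    pub fn effective_source(&self) -> VariableSource {
        self.source.unwrap_or_default()
    }
}
```

Keeping the field as Option<VariableSource> means existing configs without a `source` key continue to load, and the default is resolved in one place.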
Variable engine change:
- File: crates/paporg/src/config/variables.rs
- Update extract_variables() (line ~36) to accept both text: &str and filename: &str
- For each pattern, check source to decide which string to match against
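The per-pattern source check could be isolated in a small function so it is easy to unit test. The sketch below assumes the VariableSource enum proposed in this issue; select_haystack is a hypothetical name, and the actual regex matching inside extract_variables() is deliberately omitted.

```rust
/// Mirrors the enum proposed for schema.rs; repeated here so the sketch stands alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum VariableSource {
    #[default]
    Text,
    Filename,
}

/// Hypothetical helper: pick the string a pattern should be matched against.
/// An omitted source (`None`) falls back to `Text`, preserving current behavior.
pub fn select_haystack<'a>(
    source: Option<VariableSource>,
    text: &'a str,
    filename: &'a str,
) -> &'a str {
    match source.unwrap_or_default() {
        VariableSource::Text => text,
        VariableSource::Filename => filename,
    }
}
```

Inside extract_variables(), each pattern would then be run against select_haystack(var.source, text, filename) instead of always against text.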
Pipeline change:
- File: crates/paporg/src/pipeline/runner.rs
- In step_extract_variables() (line ~222): pass the original filename alongside the text
Acceptance Criteria
- source: filename extracts variables from the original filename
- source: text (default) preserves current behavior
- Omitting source defaults to text
- Covered by unit tests for both sources