
refactor(#109): introduce PDFExtractorOptions to reduce PDFTableExtractor constructor params#139

Merged
longieirl merged 1 commit into main from fix/109-pdf-extractor-options
Apr 10, 2026

@longieirl
Owner

Pull Request

Summary

Closes #109. PDFTableExtractor.__init__ had 11 parameters, exceeding the limits enforced by pylint's R0913 (too-many-arguments) and R0917 (too-many-positional-arguments) checks. This PR groups the 8 optional config arguments into a PDFExtractorOptions dataclass, reducing the constructor to 3 parameters (columns, options, pdf_reader).

Changes

  • extraction_params.py: add PDFExtractorOptions dataclass with all 8 optional config fields
  • pdf_extractor.py: replace flat keyword args with options: PDFExtractorOptions | None = None
  • extraction_facade.py: update extract_tables_from_pdf() to construct and pass PDFExtractorOptions
  • commands/analyze_pdf.py: update _validate_extraction() to use PDFExtractorOptions
  • Test files updated: test_pdf_extractor.py, test_page_skipping.py, test_document_type_enrichment.py, test_credit_card_detection.py
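
The shape of the change can be sketched as below. This is a minimal illustration of the parameter-object pattern, not the project's actual code: the four option field names (`table_top_y`, `table_bottom_y`, `enable_page_validation`, `enable_header_check`) and their defaults are hypothetical stand-ins for the 8 real fields, which this description does not list. Only the constructor's three parameter names (`columns`, `options`, `pdf_reader`) come from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class PDFExtractorOptions:
    """Groups optional extractor config into one object.

    NOTE: these field names and defaults are hypothetical placeholders;
    the real dataclass has 8 fields that are not shown in this PR text.
    """

    table_top_y: float = 0.0
    table_bottom_y: float = 800.0
    enable_page_validation: bool = True
    enable_header_check: bool = False


class PDFTableExtractor:
    """Simplified stand-in showing only the new constructor signature."""

    def __init__(
        self,
        columns: dict[str, tuple[int, int]],
        options: PDFExtractorOptions | None = None,
        pdf_reader: Any = None,
    ) -> None:
        self.columns = columns
        # Fall back to defaults when the caller passes no options object.
        self.options = options if options is not None else PDFExtractorOptions()
        self.pdf_reader = pdf_reader


# Before the refactor a caller would pass each setting as its own keyword
# argument; afterwards the settings travel together in one options object.
extractor = PDFTableExtractor(
    columns={"Date": (0, 50)},
    options=PDFExtractorOptions(table_top_y=120.0),
)
```

Using `None` as the default and building a fresh `PDFExtractorOptions()` inside the constructor avoids sharing one mutable default instance across extractors.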

Type

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation
  • Performance
  • Security

Testing

  • Tests pass (1551 passing, 4 skipped)
  • Manually tested
  • Integration test passed locally

Checklist

  • Code follows project style
  • Self-reviewed
  • Documentation updated (if needed)
  • No new warnings

Downstream impact

  • This PR changes a public interface in bankstatements_core (exported class, function, or exception)

…structor params

Replaces 8 flat keyword arguments on PDFTableExtractor.__init__ with a single
PDFExtractorOptions dataclass, bringing the constructor within the pylint
R0913/R0917 design limits. All callers (extraction_facade, analyze_pdf, tests)
are updated to use the new options= parameter. No behaviour change.
@longieirl longieirl self-assigned this Apr 9, 2026
@github-actions github-actions bot added the bug Something isn't working label Apr 9, 2026
@longieirl longieirl merged commit f47d341 into main Apr 10, 2026
11 checks passed
@longieirl longieirl deleted the fix/109-pdf-extractor-options branch April 10, 2026 07:35

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

refactor: PDFTableExtractor.__init__ has too many parameters (13)

2 participants