Tokenizer Consolidation #197

@aalpar

Summary

Consolidate ~700 lines of repetitive number-parsing code in the tokenizer into shared quasi-sub-token helpers. Expected reduction: ~295 lines of boilerplate, with no change to public behavior.

Design

See plans/TOKENIZER_CONSOLIDATION_PLAN.md for full analysis.

Opportunities Identified

  • Optional sign handling (12 occurrences)
  • Decimal fraction parsing (6 occurrences)
  • Special-number handling (inf.0/nan.0)
  • Imaginary suffix handling (10 occurrences)
  • Complex number suffix dispatch (4 occurrences)
  • Error-check-return pattern (50+ occurrences)
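One common way to collapse a repeated error-check-return pattern is a "sticky error" scanner: the first failure is recorded on the scanner, and every later read becomes a no-op. The sketch below is only an illustration under assumptions. It assumes the tokenizer is written in Go, and the `scanner` type, its `next` and `expect` methods, and `scanInfSuffix` are hypothetical names, not taken from the plan document.

```go
package main

import (
	"errors"
	"strings"
)

// scanner is a hypothetical stand-in for the tokenizer's reader state.
type scanner struct {
	src *strings.Reader
	err error // first error seen; once set, later reads are no-ops
}

// next returns the next rune, or 0 once any error (including EOF)
// has been recorded.
func (s *scanner) next() rune {
	if s.err != nil {
		return 0
	}
	r, _, err := s.src.ReadRune()
	if err != nil {
		s.err = err
		return 0
	}
	return r
}

// expect consumes one rune and records an error if it is not want.
func (s *scanner) expect(want rune) {
	got := s.next()
	if s.err == nil && got != want {
		s.err = errors.New("unexpected character")
	}
}

// scanInfSuffix checks that input starts with "inf.0". The error is
// inspected once at the end instead of after every read.
func scanInfSuffix(input string) error {
	s := &scanner{src: strings.NewReader(input)}
	for _, want := range "inf.0" {
		s.expect(want)
	}
	return s.err
}
```

With this shape, `scanInfSuffix("inf.0")` returns nil, while a mismatch such as `"inf.1"` or a truncated input such as `"in"` returns a non-nil error, and none of the call sites needs its own `if err != nil { return }` block.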

Phases

  • Phase 1: Low-risk helpers (mayConsumeSign, mayConsumeDecimalFraction, mayConsumeImaginary)
  • Phase 2: Special number consolidation
  • Phase 3: Complex suffix unification
  • Phase 4: State lookup table
  • Phase 5: Optional sub-tokenizer architecture (advanced)
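As a rough illustration of the Phase 1 helpers, here is a minimal sketch. The names `mayConsumeSign`, `mayConsumeDecimalFraction`, and `mayConsumeImaginary` come from the list above; everything else is an assumption: the Go language, the `cursor` type, the method signatures, `consumeDigits`, and `scanSimpleNumber` are hypothetical. It also simplifies the grammar by requiring at least one digit after the decimal point, which the real tokenizer may not do.

```go
package main

// cursor is a hypothetical stand-in for the tokenizer's input position.
type cursor struct {
	in  []rune
	pos int
}

// peek returns the current rune without consuming it, or 0 at end of input.
func (c *cursor) peek() rune {
	if c.pos >= len(c.in) {
		return 0
	}
	return c.in[c.pos]
}

// mayConsumeSign consumes a leading '+' or '-' if one is present and
// reports whether it did.
func (c *cursor) mayConsumeSign() bool {
	if r := c.peek(); r == '+' || r == '-' {
		c.pos++
		return true
	}
	return false
}

// consumeDigits consumes a run of ASCII digits and returns its length.
func (c *cursor) consumeDigits() int {
	start := c.pos
	for r := c.peek(); r >= '0' && r <= '9'; r = c.peek() {
		c.pos++
	}
	return c.pos - start
}

// mayConsumeDecimalFraction consumes '.' followed by one or more digits.
// If no digit follows the dot, the position is restored (assumed rule).
func (c *cursor) mayConsumeDecimalFraction() bool {
	start := c.pos
	if c.peek() != '.' {
		return false
	}
	c.pos++
	if c.consumeDigits() == 0 {
		c.pos = start
		return false
	}
	return true
}

// mayConsumeImaginary consumes a trailing 'i' if one is present.
func (c *cursor) mayConsumeImaginary() bool {
	if c.peek() == 'i' {
		c.pos++
		return true
	}
	return false
}

// scanSimpleNumber chains the helpers and returns how many runes were
// consumed and whether an imaginary suffix was seen.
func scanSimpleNumber(s string) (n int, imaginary bool) {
	c := &cursor{in: []rune(s)}
	c.mayConsumeSign()
	c.consumeDigits()
	c.mayConsumeDecimalFraction()
	imaginary = c.mayConsumeImaginary()
	return c.pos, imaginary
}
```

Each helper either consumes its construct in full or leaves the position untouched, so call sites can chain them without per-site lookahead logic. For example, `scanSimpleNumber("-12.5i")` consumes all six runes and reports an imaginary suffix, while `scanSimpleNumber("7.x")` stops after the `7`.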

Risk

Medium-High (number parsing is security-critical)

    Labels

    planned: Design complete, ready to start
    refactoring: Code cleanup and restructuring
