[bounty 00] Implement RFC 5322 compliant email address parser#22

Open
zeroknowledge0x wants to merge 1 commit into UnsafeLabs:main from zeroknowledge0x:feat/rfc5322-email-parser

Conversation

@zeroknowledge0x

Summary

Implements a fully RFC 5322 compliant email address parser in Python with complete ABNF grammar coverage from sections 3.2 through 3.4, plus obsolete syntax from §4.4.

Implementation

parser.py: AddressParser class with:

  • parse() — Parse a single mailbox or group address
  • parse_address_list() — Parse comma-separated address-list per §3.4
  • parse_mailbox_list() — Parse comma-separated mailbox-list per §3.4
  • strict mode toggle: rejects obs-* productions in strict mode, accepts them in permissive mode
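To make the strict/permissive distinction concrete, here is a minimal sketch of how a local-part might be classified. It is not the code in parser.py: the names `classify_local_part` and `accept_local_part` are hypothetical, and the grammar is simplified (no CFWS, and FWS inside quotes is reduced to plain spaces and tabs).

```python
import re
from typing import Optional

# Simplified RFC 5322 building blocks (assumption: no CFWS around tokens).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = rf"{ATEXT}+"
# quoted-string: qtext, WSP, or quoted-pair between double quotes.
QUOTED = r'"(?:[\x20\x09\x21\x23-\x5b\x5d-\x7e]|\\[\x21-\x7e \t])*"'
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"
WORD = rf"(?:{ATOM}|{QUOTED})"
OBS_LOCAL_PART = rf"{WORD}(?:\.{WORD})*"


def classify_local_part(text: str) -> Optional[str]:
    """Return the name of the production that matches, or None."""
    if re.fullmatch(DOT_ATOM, text):
        return "dot-atom"
    if re.fullmatch(QUOTED, text):
        return "quoted-string"
    if re.fullmatch(OBS_LOCAL_PART, text):
        return "obs-local-part"
    return None


def accept_local_part(text: str, strict: bool = True) -> bool:
    """Strict mode rejects obs-local-part; permissive mode accepts it."""
    kind = classify_local_part(text)
    if kind is None:
        return False
    return not (strict and kind == "obs-local-part")
```

Under these assumptions, `john."doe"` mixes an atom with a quoted-string, so it only matches obs-local-part and is accepted in permissive mode alone.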

Features

  • Quoted-string handling: Full §3.2.4 support with quoted-pair and FWS within quotes
  • CFWS handling: Comments correctly stripped from addr-spec, extracted and stored
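  • Domain literals: IPv4 and IPv6 support, using the domain-literal syntax of §3.4.1 (the address-literal forms themselves are defined in RFC 5321 §4.1.3)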
  • Group addresses: Correctly parsed with member list extraction
  • Obsolete syntax (§4.4): obs-local-part (mixed dot-atom/quoted-string), obs-domain, obs-angle-addr (with route), obs-FWS, obs-dtext
  • No external dependencies: Pure Python stdlib only
  • Type hints: All public methods fully typed
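The CFWS bullet above can be illustrated with a small sketch of comment extraction. This is an assumed approach, not the code in parser.py: `extract_comments` is a hypothetical helper that tracks nesting depth, honours quoted-pairs, and leaves parentheses inside quoted-strings alone. As a simplification, comments are dropped outright rather than replaced by whitespace.

```python
from __future__ import annotations


def extract_comments(text: str) -> tuple[str, list[str]]:
    """Split text into (text without comments, list of top-level comments)."""
    out: list[str] = []       # characters outside any comment
    current: list[str] = []   # characters of the comment being collected
    comments: list[str] = []
    depth = 0                 # current comment nesting depth
    in_quotes = False         # inside a quoted-string (only when depth == 0)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and (in_quotes or depth):
            # quoted-pair: keep both characters in whatever we are inside
            (current if depth else out).append(text[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes:
            out.append(ch)
        elif ch == "(":
            if depth:
                current.append(ch)  # nested comment stays in the outer one
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
            if depth == 0:
                comments.append("".join(current))
                current = []
            else:
                current.append(ch)
        elif depth:
            current.append(ch)
        else:
            out.append(ch)
        i += 1
    if depth or in_quotes:
        raise ValueError("unterminated comment or quoted-string")
    return "".join(out).strip(), comments
```

For example, `extract_comments("john.doe@example.com (John Doe)")` yields the bare addr-spec plus a one-element comment list, which matches the "stripped, extracted and stored" behaviour described above.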

Test Suite

test_parser.py — 73 test cases organized by RFC section:

  • §3.2.1 (quoted-pair): 5 cases
  • §3.2.2 (FWS): 5 cases
  • §3.2.3 (CFWS/comments): 8 cases
  • §3.2.4 (quoted-string): 8 cases
  • §3.2.5 (miscellaneous tokens): 3 cases
  • §3.4 (address/mailbox/group): 12 cases
  • §3.4.1 (addr-spec/domain-literal): 8 cases
  • §4.4 (obsolete addressing): 8 cases
  • Edge cases: 5 cases
  • Invalid/rejection cases: 8 cases
  • Source field tests: 3 cases

Compliance Matrix

compliance.md — Maps every ABNF production to its RFC section, test cases, and implementation status.

CAP Blocks

Both CAP annotation blocks in source.md have been populated with real values from the execution environment.

Acceptance Criteria

  • parser.py: AddressParser class with parse(), parse_address_list(), parse_mailbox_list()
  • Strict mode rejects all obs-* productions; permissive mode accepts them
  • Quoted-string handling implements full §3.2.4 (quoted-pair, FWS within quotes)
  • CFWS correctly handled: stripped from addr-spec, comments extracted and stored
  • Domain literals support both IPv4 and IPv6 forms per §3.4.1
  • Group addresses correctly parsed with member list extraction
  • test_parser.py — 73 test cases covering all sections
  • compliance.md — maps all ABNF productions to tests and implementation
  • Both CAP blocks in source.md populated
  • No external dependencies — pure Python stdlib only
  • Type hints on all public methods
  • Parser handles inputs up to 998 characters
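For reviewers checking the domain-literal criterion, a minimal sketch of one way to validate the bracketed forms with the stdlib `ipaddress` module follows. It assumes the RFC 5321 address-literal conventions (`[192.0.2.1]` and `[IPv6:2001:db8::1]`); RFC 5322 itself treats the bracket contents as opaque dtext. `parse_domain_literal` is a hypothetical name and may not match what parser.py does.

```python
import ipaddress
from typing import Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_domain_literal(domain: str) -> Optional[IPAddress]:
    """Return the IP address inside a domain literal, or None if invalid."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return None
    body = domain[1:-1].strip()
    try:
        # Assumption: IPv6 literals carry the "IPv6:" tag from RFC 5321.
        if body[:5].lower() == "ipv6:":
            return ipaddress.IPv6Address(body[5:])
        return ipaddress.IPv4Address(body)
    except ValueError:
        # ipaddress.AddressValueError is a subclass of ValueError.
        return None
```

Relying on `ipaddress` keeps the sketch within the "pure Python stdlib only" constraint and avoids hand-written octet or hextet checks.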

All 73 tests pass: python3 -m pytest test_parser.py -v

Closes #1

Implements full ABNF grammar from sections 3.2-3.4 with optional
obsolete syntax support from section 4.4.

Features:
- AddressParser class with parse(), parse_address_list(), parse_mailbox_list()
- Strict mode rejects obs-* productions; permissive mode accepts them
- Quoted-string handling with full §3.2.4 support (quoted-pair, FWS)
- CFWS correctly handled and stripped from addr-spec
- Domain literals support IPv4 and IPv6
- Group addresses with member list extraction
- 73 test cases covering all RFC sections
- compliance.md mapping all ABNF productions to tests

Closes UnsafeLabs#1

Development

Successfully merging this pull request may close these issues.

[bounty $400] Implement ABNF-compliant email address parser with full §3.2–§4.4 coverage

2 participants