feat: FoxIO spec compliance (May 2026) + JA4D6 + parity (v0.6.0)#10

Merged
Crank-Git merged 12 commits into master from feat/foxio-spec-2026-05 on May 9, 2026

Conversation

@Crank-Git
Owner

Summary

Brings ja4plus up to the May 2026 FoxIO JA4+ spec, adds JA4D6 (DHCPv6) as the 10th fingerprint type, and closes the remaining parity gaps with the Go sibling library. Bumps to v0.6.0.

12 commits, 583 tests passing. One CLI subprocess timeout flake also occurs on master and predates this PR; it was left alone per the surgical-changes rule.

Spec compliance

  • JA4D rewrite per FoxIO PR #267 + #270: new format {type:5}{size:4}{ip:1}{fqdn:1}_{options}_{request_list} (e.g. disco0000in_61-55_1-3-6-42). Verified against the Wireshark dissector test vectors.
  • JA4D6 (DHCPv6) net-new: same format, recursive walk of nested options inside IA_NA / IA_TA / IA_PD / IA Address / IA Prefix. All 6 expected vectors match.
  • JA4 empty extension hash (PR #288): literal 000000000000 (verified, regression test added).
  • JA4 ALPN non-alphanumeric (PR #277): full implementation of the alnum/hex algorithm. ja4plus previously dropped non-ASCII bytes via decode('ascii', errors='ignore') and emitted "99". Now preserves raw bytes and emits the spec-correct first/last hex char. All 8 spec examples covered.
  • JA4H HTTP/2 + HTTP/3 version codes (PR #288): HTTP/2 → 20, HTTP/3 → 30. Cookie-VALUES hash sorted by name only (explicit key=lambda kv: kv[0]).
  • JA4SSH deterministic mode tiebreak (PR #281): on a frequency tie, picks the lowest payload-size value.

Bug fixes

  • JA4L UDP/QUIC server-first connections: replaced the direction == 'forward' gate with first-packet client-endpoint anchoring. Previously, server-first observations silently failed to start the QUIC 4-point timing.

Parity with Go ja4plus-go

  • Processor aggregator class (ja4plus.Processor): runs all fingerprinters per packet and exposes process_packet, reset, cleanup_connection, and get_shard_key. Mirrors the Go API.
  • JA4 / JA4S raw exposure on results: every result now includes raw and raw_original_order. CLI emits these in JSON output.
  • Multi-packet QUIC CRYPTO reassembly: large ClientHellos that span multiple Initial datagrams are now reassembled per DCID. CRYPTO frame parser also skips ACK frames (0x02/0x03) instead of bailing.
  • compute_ja4x_from_pem(bytes) / compute_ja4x_from_der(bytes): module-level helpers.
  • CLI: --types ja4d and --types ja4d6 accepted.

Breaking changes

None for the public Python API in normal use. JA4D output format changed completely per the FoxIO spec — anyone storing JA4D fingerprints will need to re-fingerprint historic captures (this is a spec change, not a library change).

Test plan

  • Full pytest tests/ — 583 passing
  • FoxIO Wireshark JA4D / JA4D6 vectors validated end-to-end
  • All 8 ALPN spec examples covered in tests/test_ja4_alpn.py
  • HTTP/2 + HTTP/3 version codes covered in tests/test_ja4h_spec.py
  • Multi-packet QUIC reassembly tested with quic-with-several-tls-frames.pcapng
  • JA4L server-first regression covered in tests/test_ja4l_udp_direction.py
  • Processor cleanup + sharding covered in tests/test_processor.py
  • Pre-existing flaky CLI subprocess test (tests/test_ja4db.py::test_analyze_with_lookup_json) is not fixed here; anyone hitting the flake should raise its 30s timeout

Crank-Git added 12 commits May 8, 2026 22:38
Adds CLAUDE.md, AGENTS.md, TODOS.md, PYTHON_ISSUES.md, docs/superpowers/,
IDE files, OS files, and *.pcap/*.pcapng to prevent accidentally committing
local-only development artifacts and large binary captures.
Updates DHCP_SKIP_OPTIONS to match the spec exactly: {0, 53, 50, 81}.
Pad (0) and the End marker (255) are now handled in the parse loop
itself, keeping the skip set focused on the spec's stated exclusions.

Adds tests/test_ja4d_foxio.py validating against the canonical Wireshark
dissector output for tests/foxio_vectors/pcap/dhcp.pcapng (4 packets:
disco/offer/reqst/dpack).
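
The parse loop described above can be sketched as follows. This is a minimal illustration, assuming the standard DHCPv4 code/length/value option encoding; the function name walk_dhcp_options is hypothetical and is not the ja4plus implementation. Only the skip set is taken from the commit message.

```python
from typing import List

# Skip set as stated in the commit message.
DHCP_SKIP_OPTIONS = {0, 53, 50, 81}


def walk_dhcp_options(options: bytes) -> List[int]:
    """Return option codes in presence order, minus the skip set.

    Hypothetical helper: Pad (0) and End (255) are handled in the loop
    itself because neither carries a length byte.
    """
    codes = []
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:          # Pad: single byte, no length field
            i += 1
            continue
        if code == 255:        # End: stop parsing
            break
        if i + 1 >= len(options):
            break              # truncated option, no length byte
        length = options[i + 1]
        if code not in DHCP_SKIP_OPTIONS:
            codes.append(code)
        i += 2 + length        # skip code, length, and value bytes
    return codes
```

Joining the returned codes with "-" gives the shape of the {options} section of the fingerprint.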
Adds JA4D6Fingerprinter for DHCPv6 (UDP/546-547). Format mirrors JA4D
({type}{size}{ip}{fqdn}_{options}_{request_list}) but with DHCPv6 semantics:
- type: 5-char abbreviation of msg-type (37 codes total per FoxIO PR #267/#270)
- size: byte length of the DUID inside option 1 (Client Identifier)
- ip:   'i' if option 4 (IA_TA) is present
- fqdn: 'd' if option 39 (Client FQDN) is present
- options: ALL option types in presence order, including nested options
           inside IA_NA/IA_TA/IA_PD/IA Address/IA Prefix (no exclusions)
- request_list: option codes from option 6 (ORO), each as 2 bytes BE

Wires JA4D6 into ja4plus exports, CLI VALID_TYPES, and the README format
table. Also adds JA4D + generate_ja4d to the public API exports (CLI was
already exposing JA4D, but the package init wasn't).

Validates against tests/foxio_vectors/pcap/dhcpv6.pcap (6 messages:
solct/advrt/reqst/reply/relse/reply) matching the canonical Wireshark
dissector output exactly.
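
A rough sketch of the recursive option walk, assuming the DHCPv6 option layout from RFC 8415 (2-byte code, 2-byte length, big-endian) and its fixed header sizes for the container options. The names walk_dhcpv6_options and NESTED_HEADER_LEN are hypothetical, and emitting nested codes immediately after their parent is an assumption about what "presence order" means, not something taken from the ja4plus source.

```python
import struct
from typing import List

# Bytes of fixed fields that precede nested options in each container option.
NESTED_HEADER_LEN = {
    3: 12,   # IA_NA: IAID(4) + T1(4) + T2(4)
    4: 4,    # IA_TA: IAID(4)
    25: 12,  # IA_PD: IAID(4) + T1(4) + T2(4)
    5: 24,   # IA Address: address(16) + preferred(4) + valid(4)
    26: 25,  # IA Prefix: preferred(4) + valid(4) + prefix-len(1) + prefix(16)
}


def walk_dhcpv6_options(buf: bytes) -> List[int]:
    """Collect option codes in presence order, descending into containers."""
    codes = []
    i = 0
    while i + 4 <= len(buf):
        code, length = struct.unpack_from("!HH", buf, i)
        data = buf[i + 4 : i + 4 + length]
        codes.append(code)
        header = NESTED_HEADER_LEN.get(code)
        if header is not None and len(data) > header:
            # Nested options start right after the fixed header.
            codes.extend(walk_dhcpv6_options(data[header:]))
        i += 4 + length
    return codes
```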
Per FoxIO PR #288, an empty extension list (after GREASE/SNI/ALPN
filtering) must produce the literal sentinel '000000000000' instead
of sha256(b'') -> 'e3b0c44298fc'. ja4plus/fingerprinters/ja4.py:128
already handles this correctly; this test pins the behavior.
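
For illustration, the sentinel rule can be sketched like this. The helper extension_hash is hypothetical and deliberately simplified: it hashes only a comma-joined list of extension codes and ignores the signature-algorithm part of the real JA4 hash input. It is not the code at ja4plus/fingerprinters/ja4.py:128.

```python
import hashlib
from typing import Iterable

EMPTY_SENTINEL = "000000000000"


def extension_hash(extensions: Iterable[int]) -> str:
    """Truncated SHA-256 over the sorted extension list, or the sentinel if empty."""
    ext_list = sorted(extensions)
    if not ext_list:
        # An empty list must not be hashed: sha256(b'') would leak through
        # as 'e3b0c44298fc' instead of the literal sentinel.
        return EMPTY_SENTINEL
    joined = ",".join(f"{e:04x}" for e in ext_list)
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]
```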
Per FoxIO spec PR #277, when the first or last byte of the first ALPN
value is not ASCII alphanumeric (0-9, A-Z, a-z), the ALPN value must be
the first/last character of the lowercase HEX of the FULL first ALPN.

Previously ja4plus dropped non-ASCII bytes via decode('ascii', errors='ignore')
and emitted '99' when the first character was non-ASCII. The new path:

- _parse_alpn_with_bytes() preserves the raw ALPN bytes alongside the
  best-effort decoded strings; tls_info gains 'alpn_raw'.
- compute_alpn_value(bytes) implements the PR #277 algorithm.
- generate_ja4 / get_raw_fingerprint / JA4S all consume alpn_raw when
  available, falling back to latin-1-encoded alpn_protocols for callers
  that only set the legacy field.

Test parametrizes all 8 examples from the PR plus single-byte / empty
edge cases, plus end-to-end via generate_ja4 and the
tls-non-ascii-alpn.pcapng fixture.
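
The rule stated above can be sketched as a small standalone function. This is an illustration written from the description in this commit message, not a copy of compute_alpn_value; the name alpn_code is hypothetical, and returning '00' for an empty ALPN is an assumption based on the general JA4 convention for a missing ALPN.

```python
def _is_ascii_alnum(b: int) -> bool:
    """True for bytes in 0-9, A-Z, a-z."""
    return (0x30 <= b <= 0x39) or (0x41 <= b <= 0x5A) or (0x61 <= b <= 0x7A)


def alpn_code(first_alpn: bytes) -> str:
    """Two-character ALPN code derived from the raw bytes of the first ALPN value."""
    if not first_alpn:
        return "00"  # assumption: no ALPN present
    first, last = first_alpn[0], first_alpn[-1]
    if _is_ascii_alnum(first) and _is_ascii_alnum(last):
        return chr(first) + chr(last)
    # Otherwise use the first and last characters of the lowercase hex
    # encoding of the FULL value, not just of the offending byte.
    hexed = first_alpn.hex()
    return hexed[0] + hexed[-1]
```

Working on raw bytes is the point of the change: once the value has been decoded with errors='ignore', the bytes needed for the hex fallback are already gone.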
Per FoxIO PR #288:
- HTTP/2 must produce '20' and HTTP/3 must produce '30' in the JA4H
  fingerprint's version code, not '2' / '3'. Adds _http_version_to_str()
  with explicit mappings instead of stripping dots from the raw string.
- The cookie-VALUES hash component must be sorted by NAME only. The
  existing implementation relied on tuple-sort tie-breaking; switching
  to an explicit key=lambda kv: kv[0] makes the spec compliance
  unambiguous.

Test suite: parametrized version mapping + cookie-name-sort hash
verification + http2-with-cookies.pcapng sanity check.
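
Both changes are small enough to sketch. The names below (HTTP_VERSION_CODES, http_version_code, sort_cookies_by_name) are hypothetical stand-ins rather than the library's _http_version_to_str(), and the '00' fallback for unknown versions is an assumption made only for this example.

```python
from typing import List, Tuple

# Explicit mapping instead of stripping dots from the raw version string.
HTTP_VERSION_CODES = {
    "HTTP/1.0": "10",
    "HTTP/1.1": "11",
    "HTTP/2": "20",
    "HTTP/2.0": "20",
    "HTTP/3": "30",
    "HTTP/3.0": "30",
}


def http_version_code(version: str) -> str:
    """Map an HTTP version string to its two-character code."""
    return HTTP_VERSION_CODES.get(version.upper(), "00")  # fallback is assumed


def sort_cookies_by_name(cookies: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (name, value) pairs by name only; equal names keep arrival order."""
    return sorted(cookies, key=lambda kv: kv[0])
```

The difference from a plain tuple sort only shows up when a cookie name repeats: sorted(cookies) would reorder the duplicates by value, while the keyed sort is stable and leaves them in their original order.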
Per FoxIO PR #281, when multiple packet sizes tie for the highest
frequency, JA4SSH must pick the LOWEST value. The previous _mode()
relied on Counter.most_common(1)[0][0], which on a tie returns
whichever value was inserted into the Counter first, so the result
depended on packet arrival order rather than on the values themselves.

The fix: among values matching the maximum count, return min().

Bare-ACK direction counting and SSH detection (the rest of PR #281)
were already correct in this implementation.
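
A minimal sketch of the tiebreak, assuming integer payload sizes. The function name mode_lowest_on_tie is hypothetical, and returning 0 for an empty input is an assumption for this example rather than documented ja4plus behavior.

```python
from collections import Counter
from typing import Iterable


def mode_lowest_on_tie(values: Iterable[int]) -> int:
    """Most frequent value; on a frequency tie, the smallest such value."""
    counts = Counter(values)
    if not counts:
        return 0  # assumed default for an empty window
    top = max(counts.values())
    # Among all values sharing the maximum count, pick the lowest.
    return min(v for v, c in counts.items() if c == top)
```

For the input [84, 36, 84, 36], Counter.most_common(1) yields 84 because 84 was seen first, while the function above returns 36 regardless of arrival order.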
The previous QUIC/UDP timing path delegated client identification to
_src_is_client(), which only returned True when conn['direction'] was
'forward' — i.e. when the client IP was lexicographically smaller than
the server IP. For server-first capture orderings (or simply when the
client IP > server IP), 'A' was never set, so JA4L-S/JA4L-C never
emitted.

Fix: lock in the client endpoint as the source 5-tuple of the FIRST
packet on the flow, then route subsequent packets by comparing against
that anchor. Direction labelling now reflects the actual roles, not
sort order.

Tests cover both lex-smaller and lex-larger client IPs plus a full
A/B/C/D round-trip producing JA4L-S then JA4L-C, plus a real pcap
sanity check on chrome-cloudflare-quic-with-secrets.pcapng.
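
The anchoring idea can be sketched independently of the timing logic. FlowRoles and its methods are hypothetical names for illustration; the sketch takes the source of the first packet seen on a flow to be the client, as the commit message describes, and does not reproduce the A/B/C/D timing points.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class FlowRoles:
    """Remember which endpoint sent the first packet of each flow."""

    def __init__(self) -> None:
        self._client: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}

    @staticmethod
    def _flow_key(src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        # Sorting is used only to find the flow, never to assign roles.
        a, b = sorted((src, dst))
        return (a, b)

    def is_from_client(self, src: Endpoint, dst: Endpoint) -> bool:
        key = self._flow_key(src, dst)
        # The first packet on the flow locks in its source as the client.
        client = self._client.setdefault(key, src)
        return client == src
```

With this shape, a client at 192.168.1.50 talking to a server at 10.0.0.1 is still labelled the client even though its address sorts after the server's.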
JA4Fingerprinter and JA4SFingerprinter now record both the hashed
fingerprint and the raw / raw_original_order variants on every entry
in fingerprints[], plus convenience attributes last_raw and
last_raw_original_order for the most recent successful parse. Mirrors
the Go reference's FingerprintResult.Raw / RawOriginalOrder fields.

Adds module-level helpers compute_ja4x_from_der() and
compute_ja4x_from_pem() that take bytes and return the JA4X fingerprint
string, matching ja4plus-go's ComputeJA4XFromDER / ComputeJA4XFromPEM.

CLI emits raw and raw_original_order fields in JSON output when the
fingerprinter exposes them. CSV/table output is unchanged.

Bumps version to 0.6.0 to signal the new spec features.
Adds multi-datagram QUIC Initial reassembly to support large TLS
ClientHellos (e.g. ECH grease + many ALPN options) that span more than
one Initial packet sharing a Destination Connection ID.

New helpers in ja4plus.utils.quic_utils:
- decrypt_quic_initial_crypto(payload) -> (fragments, dcid)
- parse_crypto_frames(plaintext) -> [(offset, data), ...]
- reassemble_crypto_fragments(fragments) -> bytes
- client_hello_from_crypto_fragments(fragments) -> tls_info | None
- extract_crypto_frames now skips ACK frames (0x02/0x03) instead of
  bailing on the first non-CRYPTO frame.

JA4Fingerprinter accumulates fragments per DCID (hex) across packets
and tries to parse a full ClientHello whenever new fragments arrive.
Once parsed, the per-DCID buffer is released. cleanup_connection looks
up DCIDs via a reverse 5-tuple map and drops any matching state.
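
For readers unfamiliar with offset-based reassembly, here is a minimal sketch operating on (offset, data) pairs like those returned by parse_crypto_frames. The function name reassemble_contiguous is hypothetical and this is not the body of reassemble_crypto_fragments; in particular, stopping at the first gap is an assumption made for this example.

```python
from typing import Iterable, Tuple


def reassemble_contiguous(fragments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Join CRYPTO fragments into the contiguous byte run starting at offset 0."""
    out = bytearray()
    for offset, data in sorted(fragments, key=lambda f: f[0]):
        if offset > len(out):
            break  # gap: the bytes in between have not arrived yet
        overlap = len(out) - offset
        if overlap < len(data):
            # Append only the part that extends past what we already hold.
            out.extend(data[overlap:])
    return bytes(out)
```

A caller would keep one fragment list per DCID, retry the ClientHello parse whenever a new fragment arrives, and drop the list once the parse succeeds.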
…rd_key

ja4plus.processor.Processor (also re-exported as ja4plus.Processor) runs
every JA4+ fingerprinter on each packet and aggregates the results into
a list of dicts. Mirrors the API of ja4plus-go's ja4plus.Processor:

- process_packet(pkt) -> [result_dict, ...]
- reset()                                  clears all underlying state
- cleanup_connection(src_ip, src_port, dst_ip, dst_port, proto)
                                            propagates to every fingerprinter
- get_shard_key(pkt)                       sorted 5-tuple key for sharding

Each result dict has type, fingerprint, raw, raw_original_order, and
the connection's src/dst IP/port. Errors from individual fingerprinters
are logged at DEBUG and swallowed.
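
To illustrate what a sorted 5-tuple shard key buys, here is a standalone sketch. The functions shard_key and worker_for are hypothetical and the key format is invented for this example; the actual return value of Processor.get_shard_key is not shown in this PR.

```python
import zlib


def shard_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> str:
    """Direction-independent key: both directions of a flow map to the same string."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}:{a[0]}:{a[1]}-{b[0]}:{b[1]}"


def worker_for(key: str, n_workers: int) -> int:
    """Pick a worker index with a stable hash.

    crc32 is used instead of built-in hash() because str hashes are
    randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % n_workers
```

Because the endpoints are sorted before the key is built, request and response packets of one connection land on the same worker, which keeps per-connection fingerprinter state in one place.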
- README now mentions 10 JA4+ methods (JA4D + JA4D6) and documents
  the new Processor class, JA4_r / JA4_ro exposure, and the
  compute_ja4x_from_pem / compute_ja4x_from_der helpers.
- New CHANGELOG.md captures the 0.6.0 changes:
  FoxIO PR #267/#270/#277/#281/#288 spec updates plus the Go-parity
  pass (Processor, raw fields, multi-packet QUIC CRYPTO reassembly,
  X.509 module helpers).
- pyproject.toml bumped to 0.6.0; description mentions DHCP.
Crank-Git merged commit 5ab0252 into master on May 9, 2026
7 checks passed
Crank-Git deleted the feat/foxio-spec-2026-05 branch on May 9, 2026 03:32