feat: FoxIO spec compliance (May 2026) + JA4D6 + parity (v0.6.0) #10
Merged
Ignores CLAUDE.md, AGENTS.md, TODOS.md, PYTHON_ISSUES.md, docs/superpowers/, IDE files, OS files, and *.pcap/*.pcapng so that local-only development artifacts and large binary captures are not committed by accident.
Updates DHCP_SKIP_OPTIONS to match the spec exactly: {0, 53, 50, 81}.
Pad (0) and the End marker (255) are now handled in the parse loop
itself, keeping the skip set focused on the spec's stated exclusions.
Adds tests/test_ja4d_foxio.py validating against the canonical Wireshark
dissector output for tests/foxio_vectors/pcap/dhcp.pcapng (4 packets:
disco/offer/reqst/dpack).
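The split between the parse loop and the skip set can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `iter_dhcp_options` and `fingerprint_option_codes` are hypothetical names, and only `DHCP_SKIP_OPTIONS` and its contents come from the commit message.

```python
DHCP_SKIP_OPTIONS = {0, 53, 50, 81}  # value quoted in the commit message


def iter_dhcp_options(options: bytes):
    """Yield (code, value) pairs from a raw DHCPv4 options field.

    Hypothetical helper: Pad (0) and End (255) are consumed by the loop
    itself, so callers never see them.
    """
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:       # Pad is a single byte with no length field
            i += 1
            continue
        if code == 255:     # End marker terminates the option list
            break
        if i + 1 >= len(options):
            break           # truncated: length byte is missing
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        if len(value) < length:
            break           # truncated: value is shorter than declared
        yield code, value
        i += 2 + length


def fingerprint_option_codes(options: bytes) -> list[int]:
    """Option codes in presence order, minus the spec's stated exclusions."""
    return [c for c, _ in iter_dhcp_options(options) if c not in DHCP_SKIP_OPTIONS]
```

With this shape, the skip set only has to express what the spec excludes, while framing concerns (Pad, End, truncation) stay in the loop.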
Adds JA4D6Fingerprinter for DHCPv6 (UDP/546-547). Format mirrors JA4D
({type}{size}{ip}{fqdn}_{options}_{request_list}) but with DHCPv6 semantics:
- type: 5-char abbreviation of msg-type (37 codes total per FoxIO PR #267/#270)
- size: byte length of the DUID inside option 1 (Client Identifier)
- ip: 'i' if option 4 (IA_TA) is present
- fqdn: 'd' if option 39 (Client FQDN) is present
- options: ALL option types in presence order, including nested options
inside IA_NA/IA_TA/IA_PD/IA Address/IA Prefix (no exclusions)
- request_list: option codes from option 6 (ORO), each as 2 bytes BE
Wires JA4D6 into ja4plus exports, CLI VALID_TYPES, and the README format
table. Also adds JA4D + generate_ja4d to the public API exports (CLI was
already exposing JA4D, but the package init wasn't).
Validates against tests/foxio_vectors/pcap/dhcpv6.pcap (6 messages:
solct/advrt/reqst/reply/relse/reply) matching the canonical Wireshark
dissector output exactly.
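As an illustration of the "options in presence order, including nested options" rule, the sketch below walks a DHCPv6 options buffer (the bytes after the 4-byte msg-type/transaction-id header). The function names are hypothetical and the ordering of nested codes (parent first, then its children) is an assumption; the fixed-header lengths follow the RFC 8415 option layouts.

```python
import struct

# Bytes of fixed fields that precede the nested options inside each
# container option (RFC 8415 layouts).
NESTED_HEADER_LEN = {
    3: 12,   # IA_NA: IAID + T1 + T2
    4: 4,    # IA_TA: IAID
    5: 24,   # IA Address: address + preferred + valid lifetimes
    25: 12,  # IA_PD: IAID + T1 + T2
    26: 25,  # IA Prefix: lifetimes + prefix length + prefix
}


def walk_dhcpv6_options(data: bytes) -> list[int]:
    """Return option codes in presence order, descending into containers."""
    codes = []
    i = 0
    while i + 4 <= len(data):
        code, length = struct.unpack_from("!HH", data, i)
        body = data[i + 4 : i + 4 + length]
        if len(body) < length:
            break  # truncated option
        codes.append(code)
        header = NESTED_HEADER_LEN.get(code)
        if header is not None and len(body) > header:
            # Assumption: nested codes are listed right after their parent.
            codes.extend(walk_dhcpv6_options(body[header:]))
        i += 4 + length
    return codes


def parse_oro(body: bytes) -> list[int]:
    """Decode an Option Request Option (6) body: 2-byte big-endian codes."""
    usable = len(body) - (len(body) % 2)
    return [code for (code,) in struct.iter_unpack("!H", body[:usable])]
```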
Per FoxIO PR #288, an empty extension list (after GREASE/SNI/ALPN filtering) must produce the literal sentinel '000000000000' instead of sha256(b'') -> 'e3b0c44298fc'. ja4plus/fingerprinters/ja4.py:128 already handles this correctly; this test pins the behavior.
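A minimal sketch of the sentinel rule, assuming the extension values arrive as already-filtered strings and are comma-joined before hashing; `truncated_hash` is a hypothetical name, not the function at ja4.py:128.

```python
import hashlib

EMPTY_SENTINEL = "000000000000"


def truncated_hash(values: list[str]) -> str:
    """12-hex-char SHA-256 prefix, or the literal sentinel for an empty list."""
    if not values:
        # Without this branch the result would be sha256(b"")[:12].
        return EMPTY_SENTINEL
    joined = ",".join(values)
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]
```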
Per FoxIO spec PR #277, when the first or last byte of the first ALPN
value is not ASCII alphanumeric (0-9, A-Z, a-z), the ALPN value must be
the first and last characters of the lowercase hex encoding of the full
first ALPN. Previously ja4plus dropped non-ASCII bytes via
decode('ascii', errors='ignore') and emitted '99' when the first
character was non-ASCII. The new path:
- _parse_alpn_with_bytes() preserves the raw ALPN bytes alongside the
best-effort decoded strings; tls_info gains 'alpn_raw'.
- compute_alpn_value(bytes) implements the PR #277 algorithm.
- generate_ja4 / get_raw_fingerprint / JA4S all consume alpn_raw when
available, falling back to latin-1-encoded alpn_protocols for callers
that only set the legacy field.
Test parametrizes all 8 examples from the PR plus single-byte / empty
edge cases, plus end-to-end via generate_ja4 and the
tls-non-ascii-alpn.pcapng fixture.
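The PR #277 rule as described above can be sketched like this. `alpn_value_sketch` is a hypothetical stand-in for compute_alpn_value, and returning "00" for an empty ALPN is an assumption about the no-ALPN case rather than something this commit message states.

```python
import string

# ASCII alphanumeric byte values: 0-9, A-Z, a-z.
_ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def alpn_value_sketch(first_alpn: bytes) -> str:
    """Two-character ALPN component computed from the raw first ALPN bytes."""
    if not first_alpn:
        return "00"  # assumption: no ALPN maps to "00"
    first, last = first_alpn[0], first_alpn[-1]
    if first in _ALNUM and last in _ALNUM:
        return chr(first) + chr(last)
    # Otherwise use the lowercase hex of the FULL value, first and last char.
    hex_str = first_alpn.hex()
    return hex_str[0] + hex_str[-1]
```

Working on raw bytes avoids the lossy decode step entirely, so a value such as b"\xab\xcd" yields "ad" instead of being dropped.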
Per FoxIO PR #288:
- HTTP/2 must produce '20' and HTTP/3 must produce '30' in the JA4H fingerprint's version code, not '2' / '3'. Adds _http_version_to_str() with explicit mappings instead of stripping dots from the raw string.
- The cookie-VALUES hash component must be sorted by NAME only. The existing implementation relied on tuple-sort tie-breaking; switching to an explicit key=lambda kv: kv[0] makes the spec compliance unambiguous.
Test suite: parametrized version mapping + cookie-name-sort hash verification + http2-with-cookies.pcapng sanity check.
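Both JA4H rules are small enough to sketch. The names below are hypothetical (this is not _http_version_to_str itself), and the "00" fallback for unknown versions is an assumption made only so the example is total.

```python
_HTTP_VERSION_CODES = {
    "HTTP/1.0": "10",
    "HTTP/1.1": "11",
    "HTTP/2": "20",
    "HTTP/2.0": "20",
    "HTTP/3": "30",
    "HTTP/3.0": "30",
}


def http_version_code(raw: str) -> str:
    """Explicit mapping; stripping dots would turn 'HTTP/2' into '2'."""
    return _HTTP_VERSION_CODES.get(raw.strip().upper(), "00")  # fallback is assumed


def sort_cookies_by_name(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (name, value) pairs by name only; equal names keep input order."""
    return sorted(pairs, key=lambda kv: kv[0])
```

The difference from a plain tuple sort only shows up when a cookie name repeats: `sorted(pairs)` would reorder the duplicates by value, whereas the keyed sort is stable and leaves them in their original order.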
Per FoxIO PR #281, when multiple packet sizes tie for the highest frequency, JA4SSH must pick the LOWEST value. The previous _mode() relied on Counter.most_common(1)[0][0], which returns whichever tied value was inserted first, so the result depended on the order in which packet sizes were observed rather than on the values themselves. The fix: among values matching the maximum count, return min(). Bare-ACK direction counting and SSH detection (the rest of PR #281) were already correct in this implementation.
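The tie-break can be sketched as below. `lowest_mode` is a hypothetical name for illustration, and returning 0 for an empty input is an assumption.

```python
from collections import Counter


def lowest_mode(values: list[int]) -> int:
    """Most frequent value; ties resolve to the smallest value."""
    if not values:
        return 0  # assumption: no observations yields 0
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)
```

For the input `[64, 36, 64, 36]`, `Counter.most_common(1)` reports 64 because it was seen first, while `lowest_mode` returns 36 regardless of arrival order.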
The previous QUIC/UDP timing path delegated client identification to _src_is_client(), which only returned True when conn['direction'] was 'forward', i.e. when the client IP was lexicographically smaller than the server IP. For server-first capture orderings (or simply when the client IP > server IP), 'A' was never set, so JA4L-S/JA4L-C never emitted. Fix: lock in the client endpoint as the source endpoint of the FIRST packet on the flow, then route subsequent packets by comparing against that anchor. Direction labelling now reflects the actual roles, not sort order. Tests cover both lex-smaller and lex-larger client IPs plus a full A/B/C/D round-trip producing JA4L-S then JA4L-C, plus a real pcap sanity check on chrome-cloudflare-quic-with-secrets.pcapng.
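A minimal sketch of first-packet anchoring, assuming endpoints are (ip, port) tuples and that the first packet seen on a flow comes from the client. `FlowRoles` is a hypothetical class, not part of ja4plus.

```python
class FlowRoles:
    """Remembers which endpoint sent the first packet of each flow."""

    def __init__(self):
        self._client_by_flow = {}

    @staticmethod
    def _flow_key(src, dst):
        # Direction-independent key: both directions map to the same flow.
        return tuple(sorted((src, dst)))

    def is_from_client(self, src, dst) -> bool:
        key = self._flow_key(src, dst)
        # The first packet on the flow anchors the client endpoint.
        client = self._client_by_flow.setdefault(key, src)
        return src == client
```

The sorted key is used only to find the flow; the role decision comes from the stored anchor, so a client with the lexicographically larger IP is still labelled as the client.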
JA4Fingerprinter and JA4SFingerprinter now record both the hashed fingerprint and the raw / raw_original_order variants on every entry in fingerprints[], plus convenience attributes last_raw and last_raw_original_order for the most recent successful parse. Mirrors the Go reference's FingerprintResult.Raw / RawOriginalOrder fields.
Adds module-level helpers compute_ja4x_from_der() and compute_ja4x_from_pem() that take bytes and return the JA4X fingerprint string, matching ja4plus-go's ComputeJA4XFromDER / ComputeJA4XFromPEM.
CLI emits raw and raw_original_order fields in JSON output when the fingerprinter exposes them. CSV/table output is unchanged.
Bumps version to 0.6.0 to signal the new spec features.
Adds multi-datagram QUIC Initial reassembly to support large TLS ClientHellos (e.g. ECH grease + many ALPN options) that span more than one Initial packet sharing a Destination Connection ID.
New helpers in ja4plus.utils.quic_utils:
- decrypt_quic_initial_crypto(payload) -> (fragments, dcid)
- parse_crypto_frames(plaintext) -> [(offset, data), ...]
- reassemble_crypto_fragments(fragments) -> bytes
- client_hello_from_crypto_fragments(fragments) -> tls_info | None
- extract_crypto_frames now skips ACK frames (0x02/0x03) instead of bailing on the first non-CRYPTO frame.
JA4Fingerprinter accumulates fragments per DCID (hex) across packets and tries to parse a full ClientHello whenever new fragments arrive. Once parsed, the per-DCID buffer is released. cleanup_connection looks up DCIDs via a reverse 5-tuple map and drops any matching state.
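The reassembly step can be illustrated with a small sketch. The function names are hypothetical and the behaviour at gaps (stop and return the contiguous prefix) is an assumption, not a description of reassemble_crypto_fragments; the completeness check relies only on the TLS handshake header layout (1-byte type, 3-byte length).

```python
def reassemble_contiguous(fragments: list[tuple[int, bytes]]) -> bytes:
    """Join (offset, data) CRYPTO fragments into the contiguous prefix."""
    buf = bytearray()
    for offset, data in sorted(fragments, key=lambda f: f[0]):
        if offset > len(buf):
            break  # gap: a fragment is still missing, stop here
        # On overlap, append only the bytes beyond what is already buffered.
        buf.extend(data[len(buf) - offset:])
    return bytes(buf)


def client_hello_complete(stream: bytes) -> bool:
    """True once the buffer holds a whole ClientHello handshake message."""
    if len(stream) < 4 or stream[0] != 0x01:  # 0x01 = ClientHello
        return False
    body_len = int.from_bytes(stream[1:4], "big")
    return len(stream) >= 4 + body_len
```

A per-DCID accumulator would call `reassemble_contiguous` each time a new fragment arrives and attempt a parse only once `client_hello_complete` returns True.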
…rd_key
ja4plus.processor.Processor (also re-exported as ja4plus.Processor) runs
every JA4+ fingerprinter on each packet and aggregates the results into
a list of dicts. Mirrors the API of ja4plus-go's ja4plus.Processor:
- process_packet(pkt) -> [result_dict, ...]
- reset() clears all underlying state
- cleanup_connection(src_ip, src_port, dst_ip, dst_port, proto)
propagates to every fingerprinter
- get_shard_key(pkt) -> sorted 5-tuple key for sharding
Each result dict has type, fingerprint, raw, raw_original_order, and
the connection's src/dst IP/port. Errors from individual fingerprinters
are logged at DEBUG and swallowed.
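The idea behind a sorted 5-tuple shard key is that both directions of a connection map to the same key, so a sharded pipeline sends them to the same worker. The sketch below is a hypothetical stand-alone function with an assumed string format; it takes explicit fields rather than a packet and is not the Processor.get_shard_key implementation.

```python
def shard_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> str:
    """Direction-independent key built from a sorted 5-tuple."""
    # Sorting the two endpoints makes A->B and B->A produce the same key.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}:{low[0]}:{low[1]}-{high[0]}:{high[1]}"
```

A worker index could then be derived from a stable hash of the key (for example zlib.crc32 over its bytes), since Python's built-in hash() of strings varies between processes.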
- README now mentions 10 JA4+ methods (JA4D + JA4D6) and documents the new Processor class, JA4_r / JA4_ro exposure, and the compute_ja4x_from_pem / compute_ja4x_from_der helpers.
- New CHANGELOG.md captures the 0.6.0 changes: FoxIO PR #267/#270/#277/#281/#288 spec updates plus the Go-parity pass (Processor, raw fields, multi-packet QUIC CRYPTO reassembly, X.509 module helpers).
- pyproject.toml bumped to 0.6.0; description mentions DHCP.
Summary
Brings ja4plus up to the May 2026 FoxIO JA4+ spec, adds JA4D6 (DHCPv6) as the 10th fingerprint type, and closes the remaining parity gaps with the Go sibling library. Bumps to v0.6.0.
12 commits, 583 tests passing (one pre-existing CLI subprocess timeout flake, also present on master, was left alone per the surgical-changes rule).
Spec compliance
- JA4D: format is now {type:5}{size:4}{ip:1}{fqdn:1}_{options}_{request_list} (e.g. disco0000in_61-55_1-3-6-42). Verified against the Wireshark dissector test vectors.
- JA4 (PR #288): an empty extension list produces the sentinel 000000000000 (verified, regression test added).
- JA4 ALPN (PR #277): non-ASCII ALPN bytes were previously dropped via decode('ascii', errors='ignore') and emitted "99". Now preserves raw bytes and emits the spec-correct first/last hex char. All 8 spec examples covered.
- JA4H (PR #288): HTTP/2 → 20, HTTP/3 → 30. Cookie-VALUES hash sorted by name only (explicit key=lambda kv: kv[0]).
Bug fixes
- JA4L (QUIC/UDP): replaced the direction == 'forward' gate with first-packet client-endpoint anchoring. Previously, server-first observations silently failed to start the QUIC 4-point timing.
Parity with Go ja4plus-go
- Processor aggregator class (ja4plus.Processor): runs all fingerprinters per packet; process_packet, reset, cleanup_connection, get_shard_key. Mirrors the Go API.
- JA4 and JA4S results now carry raw and raw_original_order. CLI emits these in JSON output.
- compute_ja4x_from_pem(bytes) / compute_ja4x_from_der(bytes): module-level helpers.
- CLI: --types ja4d and --types ja4d6 accepted.
Breaking changes
None for the public Python API in normal use. The JA4D output format changed completely per the FoxIO spec, so anyone storing JA4D fingerprints will need to re-fingerprint historic captures (this is a spec change, not a library change).
Test plan
- pytest tests/: 583 passing
- tests/test_ja4_alpn.py
- tests/test_ja4h_spec.py
- quic-with-several-tls-frames.pcapng
- tests/test_ja4l_udp_direction.py
- tests/test_processor.py
- tests/test_ja4db.py::test_analyze_with_lookup_json: consumer should bump the 30s timeout