Recover IQ extraction gaps: tag-discriminated unions + honest drop metrics#3
Conversation
…ver 8 () specs) The parser-completeness validator extracted struct-init field names with a `,`-only regex. A nested repeated grandchild is emitted as `tag: tag_items,`, so the regex captured the VALUE (`tag_items`) instead of the KEY (`tag`); the validator then thought `tag` was never initialized and wrongly fell the whole spec back to (). Match the identifier before `:` OR `,`. The acceptance check is one-directional (required is a subset of inited), so the extra value/Default matches are harmless. Recovers 8 specs (() 39 -> 31): BotList outcome union + 7 newsletter/group success structs. Verified: 45 wa-codegen tests + syn net pass; regenerated iq.rs compiles clean against the real wacore crate; IR/contract unchanged.
…ames Extractor: mixin_index extract_fragment dropped router-only MixinGroup modules (e.g. mergeBaseGetGroupOrServerMixinGroup: pure if/return of branch mixins, no direct <iq>), so resolve()'s BFS couldn't bridge request -> router -> branch -> base. Keep a router when it has merged_callees. Recovers the GroupProfilePictures w:g2 get stanza (155 -> 156, unparseable 58 -> 57, typed 135 -> 136) and enriches the PushConfig set request with its <config>/<clear> children (verified additive: 0 stanzas lost; a full request+response tree A/B shows only the 1 new + 1 enriched). Codegen: the PushConfig enrichment exposed a latent variant-group bug — six <config> alternatives all lead with a dynamic "platform" attr, so variant_name resolved them all to "Platform", producing a duplicate enum variant with mismatched match-arm fields (E0428/E0026/E0559). Dedup variant names within a group (Platform/Platform2/Platform3...), computed once and shared by the enum def and the build match. Verified: 46 wa-codegen + 101 wa-scan tests; regenerated iq.rs compiles clean against the real wacore crate (0 errors); --check idempotent.
Pass 1 (RPC) and Pass 2 (ResponseSuccess) of the response index miss an op whose response module ends differently — PingsClient's parser lives in WASmaxInPingsClientResponseServerResponse, which is neither an RPC nor a *ResponseSuccess. Add ResponseIndex::resolve_for_request_op(x): anchored on the EXACT op name from the request, it finds a WASmaxIn<x>Response<V> whose variant V is success-like (never reverse-deriving x by stripping "Response", which would mis-split "…ServerResponse"), and parses it on demand. module.rs chains it after get_by_x. Recovers the Ping response (from: Jid, type, t: u64; guard type="result"): typed 136 -> 137, degraded 20 -> 19. A/B confirms only makeClientRequest changed; regenerated iq.rs compiles clean against real wacore; --check idempotent.
…72 more) The largest fidelity gap: same-tag field-discriminated unions were dropped by classify_union. Add UnionShape::TagDiscriminated — variants that share one node's tag but carry richer payloads (nested children/unions). Each variant becomes a newtype enum arm over its own generated struct (collect_response_fields recurses, producing child item structs + nested enums); the parser tries them first-success, each arm a closure gated by its pinned attr values (Cow-safe `.as_deref()` guards) and required fields, returning the first that parses. A separability gate (signature subset + pinned-attr conflict, reusing parser_is_valid) drops shapes where an earlier arm would shadow a later one (e.g. userFetch Error == ErrorFallback) rather than misclassify. collect_union now returns (RustField, Vec<RustEnum>, Vec<RustChildStruct>) so the per-variant structs/enums reach the module-level emit; emit_response_parser was factored into a reusable emit_struct_parser for the per-arm bodies. The repeated-child path (collect + emit) now also recurses into a Vec<Item>'s union columns, so unions nested under repeated children are reached. Recovers 116 union enums (was 10), including the GroupInfo participant (Admin/NonAdmin) and newsletter message (text/media/poll/…) unions. Pure codegen: IR/contract unchanged, --check idempotent. 48 wa-codegen tests (3 new: participant, attr-value, non-separable); the regenerated iq.rs compiles clean against the real wacore crate (0 errors).
…7 -> 6) The "unparseable" count was dominated by noise: 51 of 57 are cross-module mixin fragments (modules that export ONLY merge…Mixin combinators and build a partial <iq> missing xmlns/type by design, folded into real requests via mergeStanzas). Their data already lives in the concrete request consumers — they are not dropped stanzas. Add DropReason::MixinFragment and classify a module as such at the unresolved bail when its only functions are merge…Mixin (hoisting the export computation above the bail). The CLI headline now reports genuine unresolved (6) separately from benign fragments (51), and the manifest dropsByReason reflects the accurate split. Diagnostics only: no stanza added or removed; the no-silent-vanish invariant (candidates = producing + unparseable) holds. Contract change is limited to the 51 relabeled `reason` strings in index.json + the manifest buckets.
|
Warning Review limit reached
More reviews will be available in 49 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR extends XMPP codegen to handle tag-discriminated union shapes, refactors response parsing for reuse, deduplicates variant names, and introduces mixin fragment tracking. Union codegen now classifies variants sharing a node tag plus pinned attribute values and emits multi-enum payloads. Fragment lifecycle tracking distinguishes benign mixin-only modules from genuine failures, enabling request-anchored response resolution for cross-module dependencies. ChangesUnion Shape Support and Codegen Integration
Mixin Fragment Lifecycle and Response Resolution
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86dd8b6836
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if f.required && (f.method == "child" || f.method.starts_with("attr")) { | ||
| s.insert(format!("REQ:{}", f.name)); |
There was a problem hiding this comment.
Exclude non-failing repeated children from union signatures
When a tag-discriminated arm has a required repeated child, the generated parser only builds a Vec from get_children_by_tag and succeeds with an empty vector if the child is absent, so it is not a fail-on-absent discriminator. Counting every required child here lets the separability gate accept shapes where an earlier arm actually matches unconditionally; for example the generated newsletter views-count union has a first arm with a required repeated views_count child and a later deprecated views_count/count arm, but the first parser succeeds even without that child and shadows the deprecated arm, dropping the count.
Useful? React with 👍 / 👎.
…dex P1) variant_signature counted every required child as a fail-on-absent discriminator, but a repeated child reads as a possibly-empty Vec — the parser never bails when it is absent. So an arm whose only required field is a repeated child matched unconditionally and shadowed later arms, and the separability gate wrongly accepted it. Concretely the newsletter views-count union (required repeated `views_count` vs a deprecated `count` attr) was emitted with the deprecated arm unreachable. Skip repeated children (and their contents) when building the signature, so such a shape is correctly rejected (dropped, not misclassified). Drops the 2 views-count unions (116 -> 114 enums); the rest are unaffected. Pure codegen: contract unchanged, regenerated iq.rs compiles clean against the real wacore crate. Adds a unit test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63503176d4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if f.field_type == ParsedFieldType::Union { | ||
| if f.required { | ||
| s.insert(format!("NESTED:{}", f.name)); | ||
| } |
There was a problem hiding this comment.
Exclude nested unions from discriminator signatures
When an earlier tag-discriminated arm is distinguished only by a required nested type=union, this signature lets classification accept it as fail-on-absent, but emit_union_read always produces an Option<Enum> and returns None for an absent or unrecognized nested union rather than bailing. In that scenario the earlier arm's per-variant parser still succeeds with the nested union set to None, so it shadows later arms even though the separability gate accepted the shape.
Useful? React with 👍 / 👎.
Closes the known IQ extraction/codegen gaps surfaced after #2 — but first re-frames them: a parallel diagnosis found the alarming headline counts were mostly benign noise, not lost data. This recovers what's genuinely recoverable and makes the rest honest.
Recoveries (each: gate green + the regenerated reference
.rscompiles clean against the consumer API)INIT_FIELDregex captured the init value (tag: tag_items,) instead of the key, so 8 specs wrongly fell back to().()responses 39 → 31.if/returnof branch mixins, no direct<iq>) was dropped, so its branches'xmlns/typenever reached the request. Recovers the GroupProfilePicturesw:g2stanza and enriches the PushConfig set request. Also dedups same-named request variant names (a latent bug the enrichment exposed: sixplatform-led<config>alternatives all collapsed to one enum variant).…ResponseServerResponse, not RPC/ResponseSuccess) now resolves by anchoring on the exact op name. +1 typed (Pingw:p).UnionShape::TagDiscriminated: each variant becomes a newtype enum arm over its own generated struct (recursively, incl. nested children/unions); the parser tries them first-success, gated by pinned attr values + required fields, with a separability gate that drops shapes where an earlier arm would shadow a later one rather than misclassify. Union enums 10 → 116, including the GroupInfoparticipant(Admin/NonAdmin) and newslettermessage(text/media/poll/…) unions.merge…Mixincombinators (partial<iq>folded into real requests by design; their data already lives in the concrete consumers). AddDropReason::MixinFragmentso the headline reads 6 unparseable (+ 51 benign mixin fragments) instead of a misleading 58.Numbers (vs the post-#2 baseline)
()responses 39 → 31What is deliberately left (and why it's correct)
The diagnosis established most of the remaining "gap" is not loss: 51 benign mixin fragments + 5 non-request IQs + 7 legitimate set/ack responses (
()is the faithful model) + a few protobuf/stateful responses (not representable in the declarative IR) + 5 same-tag unions that aren't separable (e.g.Error == ErrorFallback) — the separability gate drops those on purpose, the same fall-back-not-misclassify stance as the outcome-union path.Contract
Mostly codegen-only (the reference
.rsis gitignored): the IR contract changes only where the extractor genuinely improved — the GroupProfilePictures stanza + PushConfig children + Ping response, and the 51 relabeledreasonstrings.whatspec update --checkstays idempotent; schema validation passes.Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Improvements