Add WAM (analytics/metrics) domain: extractor + IR contract + typed event emitters #4
…mitters
WhatsApp Web's WAM telemetry is a declarative catalog: `defineEvents({Name:[code,
{field:[id,type]}, weights, channel?, psId?]})` in ~425 `WAWeb…WamEvent` modules,
with field types being 5 base kinds or refs to ~850 `WAWebWamEnum…` enums. This adds
it as a first-class domain alongside iq/proto/mex/enums.
- wa-ir: `WamIr` (events + enums) — the cross-language contract (serde-only, verified
WASM-safe), + JSON Schema (`wam.schema.json`) via the existing `schemas()` registry.
- wa-wam: oxc extractor. Parses each event's code/fields/weights/channel/privateStatsId,
resolves enum-typed fields to their defining module, and records each event's consumer
modules (the dep graph) as an emission hint. Extracts 425 events + 849 referenced enums.
- whatspec: wires the domain into `update` (artifacts, manifest `wamEvents`, schema
validation, --check) — existing domains are byte-identical (order-independent).
- wa-codegen: a reference `wam.rs` (gitignored) — a STABLE wire codec (byte-exact port
of WAWebWamLibProtocol, little-endian) + one typed struct per event whose `emit`
serializes it exactly like WA Web, + the enums (`#[repr(i64)]`), + a doc per event
naming the WA Web modules that emit it. The codec is hand-written (version-stable);
only the catalog regenerates. A host (e.g. another WhatsApp lib) builds typed events
and emits correct bytes.
Verified: full generated wam.rs (425 events, 849 enums + codec) compiles clean with
rustc; a golden test asserts the codec's exact wire bytes against the decoded WAM
spec; 6 schemas validate; --check idempotent; gate green.
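The catalog shape described above can be sketched as plain Rust data. This is only an illustration, not the actual `WamIr` definition: every type and function name below is hypothetical, the real IR derives serde traits (omitted here to stay std-only), and the types chosen for `channel` and `private_stats_id` are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A field's type: one of the five base kinds, or a reference to a named enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FieldType {
    Boolean,
    Integer,
    Number,
    String,
    Timer,
    /// Name of the referenced enum (hypothetical representation).
    EnumRef(String),
}

/// One `field: [id, type]` entry of an event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub id: u32,
    pub ty: FieldType,
}

/// One `Name: [code, fields, weights, channel?, psId?]` catalog entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub code: u32,
    pub fields: BTreeMap<String, Field>,
    pub weights: Vec<i64>,
    pub channel: Option<String>,       // assumed type
    pub private_stats_id: Option<i64>, // assumed type
}

/// Collects enum names that some field references but the catalog does not define.
pub fn unresolved_enum_refs(
    events: &BTreeMap<String, Event>,
    known_enums: &BTreeSet<String>,
) -> BTreeSet<String> {
    events
        .values()
        .flat_map(|e| e.fields.values())
        .filter_map(|f| match &f.ty {
            FieldType::EnumRef(name) if !known_enums.contains(name) => Some(name.clone()),
            _ => None,
        })
        .collect()
}
```

Ordered maps are used so that iteration order is deterministic, which is one simple way to keep emitted artifacts stable regardless of extraction order.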
📝 Walkthrough
This PR adds comprehensive support for the WAM domain (WhatsApp analytics event catalogs) across the extraction, IR, codegen, and orchestration layers. It defines the WAM IR schema, extracts WAM catalogs from JavaScript source via AST parsing, generates Rust wire codec and event types, and integrates the complete pipeline into whatspec's artifact build and validation system.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 5 passed
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab9e1b3100
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    appstate_actions: usize,
    abprops_configs: usize,
    enum_defs: usize,
    wam_events: usize,
Now that wam_events is recorded in the manifest, it also needs to be included in check_floor's top-level checks array. Otherwise a future extractor regression that drops most WAM events (but not all of them, so the zero-count guard does not fire) will still overwrite the committed artifacts even though every other generated domain is protected against this exact shrink scenario.
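A shrink guard of the kind this comment describes could look roughly like the following. It is a minimal sketch and not the repository's `check_floor`: the function name `floor_violations`, the tuple layout, and the percentage threshold are all assumptions made for illustration.

```rust
/// One manifest count to guard: (key, committed count, freshly extracted count).
pub type FloorCheck<'a> = (&'a str, u64, u64);

/// Returns a description of every domain whose new count dropped below
/// `min_pct` percent of the committed count. A domain that was previously
/// empty is never flagged.
pub fn floor_violations(checks: &[FloorCheck<'_>], min_pct: u64) -> Vec<String> {
    let mut out = Vec::new();
    for &(key, old, new) in checks {
        // Integer comparison of new/old against min_pct/100, without division.
        if old > 0 && new * 100 < old * min_pct {
            out.push(format!("{key}: {old} -> {new}"));
        }
    }
    out
}
```

With a 50% threshold, passing `("wamEvents", 425, 40)` would be reported while `("enumDefs", 849, 849)` would not, so a partial regression is caught even though the count is non-zero.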
…s, enum fallback
Adversarial review of the WAM domain (+ Codex P2):
- Default sample weight: the codegen baked `weights[0]` as the event weight, but the web runtime never reads `weights[0]`; it defaults the weight to the literal `1` and only reads `weights[1]`/`weights[2]` when a sampling-ring gate is on. So `emit` now uses 1 (matching the un-ringed client; e.g. the `[100,100,10000]` test event emitted -100 instead of -1). The raw ring weights are exposed as a per-event `SAMPLE_WEIGHTS` const for a host that implements ring selection.
- Floor guard (Codex P2): add `wamEvents` to `check_floor`'s checks, so a partial WAM regression (most-but-not-all events dropped) can't silently overwrite the artifacts.
- Framing globals: expose the field ids the web client writes per buffer: TIMESTAMP_FIELD(47), SEQUENCE_FIELD(3433), PRIVATE_STATS_FIELD(6005), plus `write_timestamp`/`write_sequence`/`write_private_stats_id` helpers, so a host can assemble a complete, parseable buffer (not just bare events).
- Defensive: `field_wire` now stays consistent with `field_rust_type`: an enum whose module didn't resolve degrades to `Option<i64>` and is passed through as `WamWire::Int` rather than calling `as_wam_int` on an `i64` (a latent compile break).
Pure codegen + guard: IR/contract unchanged. Regenerated wam.rs compiles clean under rustc; golden byte test + 6 schemas + --check + full gate green.
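The default-weight rule from the first bullet can be illustrated with a small selector. This is a sketch under stated assumptions: the `Ring` enum and `effective_weight` are hypothetical names, and which ring maps to `weights[1]` versus `weights[2]` is assumed here rather than taken from the web client.

```rust
/// Hypothetical sampling-ring selection; `None` models the un-ringed client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ring {
    None,
    First,
    Second,
}

/// Picks the sample weight for an event. `weights[0]` is deliberately never
/// read: without a ring gate the weight is the literal 1.
pub fn effective_weight(weights: [i64; 3], ring: Ring) -> i64 {
    match ring {
        Ring::None => 1,
        Ring::First => weights[1],  // assumed mapping
        Ring::Second => weights[2], // assumed mapping
    }
}
```

For the `[100, 100, 10000]` test event, the un-ringed case yields 1 rather than 100, which is the behavior the fix restores; a host that implements ring selection would feed the per-event `SAMPLE_WEIGHTS` const into a function like this.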
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ee2bd7e66
    artifacts.push(Artifact {
        rel_path: PathBuf::from("wam/wam.rs"),
        content: wa_codegen::generate_wam(&ir),
Commit the generated WAM Rust artifact
When whatspec update --check is run against the committed generated artifacts, this new artifact is always part of the in-memory set, but generated/wam/wam.rs is not present in the commit (git ls-files generated/wam only shows index.json). check_artifacts will therefore report wam/wam.rs (missing) even immediately after this PR is checked out, so either commit the generated file or stop adding it to the artifact list.
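To make the failure mode concrete, here is a minimal sketch of a check that compares the in-memory artifact set with what is committed. It is not the repository's `check_artifacts`: the function name, the map-based signature, and the `(changed)` label are assumptions; only the `(missing)` label comes from the comment above.

```rust
use std::collections::BTreeMap;

/// Compares generated artifacts (path -> content) against the files found on
/// disk and returns one line per discrepancy.
pub fn check_report(
    expected: &BTreeMap<String, String>,
    on_disk: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut problems = Vec::new();
    for (path, content) in expected {
        match on_disk.get(path) {
            // Generated in memory but never committed.
            None => problems.push(format!("{path} (missing)")),
            // Committed, but the content has drifted.
            Some(found) if found != content => problems.push(format!("{path} (changed)")),
            Some(_) => {}
        }
    }
    problems
}
```

If `wam/wam.rs` is in the expected set but only `wam/index.json` is committed, the report always contains `wam/wam.rs (missing)`, which is why a gitignored artifact either has to be committed or left out of the checked set.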
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/whatspec/src/main.rs (2)
284-293: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Include `wamEvents` in the diff command count comparison.
The diff command compares domain counts but omits `wamEvents`, even though it's tracked in the manifest (line 828). Users won't see WAM event count changes when running `whatspec diff`.
📝 Suggested fix to include wamEvents in diff
     for key in [
         "iqModules",
         "protoEntities",
         "mexOperations",
         "appstateActions",
         "abPropsConfigs",
         "enumDefs",
    +    "wamEvents",
     ] {
         print_count_delta(key, json_u64(&mo, key), json_u64(&mn, key));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/whatspec/src/main.rs` around lines 284 - 293, The diff output omits the "wamEvents" domain; update the array in the loop that calls print_count_delta(key, json_u64(&mo, key), json_u64(&mn, key)) to include "wamEvents" so its counts are compared too; locate the array of keys in main.rs (where print_count_delta and json_u64 are invoked with mo and mn) and add "wamEvents" alongside the other domain keys.
177-187: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Include WAM events count in the success message.
The success message lists counts for all other domains but omits `wam_events`, even though WAM extraction is fully integrated and validated. This reduces visibility and creates an inconsistency with the other domains.
📝 Suggested fix to include WAM events count
     eprintln!(
    -    "wrote artifacts to {}: {} iq modules, {} proto entities, {} mex ops, {} appstate actions, \
    -     {} abprops, {} enums",
    +    "wrote artifacts to {}: {} iq modules, {} proto entities, {} mex ops, {} appstate actions, \
    +     {} abprops, {} enums, {} wam events",
         opts.out.display(),
         counts.iq_modules,
         counts.proto_entities,
         counts.mex_ops,
         counts.appstate_actions,
         counts.abprops_configs,
    -    counts.enum_defs
    +    counts.enum_defs,
    +    counts.wam_events
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/whatspec/src/main.rs` around lines 177 - 187, The success eprintln in main.rs currently prints counts for iq_modules, proto_entities, mex_ops, appstate_actions, abprops_configs and enum_defs but omits the WAM events count; update the eprintln format string to include counts.wam_events and add counts.wam_events to the argument list (the same eprintln call that references opts.out.display() and counts.*) so the final message shows the WAM events count alongside the others.
🧹 Nitpick comments (1)
crates/wa-codegen/src/wam.rs (1)
109-129: 💤 Low value
Consider removing unused `variants` collection.
The `variants` vector is built but never used: the enum definition is written directly in the loop at line 122. The `let _ = variants;` suppresses the warning but leaves dead code.
🧹 Suggested cleanup
     fn emit_enum(out: &mut String, e: &WamEnum, name: &str) {
         out.push_str(&format!(
             "/// WAM enum `{}` (module `{}`).\n",
             e.name, e.module
         ));
         out.push_str("#[derive(Debug, Clone, Copy, PartialEq, Eq)]\n#[repr(i64)]\n");
         out.push_str(&format!("pub enum {name} {{\n"));
         // Variant idents must be unique within the enum (two keys can PascalCase the same).
         let mut vused: std::collections::HashSet<String> = std::collections::HashSet::new();
    -    let mut variants: Vec<(String, i64)> = Vec::new();
         for v in &e.variants {
             let mut id = wam_pascal(&v.key);
             if id.is_empty() || id.chars().next().is_some_and(|c| c.is_ascii_digit()) {
                 id = format!("V{}", v.value);
             }
             let base = id.clone();
             let mut n = 2;
             while !vused.insert(id.clone()) {
                 id = format!("{base}{n}");
                 n += 1;
             }
             out.push_str(&format!(" {id} = {},\n", v.value));
    -        variants.push((id, v.value));
         }
         out.push_str("}\n\n");
         out.push_str(&format!(
             "impl {name} {{\n pub fn as_wam_int(self) -> i64 {{ self as i64 }}\n}}\n\n"
         ));
    -    let _ = variants;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/wa-codegen/src/wam.rs` around lines 109 - 129, The code builds a `variants: Vec<(String, i64)>` but never uses it; remove the dead collection and its references by deleting the `let mut variants: Vec<(String, i64)> = Vec::new();` declaration, removing the `variants.push((id, v.value));` inside the loop, and deleting the trailing `let _ = variants;` statement; keep the enum string construction (the loop that calls `wam_pascal(&v.key)`, uses `vused`, and writes with `out.push_str`) intact so only the unused `variants` bookkeeping is removed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c58fec16-d557-4de9-89f3-89ae496d02fe
⛔ Files ignored due to path filters (4)
- `Cargo.lock` is excluded by `!**/*.lock`
- `generated/manifest.json` is excluded by `!**/generated/**`
- `generated/schema/wam.schema.json` is excluded by `!**/generated/**`
- `generated/wam/index.json` is excluded by `!**/generated/**`
📒 Files selected for processing (12)
- `Cargo.toml`
- `crates/wa-codegen/src/lib.rs`
- `crates/wa-codegen/src/naming.rs`
- `crates/wa-codegen/src/wam.rs`
- `crates/wa-ir/src/lib.rs`
- `crates/wa-ir/src/wam.rs`
- `crates/wa-wam/Cargo.toml`
- `crates/wa-wam/src/lib.rs`
- `crates/wa-wam/src/tests.rs`
- `crates/whatspec/Cargo.toml`
- `crates/whatspec/src/main.rs`
- `scripts/validate-schemas.py`
Addressed both CodeRabbit nits: the success message and the `diff` command's per-domain count loop listed every other domain but omitted the WAM event count, even though it's tracked in the manifest. Add `wamEvents` to both so WAM stays consistent with iq/proto/mex/appstate/abprops/enums.
Adds WAM (WhatsApp analytics/metrics) as a first-class extracted domain, alongside iq/proto/mex/enums.
WAM is a declarative catalog: `WAWebWamCodegenUtils.defineEvents({Name: [code, {field: [id, type]}, weights, channel?, privateStatsId?]})` across ~425 `WAWeb…WamEvent` modules, where each field's type is one of five base kinds (boolean/integer/number/string/timer) or a reference to one of ~850 `WAWebWamEnum…` enums. The wire format (a byte-exact little-endian codec, ported from `WAWebWamLibProtocol`) lets a consumer emit events exactly like the web client.
What's added
- `wa-ir`: `WamIr` (events + enums), the cross-language contract. Pure serde (verified WASM-safe), so any language can consume it; plus an emitted JSON Schema (`wam.schema.json`) via the existing `schemas()` registry.
- `wa-wam`: extractor (oxc). Parses each event's code, fields (id + type), sampling weights, channel, and `privateStatsId`; resolves enum-typed fields to their defining module; and records each event's consumer modules (from the dependency graph) as an emission hint. Extracts 425 events + 849 referenced enums from a real bundle.
- `whatspec update`: the domain is wired into the pipeline (artifacts, the `wamEvents` manifest count, schema validation, `--check`). Existing domains stay byte-identical (the extractors are order-independent).
- `wa-codegen`: reference `wam.rs` (gitignored, like `iq.rs`), a self-contained reference consumer:
  - a wire codec (`WamBuffer`: byte-exact port of the WAM protocol, little-endian) that is hand-written and version-independent;
  - one typed struct per event whose `emit` serializes it byte-for-byte like the web client (fields are `Option<T>` typed by the schema, so invalid values are unrepresentable);
  - the enums as `#[repr(i64)]` types;
  - a doc comment per event naming the WA Web modules that emit it.
The split is deliberate: the IR is the generic contract (each target generates its own bindings), the codec is stable (written once per target), and only the catalog regenerates per WhatsApp version.
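As a rough illustration of what "one typed struct per event" with `Option<T>` fields can look like, here is a hand-written sketch. It is not the generated `wam.rs`: `ExampleEvent`, `ExampleResult`, the event code, and the field ids are invented for illustration, and the sketch stops at a list of `(field id, value)` pairs instead of reproducing the wire bytes. Only `as_wam_int` and the `WamWire::Int` variant are names that appear in this PR; the other `WamWire` variants are assumptions.

```rust
/// Hypothetical enum in the style of the generated `#[repr(i64)]` types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i64)]
pub enum ExampleResult {
    Ok = 1,
    Error = 2,
}

impl ExampleResult {
    pub fn as_wam_int(self) -> i64 {
        self as i64
    }
}

/// Value handed to the codec; variants other than `Int` are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum WamWire {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Hypothetical event: every field is optional and typed by the schema,
/// so a string can never end up in an integer field.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExampleEvent {
    pub result: Option<ExampleResult>,
    pub retry_count: Option<i64>,
    pub label: Option<String>,
}

impl ExampleEvent {
    pub const CODE: u32 = 9999; // invented event code

    /// Lists the fields that are set, in field-id order, ready for a codec.
    pub fn fields(&self) -> Vec<(u32, WamWire)> {
        let mut out = Vec::new();
        if let Some(r) = self.result {
            out.push((1, WamWire::Int(r.as_wam_int())));
        }
        if let Some(n) = self.retry_count {
            out.push((2, WamWire::Int(n)));
        }
        if let Some(s) = &self.label {
            out.push((3, WamWire::Str(s.clone())));
        }
        out
    }
}
```

Unset fields are simply skipped, so a host only pays for the fields it fills in, and the type system rejects values that the schema does not allow.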
Correctness
- `wam.rs` (425 events + 849 enums + codec) compiles cleanly under `rustc`: type-checked, not just syntax-checked.
- `whatspec update --check` is idempotent; `wa-ir` builds for `wasm32`; full gate (fmt, clippy `-D warnings`, `cargo test --workspace`) is green.
Contract changes
Additive only: a new `wam/index.json` + `schema/wam.schema.json`, and the `wam` domain + `wamEvents` count in the manifest. No existing domain artifact changes.