
Add WAM (analytics/metrics) domain: extractor + IR contract + typed event emitters #4

Merged
jlucaso1 merged 3 commits into main from wam-domain
Jun 8, 2026

@jlucaso1 jlucaso1 commented Jun 7, 2026

Contributor

Adds WAM (WhatsApp analytics/metrics) as a first-class extracted domain, alongside iq/proto/mex/enums.

WAM is a declarative catalog: WAWebWamCodegenUtils.defineEvents({Name: [code, {field: [id, type]}, weights, channel?, privateStatsId?]}) across ~425 WAWeb…WamEvent modules, where each field's type is one of five base kinds (boolean/integer/number/string/timer) or a reference to one of ~850 WAWebWamEnum… enums. The wire format (a byte-exact little-endian codec, ported from WAWebWamLibProtocol) lets a consumer emit events exactly like the web client.
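As a rough illustration of the field-type rule described above, the sketch below classifies a type token as one of the five base kinds or as an enum reference. The name `WamFieldType` follows the IR type mentioned in this PR; the variant names, the token spellings, and the `classify` helper are assumptions made for the example, not the extractor's actual code.

```rust
/// Hypothetical model of a WAM field type: five base kinds or an enum reference.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WamFieldType {
    Boolean,
    Integer,
    Number,
    String,
    Timer,
    /// Reference to a `WAWebWamEnum…` module, stored by module name.
    Enum(std::string::String),
}

/// Classify a type token. The token spellings are assumed for illustration
/// and are not taken from the real bundle.
pub fn classify(token: &str) -> Option<WamFieldType> {
    match token {
        "boolean" => Some(WamFieldType::Boolean),
        "integer" => Some(WamFieldType::Integer),
        "number" => Some(WamFieldType::Number),
        "string" => Some(WamFieldType::String),
        "timer" => Some(WamFieldType::Timer),
        // Anything that looks like an enum module name is kept as a reference.
        t if t.starts_with("WAWebWamEnum") => Some(WamFieldType::Enum(t.to_string())),
        _ => None,
    }
}
```

Returning `None` for unknown tokens keeps the decision about unresolved types with the caller rather than guessing a base kind.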

What's added

  • wa-ir: WamIr (events + enums), the cross-language contract. Pure serde (verified WASM-safe), so any language can consume it; plus an emitted JSON Schema (wam.schema.json) via the existing schemas() registry.
  • wa-wam — extractor (oxc): parses each event's code, fields (id + type), sampling weights, channel, and privateStatsId; resolves enum-typed fields to their defining module; and records each event's consumer modules (from the dependency graph) as an emission hint. Extracts 425 events + 849 referenced enums from a real bundle.
  • whatspec update: the domain is wired into the pipeline (artifacts, the wamEvents manifest count, schema validation, --check). Existing domains stay byte-identical (the extractors are order-independent).
  • wa-codegen — reference wam.rs (gitignored, like iq.rs): a self-contained reference consumer —
    • a stable wire codec (WamBuffer: byte-exact port of the WAM protocol, little-endian) that is hand-written and version-independent;
    • one typed struct per event whose emit serializes it byte-for-byte like the web client (fields are Option<T> typed by the schema, so invalid values are unrepresentable);
    • the enums as #[repr(i64)] types;
    • a doc comment per event naming the modules that emit it, so a consumer knows where each metric is triggered.

The split is deliberate: the IR is the generic contract (each target generates its own bindings), the codec is stable (written once per target), and only the catalog regenerates per WhatsApp version.
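To make the "typed struct per event" idea concrete, here is a minimal sketch of what one generated event might look like. `WamWire`, `WamEventDef`, `as_wam_int`, and `#[repr(i64)]` are names that appear in this PR, but the trait's methods, the `WamWire` variants other than `Int`, and the whole `ExampleEvent` (its code, field ids, and enum) are hypothetical. The byte-level encoding is deliberately left out.

```rust
/// Wire-level value kinds. Only the `Int` variant is named in the PR; the rest are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum WamWire {
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Assumed shape of the per-event trait; the real trait's methods are not shown in the PR.
pub trait WamEventDef {
    const CODE: u32;
    /// Only the fields that are set, as (field id, value) pairs in declaration order.
    fn fields(&self) -> Vec<(u32, WamWire)>;
}

/// Hypothetical enum in the generated `#[repr(i64)]` style.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i64)]
pub enum ExampleResult {
    Ok = 1,
    Failed = 2,
}

impl ExampleResult {
    pub fn as_wam_int(self) -> i64 {
        self as i64
    }
}

/// Hypothetical event: the code and field ids below are made up.
#[derive(Debug, Default)]
pub struct ExampleEvent {
    pub retried: Option<bool>,
    pub result: Option<ExampleResult>,
    pub duration_ms: Option<i64>,
}

impl WamEventDef for ExampleEvent {
    const CODE: u32 = 9999;

    fn fields(&self) -> Vec<(u32, WamWire)> {
        let mut out = Vec::new();
        if let Some(v) = self.retried {
            out.push((1, WamWire::Bool(v)));
        }
        if let Some(v) = self.result {
            out.push((2, WamWire::Int(v.as_wam_int())));
        }
        if let Some(v) = self.duration_ms {
            out.push((3, WamWire::Int(v)));
        }
        out
    }
}
```

Because every field is an `Option<T>` typed by the schema, a host sets only what it has, and a value of the wrong type or an out-of-range enum discriminant does not compile.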

Correctness

  • The full generated wam.rs (425 events + 849 enums + codec) compiles cleanly under rustc — type-checked, not just syntax-checked.
  • A golden byte test emits a known event and asserts the exact wire bytes against the decoded WAM spec.
  • 6 schemas validate; whatspec update --check is idempotent; wa-ir builds for wasm32; full gate (fmt, clippy -D warnings, cargo test --workspace) is green.
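The golden byte test mentioned above follows a common pattern: encode a fixed input and compare against a hard-coded byte vector. The sketch below shows that pattern with a toy buffer. The framing used here (a 2-byte field id followed by a 4-byte value, both little-endian) is invented for the example and is not the WAM wire format; `ToyBuffer` and `encode_known_event` are hypothetical names.

```rust
/// Toy little-endian buffer. The layout is invented for illustration only.
#[derive(Default)]
pub struct ToyBuffer {
    bytes: Vec<u8>,
}

impl ToyBuffer {
    /// Append a field as: u16 id (LE), then i32 value (LE).
    pub fn put_field(&mut self, id: u16, value: i32) {
        self.bytes.extend_from_slice(&id.to_le_bytes());
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }

    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}

/// Encode a fixed, known input so the output can be pinned by a golden test.
pub fn encode_known_event() -> Vec<u8> {
    let mut buf = ToyBuffer::default();
    buf.put_field(0x0102, 1);
    buf.put_field(3, -1);
    buf.into_bytes()
}
```

A golden test then asserts the full byte vector, so any accidental change in field order, width, or endianness fails loudly instead of producing a subtly different payload.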

Contract changes

Additive only: a new wam/index.json + schema/wam.schema.json, and the wam domain + wamEvents count in the manifest. No existing domain artifact changes.

Summary by CodeRabbit

  • New Features
    • Added extraction and code generation for WhatsApp Web analytics event catalogs
    • Generated Rust codec implementations for analytics events with typed field definitions and sampling weights
    • Extended build pipeline validation to include analytics schema conformance checks

…mitters

WhatsApp Web's WAM telemetry is a declarative catalog: `defineEvents({Name:[code,
{field:[id,type]}, weights, channel?, psId?]})` in ~425 `WAWeb…WamEvent` modules,
with field types being 5 base kinds or refs to ~850 `WAWebWamEnum…` enums. This adds
it as a first-class domain alongside iq/proto/mex/enums.

- wa-ir: `WamIr` (events + enums) — the cross-language contract (serde-only, verified
  WASM-safe), + JSON Schema (`wam.schema.json`) via the existing `schemas()` registry.
- wa-wam: oxc extractor. Parses each event's code/fields/weights/channel/privateStatsId,
  resolves enum-typed fields to their defining module, and records each event's consumer
  modules (the dep graph) as an emission hint. Extracts 425 events + 849 referenced enums.
- whatspec: wires the domain into `update` (artifacts, manifest `wamEvents`, schema
  validation, --check) — existing domains are byte-identical (order-independent).
- wa-codegen: a reference `wam.rs` (gitignored) — a STABLE wire codec (byte-exact port
  of WAWebWamLibProtocol, little-endian) + one typed struct per event whose `emit`
  serializes it exactly like WA Web, + the enums (`#[repr(i64)]`), + a doc per event
  naming the WA Web modules that emit it. The codec is hand-written (version-stable);
  only the catalog regenerates. A host (e.g. another WhatsApp lib) builds typed events
  and emits correct bytes.

Verified: full generated wam.rs (425 events, 849 enums + codec) compiles clean with
rustc; a golden test asserts the codec's exact wire bytes against the decoded WAM
spec; 6 schemas validate; --check idempotent; gate green.

coderabbitai Bot commented Jun 7, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5419e73a-794e-4abb-afc0-bb7fd76c0dd0

📥 Commits

Reviewing files that changed from the base of the PR and between 9ee2bd7 and 6aef8a5.

📒 Files selected for processing (1)
  • crates/whatspec/src/main.rs
Walkthrough

This PR adds comprehensive support for the WAM domain—WhatsApp analytics event catalogs—across the extraction, IR, codegen, and orchestration layers. It defines the WAM IR schema, extracts WAM catalogs from JavaScript source via AST parsing, generates Rust wire codec and event types, and integrates the complete pipeline into whatspec's artifact build and validation system.

Changes

WAM Domain Support

| Layer / File(s) | Summary |
| --- | --- |
| WAM IR data structures and types (crates/wa-ir/src/wam.rs, crates/wa-ir/src/lib.rs) | Introduces serializable WamFieldType (base primitives and enum references), WamField, WamEvent (with optional private_stats_id and consumers), WamEnumVariant, WamEnum, and top-level WamIr document. Exposes module publicly and adds schema generation for the WAM domain (6 schemas instead of 5). |
| WAM extraction from JavaScript AST (crates/wa-wam/src/lib.rs, crates/wa-wam/Cargo.toml) | Extracts WhatsApp Web WAM catalogs via oxc AST parsing: parses defineEvents calls into WamEvent entries, resolves referenced WAWebWamEnum modules into WamEnum definitions, performs dependency mapping for consumer modules, and deduplicates results by stable keys. |
| WAM extraction validation tests (crates/wa-wam/src/tests.rs) | Tests extraction correctness: event metadata with enum field resolution and consumer capture, channel/private_stats_id defaults and overrides, filtering (skips non-codegen events and unreferenced enums). |
| WAM code generation and wire codec (crates/wa-codegen/src/wam.rs, crates/wa-codegen/src/lib.rs, crates/wa-codegen/src/naming.rs) | Generates self-contained wam.rs source: PascalCase enums with repr(i64) and variant deduplication, typed event structs with Option fields and WamEventDef trait impl for wire serialization, embedded reference WAM wire codec (WamChannel, WamWire, WamBuffer). Includes golden byte test and Rust syntax validity test. Exports ensure_ident helper for use by codegen. |
| Workspace configuration and whatspec integration (Cargo.toml, crates/whatspec/Cargo.toml, crates/whatspec/src/main.rs, scripts/validate-schemas.py) | Adds wa-wam to workspace members and dependencies. Extends whatspec build_artifacts with concurrent WAM extractor, zero-extraction guard, deterministic artifact order. Populates manifest with wam/index.json and wamEvents count field. Adds push_wam generator for index.json and wam.rs artifacts. Extends regression guard to detect wamEvents floor shrinkage. Updates schema validation to include WAM domain. |
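The "deduplicates results by stable keys" and "deterministic artifact order" points can be sketched as follows. This is only an illustration: the PR does not say which key is used, so keying on the event code is an assumption, and `EventStub` and `dedup_events` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for an extracted event; the real IR type carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EventStub {
    pub code: u32,
    pub name: String,
}

/// Deduplicate by event code (an assumed key) and return events in ascending
/// code order. When two entries share a code, the first one seen is kept.
pub fn dedup_events(found: Vec<EventStub>) -> Vec<EventStub> {
    let mut by_code: BTreeMap<u32, EventStub> = BTreeMap::new();
    for ev in found {
        by_code.entry(ev.code).or_insert(ev);
    }
    // BTreeMap iterates in key order, so the output does not depend on scan order.
    by_code.into_values().collect()
}
```

Sorting by a key that comes from the catalog itself, rather than by discovery order, is one way to keep regenerated artifacts byte-identical when modules are visited in a different order.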

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 The whitespace whispers of analytics true,
WAM events marshaled through AST's morning dew,
From WhatsApp's depths, enum types take their flight,
Wire codec stitched, in perfect Rust-bright light,
A new domain blooms in whatspec's grand sight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main changes: adding a new WAM domain with an extractor, IR contract, and typed event emitters. It accurately reflects the substantial additions across multiple crates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab9e1b3100

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

appstate_actions: usize,
abprops_configs: usize,
enum_defs: usize,
wam_events: usize,


P2: Add WAM to the floor guard

Now that wam_events is recorded in the manifest, it also needs to be included in check_floor's top-level checks array. Otherwise a future extractor regression that drops most WAM events (but not all of them, so the zero-count guard does not fire) will still overwrite the committed artifacts even though every other generated domain is protected against this exact shrink scenario.
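A floor guard of the kind described here could look roughly like the sketch below. It is not the repository's `check_floor`: the function name `floor_violations`, the `(key, committed, fresh)` tuple shape, and the ratio threshold are all assumptions chosen to show why a partial drop needs its own check in addition to the zero-count guard.

```rust
/// Return a message for every count that shrank below `min_ratio` of its
/// committed value. Each check is (manifest key, committed count, fresh count).
/// The threshold semantics are an assumption, not the project's actual rule.
pub fn floor_violations(checks: &[(&str, u64, u64)], min_ratio: f64) -> Vec<String> {
    checks
        .iter()
        .filter(|(_, old, new)| (*new as f64) < (*old as f64) * min_ratio)
        .map(|(key, old, new)| format!("{key}: {old} -> {new}"))
        .collect()
}
```

With `wamEvents` in the list, a run that extracts 40 events where 425 were committed is reported, even though 40 is non-zero and would pass a zero-extraction guard.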


…s, enum fallback

Adversarial review of the WAM domain (+ Codex P2):

- Default sample weight: the codegen baked `weights[0]` as the event weight, but the
  web runtime never reads `weights[0]` — it defaults the weight to the literal `1` and
  only reads `weights[1]`/`weights[2]` when a sampling-ring gate is on. So `emit` now
  uses 1 (matching the un-ringed client; e.g. the `[100,100,10000]` test event emitted
  -100 instead of -1). The raw ring weights are exposed as a per-event `SAMPLE_WEIGHTS`
  const for a host that implements ring selection.
- Floor guard (Codex P2): add `wamEvents` to `check_floor`'s checks, so a partial WAM
  regression (most-but-not-all events dropped) can't silently overwrite the artifacts.
- Framing globals: expose the field ids the web client writes per buffer —
  TIMESTAMP_FIELD(47), SEQUENCE_FIELD(3433), PRIVATE_STATS_FIELD(6005) — plus
  `write_timestamp`/`write_sequence`/`write_private_stats_id` helpers, so a host can
  assemble a complete, parseable buffer (not just bare events).
- Defensive: `field_wire` now stays consistent with `field_rust_type` — an enum whose
  module didn't resolve degrades to `Option<i64>` and is passed through as `WamWire::Int`
  rather than calling `as_wam_int` on an `i64` (a latent compile break).

Pure codegen + guard: IR/contract unchanged. Regenerated wam.rs compiles clean under
rustc; golden byte test + 6 schemas + --check + full gate green.
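The default-weight fix in this commit can be summarized with a small sketch. The commit message states that the un-ringed client uses the literal `1` and that `weights[1]` and `weights[2]` are read only when a sampling-ring gate is on; which gate maps to which index is not stated, so the `SamplingRing` enum and its mapping below are hypothetical, as is the `effective_weight` helper.

```rust
/// Hypothetical ring selector. The mapping of rings to indices is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SamplingRing {
    /// No sampling-ring gate is on (the un-ringed client).
    Off,
    RingA,
    RingB,
}

/// Pick the weight to emit for an event given its raw `[w0, w1, w2]` triple.
pub fn effective_weight(sample_weights: [i64; 3], ring: SamplingRing) -> i64 {
    match ring {
        // The un-ringed client never reads weights[0]; it uses the literal 1.
        SamplingRing::Off => 1,
        SamplingRing::RingA => sample_weights[1],
        SamplingRing::RingB => sample_weights[2],
    }
}
```

Exposing the raw triple as a per-event constant, as the commit does with `SAMPLE_WEIGHTS`, leaves a selection function like this to the host instead of baking one choice into `emit`.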

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ee2bd7e66


Comment on lines +1125 to +1127
artifacts.push(Artifact {
rel_path: PathBuf::from("wam/wam.rs"),
content: wa_codegen::generate_wam(&ir),


P2: Commit the generated WAM Rust artifact

When whatspec update --check is run against the committed generated artifacts, this new artifact is always part of the in-memory set, but generated/wam/wam.rs is not present in the commit (git ls-files generated/wam only shows index.json). check_artifacts will therefore report wam/wam.rs (missing) even immediately after this PR is checked out, so either commit the generated file or stop adding it to the artifact list.
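The failure mode described in this comment can be reproduced with a simplified model of an artifact check. The sketch below is not the project's `check_artifacts`: it compares two in-memory maps instead of reading the `generated/` directory, the `(differs)` label is invented, and the function name `check_artifacts_sketch` is hypothetical. Only the `(missing)` wording comes from the comment above.

```rust
use std::collections::BTreeMap;

/// Compare freshly generated artifacts against the committed ones.
/// Keys are relative paths, values are file contents.
pub fn check_artifacts_sketch(
    generated: &BTreeMap<String, String>,
    committed: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut problems = Vec::new();
    for (path, content) in generated {
        match committed.get(path) {
            // In the in-memory set but absent from the commit.
            None => problems.push(format!("{path} (missing)")),
            Some(old) if old != content => problems.push(format!("{path} (differs)")),
            Some(_) => {}
        }
    }
    problems
}
```

Under this model, any artifact that is always generated but intentionally not committed makes the check fail on a clean checkout, which is why the comment suggests either committing the file or leaving it out of the artifact list.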


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/whatspec/src/main.rs (2)

284-293: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include wamEvents in the diff command count comparison.

The diff command compares domain counts but omits wamEvents, even though it's tracked in the manifest (line 828). Users won't see WAM event count changes when running whatspec diff.

📝 Suggested fix to include wamEvents in diff
     for key in [
         "iqModules",
         "protoEntities",
         "mexOperations",
         "appstateActions",
         "abPropsConfigs",
         "enumDefs",
+        "wamEvents",
     ] {
         print_count_delta(key, json_u64(&mo, key), json_u64(&mn, key));
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/whatspec/src/main.rs` around lines 284 - 293, The diff output omits
the "wamEvents" domain; update the array in the loop that calls
print_count_delta(key, json_u64(&mo, key), json_u64(&mn, key)) to include
"wamEvents" so its counts are compared too; locate the array of keys in main.rs
(where print_count_delta and json_u64 are invoked with mo and mn) and add
"wamEvents" alongside the other domain keys.

177-187: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include WAM events count in the success message.

The success message lists counts for all other domains but omits wam_events, even though WAM extraction is fully integrated and validated. This reduces visibility and creates an inconsistency with the other domains.

📝 Suggested fix to include WAM events count
     eprintln!(
-        "wrote artifacts to {}: {} iq modules, {} proto entities, {} mex ops, {} appstate actions, \
-         {} abprops, {} enums",
+        "wrote artifacts to {}: {} iq modules, {} proto entities, {} mex ops, {} appstate actions, \
+         {} abprops, {} enums, {} wam events",
         opts.out.display(),
         counts.iq_modules,
         counts.proto_entities,
         counts.mex_ops,
         counts.appstate_actions,
         counts.abprops_configs,
-        counts.enum_defs
+        counts.enum_defs,
+        counts.wam_events
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/whatspec/src/main.rs` around lines 177 - 187, The success eprintln in
main.rs currently prints counts for iq_modules, proto_entities, mex_ops,
appstate_actions, abprops_configs and enum_defs but omits the WAM events count;
update the eprintln format string to include counts.wam_events and add
counts.wam_events to the argument list (the same eprintln call that references
opts.out.display() and counts.*) so the final message shows the WAM events count
alongside the others.
🧹 Nitpick comments (1)
crates/wa-codegen/src/wam.rs (1)

109-129: 💤 Low value

Consider removing unused variants collection.

The variants vector is built but never used — the enum definition is written directly in the loop at line 122. The let _ = variants; suppresses the warning but leaves dead code.

🧹 Suggested cleanup
 fn emit_enum(out: &mut String, e: &WamEnum, name: &str) {
     out.push_str(&format!(
         "/// WAM enum `{}` (module `{}`).\n",
         e.name, e.module
     ));
     out.push_str("#[derive(Debug, Clone, Copy, PartialEq, Eq)]\n#[repr(i64)]\n");
     out.push_str(&format!("pub enum {name} {{\n"));
     // Variant idents must be unique within the enum (two keys can PascalCase the same).
     let mut vused: std::collections::HashSet<String> = std::collections::HashSet::new();
-    let mut variants: Vec<(String, i64)> = Vec::new();
     for v in &e.variants {
         let mut id = wam_pascal(&v.key);
         if id.is_empty() || id.chars().next().is_some_and(|c| c.is_ascii_digit()) {
             id = format!("V{}", v.value);
         }
         let base = id.clone();
         let mut n = 2;
         while !vused.insert(id.clone()) {
             id = format!("{base}{n}");
             n += 1;
         }
         out.push_str(&format!("    {id} = {},\n", v.value));
-        variants.push((id, v.value));
     }
     out.push_str("}\n\n");
     out.push_str(&format!(
         "impl {name} {{\n    pub fn as_wam_int(self) -> i64 {{ self as i64 }}\n}}\n\n"
     ));
-    let _ = variants;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/wa-codegen/src/wam.rs` around lines 109 - 129, The code builds a
`variants: Vec<(String, i64)>` but never uses it; remove the dead collection and
its references by deleting the `let mut variants: Vec<(String, i64)> =
Vec::new();` declaration, removing the `variants.push((id, v.value));` inside
the loop, and deleting the trailing `let _ = variants;` statement; keep the enum
string construction (the loop that calls `wam_pascal(&v.key)`, uses `vused`, and
writes with `out.push_str`) intact so only the unused `variants` bookkeeping is
removed.
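The variant-naming logic in `emit_enum` (PascalCase the key, fall back to `V<value>` for empty or digit-leading names, then suffix duplicates) can be isolated into a standalone sketch without the unused `variants` vector. The `to_pascal` helper is a simplified stand-in for `wam_pascal`, whose real behaviour is not shown in this PR, and `unique_variant_idents` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Simplified stand-in for `wam_pascal`: split on non-alphanumerics and
/// capitalize each part. The real helper may behave differently.
fn to_pascal(key: &str) -> String {
    key.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            let first = chars.next().map(|c| c.to_ascii_uppercase());
            first
                .into_iter()
                .chain(chars.map(|c| c.to_ascii_lowercase()))
                .collect::<String>()
        })
        .collect()
}

/// Produce one unique Rust identifier per (key, value) variant, in input order.
pub fn unique_variant_idents(variants: &[(&str, i64)]) -> Vec<String> {
    let mut used = HashSet::new();
    let mut out = Vec::new();
    for (key, value) in variants {
        let mut id = to_pascal(key);
        // Empty or digit-leading names are not valid identifiers; fall back to V<value>.
        if id.is_empty() || id.chars().next().is_some_and(|c| c.is_ascii_digit()) {
            id = format!("V{value}");
        }
        // Two keys can PascalCase to the same name; suffix later ones with 2, 3, ...
        let base = id.clone();
        let mut n = 2;
        while !used.insert(id.clone()) {
            id = format!("{base}{n}");
            n += 1;
        }
        out.push(id);
    }
    out
}
```

Only the `HashSet` of used names is needed to guarantee uniqueness, which matches the reviewer's point that the extra `variants` bookkeeping is dead code.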

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c58fec16-d557-4de9-89f3-89ae496d02fe

📥 Commits

Reviewing files that changed from the base of the PR and between ecd4fb5 and 9ee2bd7.

⛔ Files ignored due to path filters (4)
  • Cargo.lock is excluded by !**/*.lock
  • generated/manifest.json is excluded by !**/generated/**
  • generated/schema/wam.schema.json is excluded by !**/generated/**
  • generated/wam/index.json is excluded by !**/generated/**
📒 Files selected for processing (12)
  • Cargo.toml
  • crates/wa-codegen/src/lib.rs
  • crates/wa-codegen/src/naming.rs
  • crates/wa-codegen/src/wam.rs
  • crates/wa-ir/src/lib.rs
  • crates/wa-ir/src/wam.rs
  • crates/wa-wam/Cargo.toml
  • crates/wa-wam/src/lib.rs
  • crates/wa-wam/src/tests.rs
  • crates/whatspec/Cargo.toml
  • crates/whatspec/src/main.rs
  • scripts/validate-schemas.py

Addresses both CodeRabbit nits: the success message and the `diff`
command's per-domain count loop listed every other domain but omitted
the WAM event count, even though it is tracked in the manifest. Adds
`wamEvents` to both so WAM stays consistent with iq/proto/mex/appstate/
abprops/enums.
@jlucaso1 jlucaso1 merged commit 9677a34 into main Jun 8, 2026
2 checks passed