diff --git a/CHANGELOG.md b/CHANGELOG.md index ffba896..ee863a1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,8 +6,41 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), ## [Unreleased] +### Added + +- **Vtable reflection mode.** Generated types now implement + `buffa_descriptor::reflect::ReflectMessage` directly — on both the owned + structs and the zero-copy view types — so `foo.reflect()` borrows `foo` in + place (`ReflectCow::Borrowed`) with no encode/decode round-trip and no + per-field allocation. This is the path a CEL evaluator, transcoding gateway, or + generic interceptor takes to read fields by descriptor; reflecting a decoded + view runs several times faster than the previous bridge round-trip. Select the + mode with the new `buffa_build::ReflectMode` enum: + + ```rust + buffa_build::Config::new() + .reflect_mode(buffa_build::ReflectMode::VTable) // or ::Bridge / ::Off + .compile()?; + ``` + + The `protoc-gen-buffa` equivalent is `reflect_mode=off|bridge|vtable`. Vtable + mode does not require view generation: with views off, only the owned + `ReflectMessage` is emitted. +- **`buffa-types` `reflect` feature.** Well-known types (`Timestamp`, + `Duration`, `Struct`/`Value`, `Any`, wrappers, …) now implement + `ReflectMessage`, so messages that embed WKTs reflect end to end. +- **`ReflectElement` for the configurable `string_type` representations** + (`SmolStr`, `EcoString`, `CompactString`), gated behind the matching + `buffa-descriptor` feature, so a `repeated ` field reflects in vtable + mode. + ### Changed +- **`generate_reflection(true)` now selects vtable mode** (previously bridge). + The reflective API is unchanged (`foo.reflect().get(fd)`), so call sites do not + change, but generated code grows by one `impl ReflectMessage` per type. Opt + back into the smaller round-trip implementation with + `reflect_mode(ReflectMode::Bridge)`. 
- **`use_bytes_type()` / `use_bytes_type_in(...)` now applies to `map` values (#76).** Previously map values were always `Vec` regardless of config — the only `bytes`-context not covered. They now match the type used diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0cec31b..8df9b27 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -96,12 +96,13 @@ CI (`check-generated-code` job) will fail if checked-in generated code is stale. ## Cross-Target Checks -Run `task install-targets` to install the additional rustup targets needed by cross-target tasks. The targets are: +`task check-nostd` adds the bare-metal `thumbv7em-none-eabihf` target on demand (it runs `rustup target add` itself, which is idempotent), so it needs no separate setup. + +For the 32-bit tasks, run `task install-targets` first to install the additional rustup target: - `i686-unknown-linux-gnu` — 32-bit x86 Linux (for `task check-32bit` / `task test-32bit`; `test-32bit` also needs `gcc-multilib`) -- `thumbv7em-none-eabihf` — bare-metal ARM Cortex-M4 (for the second step of `task check-nostd`) -The tasks have preconditions that print a clear error if the targets are missing. +`task install-targets` also installs `thumbv7em-none-eabihf` for convenience; the `check-32bit` / `test-32bit` tasks have preconditions that print a clear error if the 32-bit target is missing. ## Continuous Integration diff --git a/DESIGN.md b/DESIGN.md index 662f9f0..bc08fcd 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -69,7 +69,7 @@ Both layer on top of the generated `Message` impl via `include!()` + sibling mod ### `buffa-descriptor` — Protobuf Descriptor Types -Self-hosted Rust types for `google/protobuf/descriptor.proto` and `google/protobuf/compiler/plugin.proto`, generated by `buffa-codegen` itself. These are the types that `buffa-codegen` uses to parse protoc's `CodeGeneratorRequest`, and the foundation for future runtime reflection. 
+Self-hosted Rust types for `google/protobuf/descriptor.proto` and `google/protobuf/compiler/plugin.proto`, generated by `buffa-codegen` itself. These are the types that `buffa-codegen` uses to parse protoc's `CodeGeneratorRequest`. Under the `reflect` feature this crate is also the home of the runtime reflection layer — `DescriptorPool`, `DynamicMessage`, and the `ReflectMessage` trait surface (see [Core Design Decision 11](#11-reflection--bridge-and-vtable-modes)). The generated code is checked in (regenerate via `task gen-bootstrap-types`). The only runtime dependency is `buffa` — no quote/syn/prettyplease — so the crate is `no_std`-capable and dependency-light enough to depend on from the runtime without pulling in the codegen toolchain. @@ -477,6 +477,30 @@ The canonical protobuf JSON mapping is non-trivial and cannot be satisfied by pl - **Well-known types**: each has a bespoke JSON representation defined by the protobuf spec — `Timestamp` as RFC 3339, `Duration` as `"1.5s"`, `FieldMask` as `"a.b,c.d"`, `Value`/`Struct`/`ListValue` as native JSON, wrapper types as their wrapped scalar, `Any` as `{"@type": "...", ...fields}`. These require hand-written `Serialize`/`Deserialize` impls in `buffa-types`. - **Default value omission**: proto3 fields at their default value are omitted from JSON output. +### 11. Reflection — Bridge and Vtable Modes + +Reflection lets code process messages by descriptor rather than by static type — the path a CEL evaluator, a transcoding gateway, a field-mask filter, or a gRPC server-reflection endpoint takes. Buffa exposes one trait surface, `ReflectMessage`, with two sources behind it: a fully dynamic runtime engine, and reflection over generated types. + +**The common surface.** `ReflectMessage` (in `buffa-descriptor`) reads a message through its `MessageDescriptor`: `get(&FieldDescriptor) -> ValueRef`, `has(&FieldDescriptor) -> bool`, `for_each_set(...)`, `to_dynamic()`, and `unknown_fields()`. 
`ValueRef<'a>` is a *borrowed* field value — scalars by copy, `String(&'a str)` / `Bytes(&'a [u8])` by reference, `Message(ReflectCow<'a>)` for nested messages, and `List`/`Map` as `&dyn ReflectList` / `&dyn ReflectMap` trait objects. Because every value borrows from the message, reading a field allocates nothing. + +**The runtime engine — `DynamicMessage`.** A schema-agnostic message: a `BTreeMap` keyed by field number, plus an `Arc` and the message's `MessageIndex`. It encodes, decodes, and JSON-serializes entirely from descriptor data, with no generated type involved. Generated packages embed their own `FileDescriptorSet` bytes and expose a lazily-built (`OnceLock`) pool as `your_crate::your_pkg::descriptor_pool()`, which all reflection in that package resolves against. + +**Reflection over generated types — two modes.** Generated types implement `Reflectable`, whose `reflect()` returns a `ReflectCow<'a>` — either `Owned(Box)` or `Borrowed(&'a dyn ReflectMessage)`. Codegen emits one of two bodies, selected by `ReflectMode` (`Off` / `Bridge` / `VTable`); the call site (`foo.reflect().get(fd)`) is identical either way, so switching modes is a zero-diff change for consumers. 
+ +| | **Bridge** | **Vtable** (default) | +|---|---|---| +| `reflect()` body | re-encode `self`, decode into a `DynamicMessage`, box it | `ReflectCow::Borrowed(self)` | +| `ReflectMessage` impl | only on `DynamicMessage` | emitted on every owned struct **and** view type | +| Per-call cost | one encode + decode + allocation | a borrow; reads fields in place | +| Generated code size | smaller | one `impl ReflectMessage` per type | +| Requires views | no | no (view impls are added when views exist; the owned impl is self-contained) | + +Vtable mode is what makes reflection cheap enough to put on a hot path: reflecting a decoded view runs several times faster than the bridge round-trip (see [Reflection](README.md#reflection)), because it reuses the zero-copy `decode_view` and never materializes a `DynamicMessage`. + +**Container elements and coherence.** `List`/`Map` values dispatch through `ReflectElement` (element → `ValueRef`) and `ReflectMapKey` (key → `MapKeyRef`), with generic `ReflectList for Vec` / `RepeatedView` and `ReflectMap` impls on top. `ReflectElement` is a *closed set of concrete impls* — scalars, `&str`/`&[u8]`, `String`/`Vec`/`Bytes`, the configurable `string_type` representations, and codegen-emitted impls for each message and closed enum — rather than a blanket `impl ReflectElement for T`, which would collide with the concrete scalar impls under Rust's coherence rules. + +**Placement and validation.** The trait surface, `DynamicMessage`, the pool, and the container impls live in `buffa-descriptor` (feature `reflect`, which requires `std` for the `OnceLock`-backed pool). Codegen lives in `buffa-codegen` — `reflect.rs` (the `Reflectable` body and embedded pool), `reflect_view.rs`, and `reflect_owned.rs`. 
Both the dynamic codec and the vtable surface are exercised by the conformance suite: the `via-reflect` run drives all I/O through `DynamicMessage`, and the `via-vtable` run decodes a view, walks its `ReflectMessage` surface to rebuild a `DynamicMessage`, and serializes that to JSON — isolating any bug to the generated vtable `get`/`has`/`for_each_set`. + ### Owned decode: intentional throughput trade-offs Owned decode (`Message::decode_from_slice`) benchmarks within roughly ±10% of prost in most cases. The costs are intentional and attributable to specific features: diff --git a/README.md b/README.md index fc5c525..68acc93 100644 --- a/README.md +++ b/README.md @@ -28,7 +28,7 @@ The Rust ecosystem lacks an actively maintained, pure-Rust library that supports - **Unknown field preservation.** Round-trip fidelity for proxy and middleware use cases. -- **Runtime reflection.** `buffa-descriptor` (under the `reflect` feature) provides `DescriptorPool` and `DynamicMessage` for schema-driven encode, decode, and JSON without generated code — plus extensions, custom-option access, `Any` pack/unpack, and symbol→file lookup for gRPC server reflection. Generated types bridge into the same `ReflectMessage` trait via a derived `Reflectable` impl, so a CEL evaluator, transcoding gateway, or generic interceptor can treat typed and dynamic messages uniformly. See [Reflection](#reflection) for the cost relative to generated code. +- **Runtime reflection.** `buffa-descriptor` (under the `reflect` feature) provides `DescriptorPool` and `DynamicMessage` for schema-driven encode, decode, and JSON without generated code — plus extensions, custom-option access, `Any` pack/unpack, and symbol→file lookup for gRPC server reflection. Generated types implement the same `ReflectMessage` trait directly (vtable mode), so `foo.reflect()` borrows in place and a CEL evaluator, transcoding gateway, or generic interceptor treats typed and dynamic messages uniformly — without a re-encode round-trip. 
See [Reflection](#reflection) for the cost relative to generated code. - **`no_std` + `alloc`.** The core runtime works without `std`, including JSON serialization via serde. Enabling `std` adds `std::io` integration, `std::time` conversions, and thread-local JSON parse options. @@ -293,19 +293,36 @@ structs and then encoding them. ### Reflection -The reflection path (`DynamicMessage`) trades throughput for schema-agnostic processing: a CEL evaluator, a transcoding gateway, or a generic interceptor can encode, decode, and serialize messages it has no generated type for. These charts measure that genericity tax against the generated codec, on the same machine and with the same method as above. Only the four code-generated benchmark messages are covered, because reflection needs a generated type to compare against; `MediaFrame` is omitted. +Reflection lets a CEL evaluator, a transcoding gateway, or a generic interceptor encode, decode, and serialize messages it has no generated type for. buffa offers two implementations, selected with `reflect_mode`: **bridge** keeps generated code small (`foo.reflect()` re-encodes the typed message and decodes the bytes into a `DynamicMessage`), while **vtable** — the default when reflection is enabled — implements `ReflectMessage` directly on the generated types so `foo.reflect()` borrows `foo` in place, with no round-trip. Both hand out the same `&dyn ReflectMessage`, so the call site does not change between modes. -The three series: +These charts measure the genericity tax against the generated codec. Only the four code-generated benchmark messages are covered, because reflection needs a generated type to compare against; `MediaFrame` is omitted. They are regenerated through the Docker benchmark harness, but — unlike the cross-implementation charts above — on the development host rather than the pinned Xeon runner, so read them as a buffa-internal comparison (generated vs. reflect vs. view vs. 
vtable), not against the numbers in the sections above. -- **generated** — the typed codec `buffa-codegen` emits. Each message is a Rust struct with one field per proto field, and decode/encode are monomorphized to those fields. This is buffa's default path and the same `buffa` baseline charted under [Binary decode](#binary-decode) and [Binary encode](#binary-encode) above. +#### Decode + +- **generated** — the typed codec `buffa-codegen` emits: a Rust struct with one field per proto field, decode monomorphized to those fields. The same `buffa` baseline charted under [Binary decode](#binary-decode). - **reflect** — `DynamicMessage`: a single `BTreeMap` keyed by field number, driven entirely by a runtime `DescriptorPool`. No generated type is involved. -- **bridge round-trip** — what a generic interceptor pays *per message* to view a typed value through reflection. The codegen-derived `Reflectable` impl encodes the typed message and decodes the bytes back into a `DynamicMessage`, then hands it out as `&dyn ReflectMessage`. It is a generated encode plus a reflective decode, so it is always the slowest column. +- **view** — zero-copy `decode_view`: strings and bytes borrow from the input buffer instead of being copied into owned `String`/`Vec`, so it decodes *faster than the generated owned codec*. This is the floor every vtable reflection read builds on. ![Reflection decode — ApiResponse](benchmarks/charts/reflect-decode-api_response.svg) ![Reflection decode — LogRecord](benchmarks/charts/reflect-decode-log_record.svg) ![Reflection decode — AnalyticsEvent](benchmarks/charts/reflect-decode-analytics_event.svg) ![Reflection decode — GoogleMessage1](benchmarks/charts/reflect-decode-google_message1_proto3.svg) +#### Read + +The interceptor / field-mask workload: take a wire payload, obtain a reflective handle, and read every set field. 
This is where vtable mode pays off — it is dominated by the cheap zero-copy decode, so it runs several times faster than either reflection alternative. + +- **vtable** — `decode_view`, then read through the borrowed `&dyn ReflectMessage`. No round-trip, no per-field allocation. +- **bridge** — decode the owned message, then round-trip it into a `DynamicMessage` (the cost the codegen `Reflectable` paid per call before vtable mode). +- **dynamic** — decode straight into a `DynamicMessage`, no typed step (pure reflection). + +![Reflection read — ApiResponse](benchmarks/charts/reflect-read-api_response.svg) +![Reflection read — LogRecord](benchmarks/charts/reflect-read-log_record.svg) +![Reflection read — AnalyticsEvent](benchmarks/charts/reflect-read-analytics_event.svg) +![Reflection read — GoogleMessage1](benchmarks/charts/reflect-read-google_message1_proto3.svg) + +#### Encode + ![Reflection encode — ApiResponse](benchmarks/charts/reflect-encode-api_response.svg) ![Reflection encode — LogRecord](benchmarks/charts/reflect-encode-log_record.svg) ![Reflection encode — AnalyticsEvent](benchmarks/charts/reflect-encode-analytics_event.svg) @@ -313,12 +330,23 @@ The three series:
Raw decode data (MiB/s, % vs generated) -| Message | generated | reflect | bridge round-trip | +| Message | generated | reflect | view | +|---------|------:|------:|------:| +| ApiResponse | 831 | 320 (−61%) | 1,422 (+71%) | +| LogRecord | 779 | 448 (−42%) | 1,971 (+153%) | +| AnalyticsEvent | 220 | 83 (−62%) | 317 (+44%) | +| GoogleMessage1 | 1,020 | 198 (−81%) | 1,274 (+25%) | + +
+ +
Raw read data (MiB/s, decode + scan all fields, % vs bridge) + +| Message | vtable | bridge | dynamic | |---------|------:|------:|------:| -| ApiResponse | 834 | 323 (−61%) | 243 (−71%) | -| LogRecord | 742 | 447 (−40%) | 364 (−51%) | -| AnalyticsEvent | 221 | 83 (−62%) | 69 (−69%) | -| GoogleMessage1 | 1,022 | 217 (−79%) | 210 (−79%) | +| ApiResponse | 799 (+398%) | 160 | 233 (+46%) | +| LogRecord | 1,462 (+667%) | 191 | 356 (+86%) | +| AnalyticsEvent | 315 (+516%) | 51 | 83 (+62%) | +| GoogleMessage1 | 654 (+351%) | 145 | 153 (+6%) |
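The percentage column in the read table is throughput relative to the baseline series (`bridge` here). A minimal sketch of that arithmetic, assuming the column is `(value / baseline - 1) * 100` rounded to a whole percent; the helper name `pct_vs_baseline` is hypothetical and is not taken from `generate.py`. The inputs are the rounded table values, so a result can differ by a point or two from the published column, which is presumably computed from unrounded measurements.

```python
def pct_vs_baseline(value: float, baseline: float) -> int:
    """Relative difference of `value` against `baseline`, in whole percent."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return round((value / baseline - 1.0) * 100.0)


# Rounded MiB/s values copied from the read table: (vtable, bridge, dynamic).
READ_ROWS = {
    "ApiResponse": (799, 160, 233),
    "LogRecord": (1462, 191, 356),
    "AnalyticsEvent": (315, 51, 83),
    "GoogleMessage1": (654, 145, 153),
}

for name, (vtable, bridge, dynamic) in READ_ROWS.items():
    # bridge is the baseline, so it carries no percentage of its own.
    print(
        f"{name}: vtable {pct_vs_baseline(vtable, bridge):+d}%, "
        f"dynamic {pct_vs_baseline(dynamic, bridge):+d}%"
    )
```

Dividing by the baseline rather than subtracting it keeps the column comparable across messages whose absolute throughput differs by an order of magnitude.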
@@ -326,10 +354,10 @@ The three series: | Message | generated | reflect | |---------|------:|------:| -| ApiResponse | 2,562 | 685 (−73%) | -| LogRecord | 4,107 | 1,292 (−69%) | -| AnalyticsEvent | 594 | 99 (−83%) | -| GoogleMessage1 | 2,636 | 353 (−87%) | +| ApiResponse | 2,347 | 670 (−71%) | +| LogRecord | 3,689 | 1,232 (−67%) | +| AnalyticsEvent | 573 | 96 (−83%) | +| GoogleMessage1 | 2,222 | 352 (−84%) | diff --git a/Taskfile.yml b/Taskfile.yml index c581317..52f29ad 100644 --- a/Taskfile.yml +++ b/Taskfile.yml @@ -599,18 +599,18 @@ tasks: check-nostd: desc: >- - Check that buffa compiles in no_std + alloc configuration. - The first check uses the host target with no_std features disabled. - The second check targets a bare-metal ARM Cortex-M4 (thumbv7em) - to verify true no_std compatibility with no OS or libc. - preconditions: - - sh: rustup target list --installed | grep -q thumbv7em-none-eabihf - msg: >- - Target thumbv7em-none-eabihf is not installed. - Run 'task install-targets' to install it. - cmds: + Check that buffa, buffa-types, and buffa-descriptor compile in + no_std + alloc configuration — the same crates the CI check-nostd job + covers. The host checks disable the std-only features; the bare-metal + ARM Cortex-M4 (thumbv7em) check verifies true no_std with no OS or libc. + The bare-metal target is added on demand (rustup target add is + idempotent), so this task needs no separate install-targets step. 
+ cmds: + - rustup target add thumbv7em-none-eabihf - cargo check -p buffa --no-default-features - cargo check -p buffa --no-default-features --target thumbv7em-none-eabihf + - cargo check -p buffa-types --no-default-features + - cargo check -p buffa-descriptor --no-default-features # ── Coverage ───────────────────────────────────────────────────────── coverage: diff --git a/benchmarks/charts/generate.py b/benchmarks/charts/generate.py index 86f8299..ee46c59 100644 --- a/benchmarks/charts/generate.py +++ b/benchmarks/charts/generate.py @@ -44,7 +44,12 @@ # Reflection comparison (buffa-only: generated codec vs DynamicMessage). "generated": "#4C78A8", "reflect": "#B279A2", + "view": "#72B7B2", "bridge round-trip": "#9D755D", + # Reflection read (from wire bytes → reflective field reads). + "vtable": "#54A24B", + "bridge": "#9D755D", + "dynamic": "#B279A2", } MESSAGES = ["ApiResponse", "LogRecord", "AnalyticsEvent", "GoogleMessage1", "MediaFrame"] @@ -220,14 +225,23 @@ def build_tables( ("Go", lambda ms, md: _get_go(go, "JsonDecode", md)), ]), ("reflect-decode", [ - ("generated", lambda ms, md: _get_reflect(reflect, md, "decode/generated")), - ("reflect", lambda ms, md: _get_reflect(reflect, md, "decode/reflect")), - ("bridge round-trip", lambda ms, md: _get_reflect(reflect, md, "reflect/bridge_round_trip")), + ("generated", lambda ms, md: _get_reflect(reflect, md, "decode/generated")), + ("reflect", lambda ms, md: _get_reflect(reflect, md, "decode/reflect")), + ("view", lambda ms, md: _get_reflect(reflect, md, "decode/view")), ]), ("reflect-encode", [ ("generated", lambda ms, md: _get_reflect(reflect, md, "encode/generated")), ("reflect", lambda ms, md: _get_reflect(reflect, md, "encode/reflect")), ]), + # From wire bytes to reflective field reads (the interceptor / field-mask + # workload): decode a handle and scan all set fields. vtable borrows a + # decoded view; bridge round-trips through DynamicMessage; dynamic decodes + # straight into DynamicMessage. 
+ ("reflect-read", [ + ("vtable", lambda ms, md: _get_reflect(reflect, md, "reflect/vtable_read_all")), + ("bridge", lambda ms, md: _get_reflect(reflect, md, "reflect/bridge_read_all")), + ("dynamic", lambda ms, md: _get_reflect(reflect, md, "reflect/dynamic_read_all")), + ]), ]: table: dict[str, dict[str, float | None]] = {} for series_name, getter in series_defs: @@ -401,6 +415,7 @@ def generate_readme_tables(tables: dict[str, dict[str, dict[str, float | None]]] "json-decode": ("JSON decode", "buffa"), "reflect-decode": ("Reflection decode", "generated"), "reflect-encode": ("Reflection encode", "generated"), + "reflect-read": ("Reflection read (decode + scan all fields)", "bridge"), } for chart_name, table in tables.items(): @@ -466,8 +481,9 @@ def load_criterion(name: str) -> dict[str, float]: "build-encode": "Build + Binary Encode Throughput (from borrowed source data)", "json-encode": "JSON Encode Throughput", "json-decode": "JSON Decode Throughput", - "reflect-decode": "Reflection Decode Throughput (generated vs DynamicMessage)", + "reflect-decode": "Reflection Decode Throughput (generated vs DynamicMessage vs view)", "reflect-encode": "Reflection Encode Throughput (generated vs DynamicMessage)", + "reflect-read": "Reflection Read Throughput (decode + scan all fields)", } # Per-message SVGs: one file per (chart, message) so each can use its own
diff --git a/benchmarks/charts/reflect-decode-analytics_event.svg b/benchmarks/charts/reflect-decode-analytics_event.svg index fd43b6c..0a5d065 100644 @@ -9,31 +9,31 @@
[SVG chart hunk, markup not shown. Title: "Reflection Decode Throughput (generated vs DynamicMessage)" → "(generated vs DynamicMessage vs view)". Third series: `bridge round-trip` → `view`. Y-axis ticks (MiB/s): 0–300 step 60 → 0–400 step 80. Bars: generated 220 → 220, reflect 83 → 83, third series 68 → 317.]
diff --git a/benchmarks/charts/reflect-decode-api_response.svg b/benchmarks/charts/reflect-decode-api_response.svg index 5ceba8c..0300b57 100644 @@ -9,31 +9,31 @@
[SVG chart hunk, markup not shown. Same title and series change as above. Y-axis ticks (MiB/s): 0–900 step 180 → 0–2,000 step 400. Bars: generated 833 → 831, reflect 322 → 320, third series 242 → 1,422.]
diff --git a/benchmarks/charts/reflect-decode-google_message1_proto3.svg b/benchmarks/charts/reflect-decode-google_message1_proto3.svg index 5fd3009..46e2cfa 100644 @@ -9,13 +9,13 @@ and @@ -30,10 +30,10 @@
[SVG chart hunks, markup not shown. Same title and series change as above. Y-axis unchanged (top tick 2,000 MiB/s). Bars: generated 1,021 → 1,019, reflect 217 → 198, third series 209 → 1,273.]
diff --git a/benchmarks/charts/reflect-decode-log_record.svg b/benchmarks/charts/reflect-decode-log_record.svg index 4a46c09..bf7281b 100644 @@ -9,31 +9,31 @@
[SVG chart hunk, markup not shown. Same title and series change as above. Y-axis ticks (MiB/s): 0–800 step 160 → 0–2,000 step 400. Bars: generated 741 → 779, reflect 447 → 448, third series 364 → 1,971.]
diff --git a/benchmarks/charts/reflect-encode-analytics_event.svg b/benchmarks/charts/reflect-encode-analytics_event.svg index 03f25e8..7074c2f 100644 @@ -28,8 +28,8 @@
[SVG chart hunk, markup not shown. Bars (MiB/s): generated 593 → 573, reflect 98 → 95.]
diff --git a/benchmarks/charts/reflect-encode-api_response.svg b/benchmarks/charts/reflect-encode-api_response.svg index c75754e..3ba54af 100644 @@ -28,8 +28,8 @@
[SVG chart hunk, markup not shown. Bars (MiB/s): generated 2,561 → 2,347, reflect 684 → 669.]
diff --git a/benchmarks/charts/reflect-encode-google_message1_proto3.svg b/benchmarks/charts/reflect-encode-google_message1_proto3.svg index b44bdca..e17b8d2 100644 @@ -28,8 +28,8 @@
[SVG chart hunk, markup not shown. Bars (MiB/s): generated 2,636 → 2,222, reflect 353 → 351.]
diff --git a/benchmarks/charts/reflect-encode-log_record.svg b/benchmarks/charts/reflect-encode-log_record.svg index 70b54f5..8c37c8e 100644 @@ -17,19 +17,19 @@
[SVG chart hunk, markup not shown. Y-axis ticks (MiB/s): 0–5,000 step 1,000 → 0–4,000 step 800. Bars: generated 4,106 → 3,688, reflect 1,292 → 1,232.]
diff --git a/benchmarks/charts/reflect-read-analytics_event.svg b/benchmarks/charts/reflect-read-analytics_event.svg new file mode 100644 index 0000000..fbb3fa2 @@ -0,0 +1,39 @@
[New SVG chart, markup not shown. Title: "Reflection Read Throughput (decode + scan all fields)", AnalyticsEvent. Series: vtable, bridge, dynamic. Y-axis ticks (MiB/s): 0–400 step 80. Bars: vtable 314, bridge 51, dynamic 82.]
diff --git a/benchmarks/charts/reflect-read-api_response.svg b/benchmarks/charts/reflect-read-api_response.svg new file mode 100644 index 0000000..e86fb98 @@ -0,0 +1,39 @@
[New SVG chart, markup not shown. Same title and series, ApiResponse. Y-axis ticks (MiB/s): 0–800 step 160. Bars: vtable 798, bridge 160, dynamic 233.]
diff --git a/benchmarks/charts/reflect-read-google_message1_proto3.svg b/benchmarks/charts/reflect-read-google_message1_proto3.svg new file mode 100644 index 0000000..4ebd847 @@ -0,0 +1,39 @@
[New SVG chart, markup not shown. Same title and series, GoogleMessage1. Y-axis ticks (MiB/s): 0–700 step 140. Bars: vtable 654, bridge 145, dynamic 153.]
diff --git a/benchmarks/charts/reflect-read-log_record.svg b/benchmarks/charts/reflect-read-log_record.svg new file mode 100644 index 0000000..bc83983 @@ -0,0 +1,39 @@
[New SVG chart, markup not shown. Same title and series, LogRecord. Y-axis ticks (MiB/s): 0–2,000 step 400. Bars: vtable 1,461, bridge 190, dynamic 355.]
diff --git a/docs/investigations/reflection-vtable.md b/docs/investigations/reflection-vtable.md index 73e1c0d..4abad41 100644 --- a/docs/investigations/reflection-vtable.md +++ b/docs/investigations/reflection-vtable.md @@ -399,12 +399,6 @@ below). The remaining open items are noted under "Not yet done" at the end. Value` and dropping the bespoke `impl ReflectList for Vec`. owned `unknown_fields()` is overridden to preserve unknowns (bridge parity). 
-### Not yet done - -- **Benchmark numbers in the README.** The `reflect` bench gained vtable cases - (6–10× over bridge, ~4× over pure `DynamicMessage`); regenerating the README - charts through the pinned Xeon harness is outstanding. - ### Done since - ✅ **`VTable` is the default reflection mode.** `generate_reflection(true)` (and @@ -412,6 +406,12 @@ below). The remaining open items are noted under "Not yet done" at the end. via `reflect_mode(ReflectMode::Bridge)`. Vtable no longer requires views — the owned `impl ReflectMessage` is self-contained, so views-off builds get owned-only vtable reflection rather than an error. +- ✅ **README reflection charts regenerated.** The `reflect` bench's new cases are + charted: a `view` series on reflection-decode (zero-copy decode is the floor, + +25% to +153% over the generated owned codec) and a new reflection-read chart + (`vtable` vs `bridge` vs `dynamic`; vtable reads at roughly 4.5–7.7× the + throughput of the bridge round-trip, +351% to +667% in the README table). + Regenerated through the Docker bench harness; see the README note + that these reflection charts run on the dev host, not the pinned Xeon runner. ## What this does *not* solve