perf(rust): cache parsed VNode subtrees for unchanged loop items (#1970, flag-gated) #1973
…, flag-gated)

Extends the per-item loop RENDER cache (#1967/#1969) to ALSO cache the PARSED VNode subtree, so a reorder of unchanged-content keyed list items skips BOTH render AND html5ever-parse. The parse phase is ~60% of render_with_diff, the bigger half the render cache could not reach.

Design (placeholder + foster-safe gate + post-parse splice with full-reparse fallback):
- LoopRenderCache gains a SECOND map (content-hash u64 -> parsed Vec<VNode>) keyed by the SAME hash as the render fragments, plus a per-render item manifest. Same loop_render_cache_enabled flag, same two cacheability gates.
- The For arm, for a parse-cache HIT on a foster-SAFE item (item root not a table/select-family tag), emits a tiny <dj-pc h=..> placeholder instead of the item HTML; render_with_diff/render_binary_diff parse the REDUCED html (cheap), splice the cached subtrees back in, then re-assign ALL dj-ids by a pre-order re-walk so the assembled tree is byte-identical to a full parse.
- dj-id strategy: dj-ids are purely positional (parser assigns pre-order), so a cached subtree's baked ids are position-WRONG on reuse (proven: naive reuse -> [0,1,2,3,4,1,2] duplicates). The re-walk from the same counter base the full parse uses reproduces fresh-parse ids exactly (initial + continuing).
- Foster-unsafe containers (table/select), multi-root items, and any splice anomaly (cache miss / count mismatch / residual placeholder) fall back to a full parse: always correct, no parse win for that render.
- VNode.attrs now serialize in SORTED key order so the patch wire format is deterministic (a HashMap serializes in nondeterministic bucket order, which the parse-cache path would otherwise expose as an ON-vs-OFF patch diff).

Correctness (byte-identity cache ON == OFF, html + patches + version) proven across plain/keyed/dj-if/cycle/nested/tuple/div/table/select/multi-root templates and initial/reorder/change/append/remove ops, for BOTH render_with_diff and render_binary_diff. The dj-key reorder round-trip (post-diff dj-ids/dj-keys match cache-off exactly) is the load-bearing case.

Gate-off (#1468) confirmed: neutering the re-walk fails 6 byte-identity tests.

Parse-count probe (TestParseCountProbe1970): a reorder of N unchanged keyed items -> N parse hits, 0 re-parses; an append re-parses only the new item.

Per-phase bench (median over 60 distinct shuffles, render_with_diff):
N=50: parse 0.145->0.113ms (-21.9%), total 0.430->0.339ms (-21.3%)
N=500: parse 1.394->1.159ms (-16.8%), total 4.037->3.399ms (-15.8%)
beating #1969's render-only ~6-11% end-to-end win.

New tests:
- crates/djust_vdom/tests/test_loop_parse_cache_1970.rs (splice + dj-id re-walk + gate-off, 5 cases)
- the parse_cache_1970 module in crates/djust_templates/tests/test_loop_render_cache_1967.rs (foster-safe gate + placeholder + manifest lifecycle, 8 cases)
- TestParseCacheByteIdentity1970 / TestParseCountProbe1970 in python/djust/tests/test_loop_render_cache_1967.py (end-to-end byte-identity + parse-count probe + gate-off, 16 cases)

Closes #1970.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01D4VsbRKefaPjn6Rxs9ig1A
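The dj-id strategy described in this commit message can be illustrated with a small Python sketch. This is not the Rust implementation: `VNode`, `assign_ids_preorder`, `collect_ids` and `make_item` are hypothetical names, and ids are plain integers here. It only shows why a cached subtree's baked ids collide on reuse and how a pre-order re-walk from the same counter base restores fresh-parse ids.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VNode:
    """Hypothetical stand-in for a parsed node; only the fields the sketch needs."""
    tag: str
    children: List["VNode"] = field(default_factory=list)
    dj_id: Optional[int] = None


def assign_ids_preorder(node: VNode, next_id: int) -> int:
    """Assign ids in pre-order starting at next_id; return the next free id."""
    node.dj_id = next_id
    next_id += 1
    for child in node.children:
        next_id = assign_ids_preorder(child, next_id)
    return next_id


def collect_ids(node: VNode) -> List[Optional[int]]:
    """Return the ids of the tree in pre-order."""
    ids = [node.dj_id]
    for child in node.children:
        ids.extend(collect_ids(child))
    return ids


def make_item(label: str) -> VNode:
    """One list item: an <li> root with a single child."""
    return VNode("li", [VNode(label)])


# Fresh full parse of a 3-item list where items 1 and 3 have identical content.
fresh = VNode("ul", [make_item("a"), make_item("b"), make_item("a")])
assign_ids_preorder(fresh, 0)
fresh_ids = collect_ids(fresh)

# Cache the first item's subtree with its ids baked in (1 and 2).
cached_a = copy.deepcopy(fresh.children[0])

# Naive reuse: parse only the first two items, then append the cached subtree as-is.
assembled = VNode("ul", [make_item("a"), make_item("b")])
assign_ids_preorder(assembled, 0)
assembled.children.append(copy.deepcopy(cached_a))
naive_ids = collect_ids(assembled)

# Re-walk the assembled tree from the same counter base an initial parse uses (0).
assign_ids_preorder(assembled, 0)
rewalked_ids = collect_ids(assembled)

# A continuing parse starts from a higher base instead (7 in this example).
next_free = assign_ids_preorder(assembled, 7)
continuing_ids = collect_ids(assembled)
```

In this sketch `naive_ids` contains the duplicated ids 1 and 2, while `rewalked_ids` equals `fresh_ids`. Re-walking the whole assembled tree, rather than only the spliced subtrees, keeps the logic independent of where the placeholders sat.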
…1970)

- CHANGELOG [Unreleased] ### Performance: the #1970 parsed-subtree cache (design, dj-id re-walk strategy, foster-safe gate, sorted-attrs wire determinism, per-phase bench, byte-identity + parse-count + gate-off proof).
- ROADMAP: v1.0.8-3 milestone block (the parse-phase half of #1967/#1969).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01D4VsbRKefaPjn6Rxs9ig1A
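Of the items this changelog entry documents, the sorted-attrs wire determinism is the easiest to show in isolation. The sketch below is a Python analogy, not the Rust serializer: a Python `dict` iterates in insertion order, which stands in here for a Rust `HashMap`'s arbitrary bucket order, and `dump_attrs_sorted` is a hypothetical helper name.

```python
import json


def dump_attrs_unsorted(attrs: dict) -> str:
    """Serialize attributes in whatever order the map yields them."""
    return json.dumps(attrs, separators=(",", ":"))


def dump_attrs_sorted(attrs: dict) -> str:
    """Serialize attributes in sorted key order so equal maps give equal bytes."""
    return json.dumps(attrs, sort_keys=True, separators=(",", ":"))


# The same attribute set, built in two different orders (as two parse paths might).
attrs_full_parse = {"class": "row", "dj-key": "7"}
attrs_spliced = {"dj-key": "7", "class": "row"}

same_content = attrs_full_parse == attrs_spliced
unsorted_match = dump_attrs_unsorted(attrs_full_parse) == dump_attrs_unsorted(attrs_spliced)
sorted_match = dump_attrs_sorted(attrs_full_parse) == dump_attrs_sorted(attrs_spliced)
```

The two maps compare equal, yet only the sorted serialization is guaranteed to produce identical bytes, which is the property a byte-identity comparison of patches depends on.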
Security Review Required
This PR modifies security-sensitive files. Please ensure the following areas receive additional review:
Review Checklist
See SECURITY_GUIDELINES.md for full security review criteria.
Stage 11 - Adversarial Code Review: REQUEST_CHANGES 🔴
Reviewed in an isolated worktree against PR head.
The mechanism is well-engineered and the gate-off proof is genuine, but I found one 🔴 byte-identity violation that breaks the PR's central claim (cache ON == OFF) and falsifies the "any anomaly falls back to a full parse, always correct" guarantee.
🔴 MUST-FIX: a user
…e by |safe content (#1970)

Adversarial-review 🔴: a loop item rendering a LITERAL unescaped `<dj-pc ...>` element via `|safe`/`mark_safe`, alongside a sibling that emitted a real placeholder that same render, broke byte-identity. Cache-ON stripped the user's `<dj-pc>` (and foster-relocated following content) while cache-OFF preserved it.

Root cause: `reconstruct_full_loop_html` matched the bare `<dj-pc ...>` substring, so user content was indistinguishable from the framework sentinel; when manifest entries were exhausted it DROPPED the extra `<dj-pc>`, and the full-parse fallback then parsed an already-corrupted string.

Aggravator: a fixed-key `DefaultHasher` made the `h=` hash predictable, so crafted `<dj-pc h="<known-hash>">` content could splice a DIFFERENT cached item's subtree into that position (content-confusion).

Fix (structural): the placeholder sentinel tag now carries a per-render random nonce: `dj-pc-<nonce_hex>` instead of bare `dj-pc`. Generated fresh every render in `LoopRenderCache::begin_render` (RandomState-seeded, per-render, unpredictable across requests). The For arm emits `<dj-pc-<nonce> h=..>`; reconstruct + splice match ONLY the current render's nonce tag, so a literal `<dj-pc>` in user content (no nonce) is never matched -> preserved verbatim, no hijack.

Defense-in-depth: parse-cache eligibility additionally refuses any item whose rendered HTML contains the literal sentinel prefix (`item_html_is_foster_safe` -> `contains_sentinel_prefix`), and the splice's residual-sentinel guard now checks the `dj-pc` PREFIX (`tree_contains_tag_prefix`) so a stray sentinel of any nonce forces the full-parse fallback. html5ever keeps `dj-pc-<hex>` intact (valid custom-element name; hex nonce already lowercase); verified.

Reproduce-first: the exact review reproducer is RED before this commit (user's `<dj-pc>` stripped) and GREEN after (byte-identical).

Gate-off (#1468): neutering the nonce to the bare prefix makes the 3 sentinel-collision tests RED, so the nonce is load-bearing. The crafted-`h=` hijack test confirms no content-confusion. The reorder parse-cache win is unchanged (REORDER still N parse hits / 0 re-parses); the full byte-identity battery, parse-count probe, and the Rust (863+37) + Python (8664) suites stay green.

New tests:
- `TestParseCacheSentinelCollision1970` (|safe literal + crafted-h= hijack, render_with_diff + render_binary_diff) in `python/djust/tests/test_loop_render_cache_1967.py`
- `item_with_literal_sentinel_is_ineligible` + the nonce-tag placeholder assertion in `crates/djust_templates/tests/test_loop_render_cache_1967.rs`
- `literal_unnonced_dj_pc_is_not_spliced` + `gate_off_bare_prefix_would_match_user_content` in `crates/djust_vdom/tests/test_loop_parse_cache_1970.rs`

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01D4VsbRKefaPjn6Rxs9ig1A
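The nonce fix in this commit can be sketched in Python to show why a literal `<dj-pc>` in user content is no longer matched. Everything here is illustrative: the function names, the closing-tag form of the placeholder, the hex encoding of `h=`, and the use of `secrets.token_hex` are assumptions. The commit itself only says the nonce is RandomState-seeded, lowercase hex, and regenerated every render.

```python
import re
import secrets
from typing import Dict, Optional

SENTINEL_PREFIX = "dj-pc"


def new_render_nonce() -> str:
    """Fresh, unpredictable lowercase-hex nonce for one render (assumed 8 bytes)."""
    return secrets.token_hex(8)


def placeholder(nonce: str, content_hash: int) -> str:
    """Framework placeholder for a cached item; the tag name embeds the nonce."""
    tag = f"{SENTINEL_PREFIX}-{nonce}"
    return f'<{tag} h="{content_hash:x}"></{tag}>'


def has_sentinel_prefix(item_html: str) -> bool:
    """Defense-in-depth: items containing the bare prefix are not parse-cached."""
    return f"<{SENTINEL_PREFIX}" in item_html


def reconstruct(reduced_html: str, nonce: str, fragments: Dict[int, str]) -> Optional[str]:
    """Expand only the current render's nonce-tagged placeholders.

    Returns None on an unknown hash so the caller can fall back to a full parse.
    """
    tag = re.escape(f"{SENTINEL_PREFIX}-{nonce}")
    pattern = re.compile(rf'<{tag} h="([0-9a-f]+)"></{tag}>')
    try:
        return pattern.sub(lambda m: fragments[int(m.group(1), 16)], reduced_html)
    except KeyError:
        return None


# One cached fragment, one real placeholder, and one user-supplied literal.
fragments = {0xABC: "<li>cached</li>"}
nonce = new_render_nonce()
user_literal = '<dj-pc h="abc"></dj-pc>'
reduced = "<ul>" + placeholder(nonce, 0xABC) + "<li>" + user_literal + "</li></ul>"

full_html = reconstruct(reduced, nonce, fragments)
unknown_hash = reconstruct("<ul>" + placeholder(nonce, 0xDEF) + "</ul>", nonce, fragments)
```

Here the real placeholder expands to the cached fragment while the user's literal survives verbatim, even though its `h=` value names a real cache entry. Matching on the full nonce-bearing tag name is what separates framework output from user content in this sketch.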
Document the adversarial-review 🔴 fix in the #1970 Performance entry: the placeholder sentinel now carries a per-render random nonce (`dj-pc-<nonce>`) so a `|safe`/`mark_safe` item rendering a literal `<dj-pc>` can't be mistaken for a placeholder or hijack a cached subtree via a crafted `h=`. Adds the new sentinel-collision test classes to the test list + the bare-prefix gate-off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01D4VsbRKefaPjn6Rxs9ig1A
Stage 12 - Re-Review: APPROVE ✅
Re-review of the FIX (commits
✅ The original 🔴 is CLOSED
Ran the EXACT prior reproducer against the rebuilt extension (a
The user's literal
✅ Nonce gate-off (#1468): the nonce is LOAD-BEARING, tests are NOT tautological
Neutered
Notable nuance the gate-off surfaced: the
✅ Probed for NEW holes the nonce could open: none found
🟡 Minor (perf-only, not blocking):
✅ Prior PASS list re-verified after the fix touched the hot path
✅ Suites + lint
Worktree-isolated throughout; restored the shared
Verdict: APPROVE ✅. The 🔴 is empirically closed by a load-bearing per-render nonce; no new hole opened.
Retrospective: PR #1973 (pipeline-drain)
Task: v1.0.8-3, drained #1970 (parse-phase loop render cache, the bigger half of the #1967/#1969 lever). Caches parsed VNode subtrees per item (content-hash →
Quality: 4/5
Correct, well-tested, flag-gated, real end-to-end win (~16–21% reorder
What went well
What didn't (lessons)
Verified
… ROADMAP bucket (#1974)

main now tracks the 1.1 development line: 1.0.8 shipped (2026-06-23, tag v1.0.8) and 1.1.0 is the active release (rc1/rc2/rc3 cut on the `1.1` branch). main's version files lagged at 1.0.8, which led the #1970 drain to mislabel its ROADMAP milestone as "v1.0.8-3 / ships in 1.0.8". This:
- Bumps the version 1.0.8 → 1.1.0 across pyproject.toml, Cargo.toml, both __init__.py files, uv.lock, Cargo.lock (via `make version VERSION=1.1.0`).
- Renames the drain milestone v1.0.8-3 → v1.1.0-1 (ships in 1.1.0) and marks #1970 ✅ PR #1973.

The #1967/#1969/#1970 work is under [Unreleased] and lands in 1.1.0 via the main→`1.1` resync. 1.0 is maintenance-only going forward (bug/security fixes); 1.1 develops on main.

Co-authored-by: Claude <noreply@anthropic.com>
Brings the post-1.0.8 perf arc onto the v1.1 release line: the keyed loop-item render cache (#1967/PR #1969), the parsed-VNode-subtree cache (#1970/PR #1973), and the cold-start filter-bridge warm (#1968) — all flag-gated default-OFF, under LIVEVIEW_CONFIG['loop_render_cache_enabled']. Also brings the corrected v1.1.0-1 ROADMAP drain bucket and main's version bump to the 1.1 line. Version files kept at 1.1.0rc3 (the release branch owns its version during rc; main's 1.1.0 bump is overridden here per the resync convention). Validated on the merged tree: cargo check + full Rust suite green (workspace excl djust_live + djust_live --no-default-features, 37 passed), full Python suite 8712 passed / 0 failed across tests/ python/tests/ python/djust/tests/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cut rc4 from main (the 1.1 dev+release line going forward; 1.0.x patches live on the `1.0` branch). main's CHANGELOG adopts the release-history rc sections (imported verbatim from the v1.1.0rc3 tag) so future cuts come from main directly.

The new [1.1.0rc4] section is the delta since rc3:
- #1967 keyed loop-item render cache (PR #1969)
- #1970 parsed VNode subtree cache (PR #1973)
- #1968 cold-start filter-bridge warm

All flag-gated default-OFF (LIVEVIEW_CONFIG['loop_render_cache_enabled']). Coverage verified: every [Unreleased] entry is captured in rc1/rc2/rc3 or the rc4 delta (no CHANGELOG history lost).

Co-authored-by: Claude <noreply@anthropic.com>
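Since all three features ship default-OFF behind one flag, opting in would presumably look like the fragment below. Only the `LIVEVIEW_CONFIG` name and the `loop_render_cache_enabled` key come from the text above; that the setting is a plain dict in a Django settings module is an assumption.

```python
# settings.py (sketch): assumes LIVEVIEW_CONFIG is a plain dict in Django settings.
LIVEVIEW_CONFIG = {
    # Default is OFF; True opts in to the loop render cache and the
    # parsed-subtree cache that shares the same flag.
    "loop_render_cache_enabled": True,
}
```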
Closes #1970.
Extends the per-item loop render cache (#1967/PR #1969) to also cache the parsed VNode subtree, so a reorder of unchanged-content keyed list items skips both render AND html5ever-parse. The render cache cut the render phase, but parse + VDOM-build are ~60% of `render_with_diff`: the bigger half #1969 couldn't reach (its render-only end-to-end win was Amdahl-bounded to ~6-11%).

Design as built
A second cache + a placeholder-splice with a full-reparse fallback:
- `LoopRenderCache` (`crates/djust_templates/src/loop_cache.rs`) gains a second map (content-hash `u64` → parsed `Vec<VNode>`) keyed by the same content-hash as the render fragments, plus a per-render item manifest. Same `LIVEVIEW_CONFIG['loop_render_cache_enabled']` flag, same two cacheability gates (position-independent + item-only body). When off, byte-identical to before.
- The `Node::For` arm, for a parse-cache HIT on a foster-parenting-safe item, emits a tiny `<dj-pc h=...>` placeholder instead of the item HTML, so the assembled string html5ever parses is a short reduced form.
- `render_with_diff` / `render_binary_diff` parse the reduced HTML, splice the cached subtrees back into the placeholders (`djust_vdom::splice_loop_placeholders`), then re-assign every dj-id.

The dj-id hazard + the strategy (the crux)
dj-ids are assigned purely positionally, pre-order by the parser. A cached subtree carries ids baked at its original parse position, so reusing it verbatim elsewhere duplicates ids: empirically `[0,1,2,3,4,1,2]` for a 2-of-3 identical-content list vs fresh `[0,1,2,3,4,5,6]`. Therefore the splice re-walks the assembled tree pre-order, re-assigning ids from the same counter base the full parse would use (`0` for initial `parse_html`; `max(old_ids)+1` for continuing `parse_html_continue`, after the #1550/#1552 `ensure_id_counter_at_least` bump). This reproduces a fresh full-parse's ids byte-for-byte (proven for initial `[0..6]` and continuing `[7..d]`), so the assembled VDOM, every patch (Insert/Replace embed the new node), and `last_vdom` are identical to the cache-OFF path.

Why a placeholder (not container-location)
The renderer emits a flat HTML string with no enclosing-element context, so it cannot tell djust_live where loop items sit in the tree. Emitting a placeholder lets html5ever place it exactly where the item was, with no container-location logic. The catch: html5ever foster-parents non-table content out of `<table>`/`<select>` (a `<dj-pc>` there destroys structure; empirically confirmed). So the For arm gates on the item's own rendered root tag: foster-unsafe roots (`tr`/`td`/`th`/`tbody`/`thead`/`tfoot`/`caption`/`colgroup`/`col`/`option`/`optgroup`) skip the parse cache. Foster-unsafe containers, multi-root items, and any splice anomaly (placeholder cache miss / found-count mismatch / a residual `<dj-pc>`) fall back to a full parse: always correct, just no parse win that render.

Wire determinism
`VNode.attrs` now serialize in sorted key order (`serialize_attrs_sorted`). A plain `HashMap` serializes in nondeterministic bucket order; the parse-cache path (which assembles a node via a different parse than the cache-OFF full parse) would otherwise surface that as an ON-vs-OFF patch-JSON diff. Sorting matches `to_html`'s already-sorted attribute output and makes the patch wire format deterministic (in the spirit of the #1448 wire-protocol pinning).

Both `render_with_diff` and `render_binary_diff` are wired symmetrically (parallel-path-drift cure, #1646).

Correctness proof
- Byte-identity (cache ON == OFF: html + patches + version) across `plain/keyed/dj-if/cycle/nested/tuple/div/table/select/multi-root` templates × `initial/reorder/change/append/remove`, for both `render_with_diff` and `render_binary_diff`. The dj-key reorder round-trip (post-diff dj-ids/dj-keys match cache-off exactly) is the load-bearing case (`TestParseCacheByteIdentity1970`).
- Parse-count probe (`TestParseCountProbe1970`, via `loop_parse_cache_hits()` / `loop_parse_cache_misses()`): a reorder of N unchanged keyed items → N parse hits, 0 re-parses; an append re-parses only the new item.
- New tests: `crates/djust_vdom/tests/test_loop_parse_cache_1970.rs` (splice + re-walk + gate-off, 5), the `parse_cache_1970` module in `crates/djust_templates/tests/test_loop_render_cache_1967.rs` (foster-safe gate + placeholder + manifest, 8), `TestParseCacheByteIdentity1970` / `TestParseCountProbe1970` in `python/djust/tests/test_loop_render_cache_1967.py` (16).

Per-phase bench (median over 60 distinct shuffles, `render_with_diff`, ON vs OFF)

N=50: parse 0.145->0.113ms (-21.9%), total 0.430->0.339ms (-21.3%)
N=500: parse 1.394->1.159ms (-16.8%), total 4.037->3.399ms (-15.8%)

Beats #1969's render-only ~6-11% by also cutting the parse phase (the render phase drops too, since the #1969 cache compounds). Richer item bodies widen the win.
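A median-over-shuffles measurement of the kind this section reports could be set up roughly as follows. This is a generic Python sketch, not the PR's bench script: `median_ms`, `percent_change` and the `fake_render` stand-in are hypothetical, and the seeded shuffles are not checked for distinctness.

```python
import random
import statistics
import time
from typing import Callable, List


def median_ms(render: Callable[[List[str]], object], items: List[str],
              shuffles: int = 60, seed: int = 0) -> float:
    """Median wall-clock time in ms of render() over seeded shuffles of items."""
    rng = random.Random(seed)
    samples = []
    for _ in range(shuffles):
        order = items[:]          # same content every run, different order
        rng.shuffle(order)
        start = time.perf_counter()
        render(order)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def percent_change(off_ms: float, on_ms: float) -> float:
    """Relative change of the cache-ON time against the cache-OFF baseline."""
    return (on_ms - off_ms) / off_ms * 100.0


def fake_render(order: List[str]) -> str:
    """Stand-in workload; a real bench would call the renderer under test."""
    return "".join(f"<li>{item}</li>" for item in order)


items = [f"item-{i}" for i in range(50)]
baseline_ms = median_ms(fake_render, items)
```

Using the median rather than the mean keeps a single slow run from dominating. As a check of the arithmetic, `percent_change(4.037, 3.399)` evaluates to about -15.8, the N=500 total reported in the first commit message.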
Suite results
- `djust_live` 37 passed / 0 failed.
- `cargo fmt --check` clean; `cargo clippy` clean on the three crates.

🤖 Generated with Claude Code
https://claude.ai/code/session_01D4VsbRKefaPjn6Rxs9ig1A