Fix O(n²) find_head and stack overflow in filter_block_tree #9090
dapplion wants to merge 3 commits into sigp:unstable
Conversation
Add a `build_children_index()` helper that creates a parent→children mapping in O(n), then pass it through `find_head_walk`, `filter_block_tree`, and `get_node_children` to replace per-node full scans with O(1) lookups.

Convert `filter_block_tree` from recursive to an iterative reverse pass. Proto_array nodes are stored in insertion order (children always have higher indices than parents), so a single reverse iteration processes every child before its parent, matching the spec's recursive post-order semantics without recursion or an explicit stack. This avoids stack overflow on long chains without finality (500k+ blocks).

Adds criterion benchmarks for `find_head` with chain lengths up to 518k.
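A minimal sketch of the two ideas above. The names here (`Node`, `viable`, and the function signatures) are simplified stand-ins for illustration, not Lighthouse's actual `ProtoArray` types:

```rust
// Simplified stand-in types; not Lighthouse's actual ProtoArray.
#[derive(Debug)]
struct Node {
    parent: Option<usize>, // children are always appended after their parents
    viable: bool,
}

/// Build a parent -> children mapping in a single O(n) pass.
fn build_children_index(nodes: &[Node]) -> Vec<Vec<usize>> {
    let mut children = vec![Vec::new(); nodes.len()];
    for (index, node) in nodes.iter().enumerate() {
        if let Some(parent) = node.parent {
            children[parent].push(index);
        }
    }
    children
}

/// Iterative replacement for the recursive filter: because every child has a
/// higher index than its parent, a reverse iteration processes all children
/// before their parent (post-order), with no recursion and no explicit stack.
fn filter_block_tree(nodes: &mut [Node], children: &[Vec<usize>]) {
    for node_index in (0..nodes.len()).rev() {
        if children[node_index].iter().any(|&c| nodes[c].viable) {
            nodes[node_index].viable = true;
        }
    }
}

fn main() {
    // Chain 0 <- 1 <- 2, where only the tip starts out viable.
    let mut nodes = vec![
        Node { parent: None, viable: false },
        Node { parent: Some(0), viable: false },
        Node { parent: Some(1), viable: true },
    ];
    let index = build_children_index(&nodes);
    filter_block_tree(&mut nodes, &index);
    println!("{:?}", nodes.iter().map(|n| n.viable).collect::<Vec<_>>());
}
```

The reverse pass relies only on the insertion-order invariant, so it stays correct however deep the chain gets.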
what do you think about caching the canonical children indices as a field on
Persist `children: Vec<Vec<usize>>` as a field on `ProtoArray` instead of rebuilding it on every `find_head` call. The index is maintained incrementally in `on_block` (append) and `maybe_prune` (shift + drop). This enables O(1) canonical block root payload status lookups without requiring a full tree walk each time.
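A sketch of how that incremental maintenance could look. The field and method names are illustrative, and `parents: Vec<Option<usize>>` stands in for the real node storage:

```rust
// Illustrative sketch only; not Lighthouse's actual ProtoArray.
struct ProtoArray {
    parents: Vec<Option<usize>>, // stand-in for the real node list
    children: Vec<Vec<usize>>,   // parent index -> child indices
}

impl ProtoArray {
    /// on_block: O(1) append, registering the new node with its parent.
    fn on_block(&mut self, parent: Option<usize>) -> usize {
        let index = self.parents.len();
        self.parents.push(parent);
        self.children.push(Vec::new());
        if let Some(parent) = parent {
            self.children[parent].push(index);
        }
        index
    }

    /// maybe_prune: drop the pruned prefix and shift surviving indices down.
    fn maybe_prune(&mut self, finalized_index: usize) {
        self.parents = self.parents.split_off(finalized_index);
        self.children = self.children.split_off(finalized_index);
        for parent in self.parents.iter_mut() {
            // Parents below the new finalized root no longer exist.
            *parent = parent.and_then(|p| p.checked_sub(finalized_index));
        }
        for child_list in self.children.iter_mut() {
            for child in child_list.iter_mut() {
                // child > parent >= finalized_index, so this cannot underflow.
                *child -= finalized_index;
            }
        }
    }
}

fn main() {
    let mut array = ProtoArray { parents: Vec::new(), children: Vec::new() };
    let a = array.on_block(None);
    let b = array.on_block(Some(a));
    let c = array.on_block(Some(b));
    array.on_block(Some(c));
    array.maybe_prune(b); // finalize at node `b`
    println!("{:?}", array.children);
}
```

The point of the design is that the amortized cost of keeping the index moves to block import and pruning, both of which already touch every affected node.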
@eserilev why do we need to cache children to check canonical payload status? can't we just check
ah right, didn't think of that
```rust
    payload_status: PayloadStatus::Pending,
};
```

```rust
let children_index = &self.children;
```
Might be a little neater and clearer if we just read `self.children` in the functions that need access? There's no use case where we plumb in different children, is there?
```rust
) {
    for node_index in (start_index..self.nodes.len()).rev() {
        let Some(node) = self.nodes.get(node_index) else {
            continue;
```
This should never happen. I wonder if we should make this func return an error so we can avoid these silent failures? (Could break on a refactor.)
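A sketch of that suggestion, assuming a hypothetical `Error` variant and a simplified signature (the real function shape differs):

```rust
// Hypothetical error type and simplified signature for illustration.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidNodeIndex(usize),
}

fn filter_block_tree(nodes: &[u64], start_index: usize) -> Result<(), Error> {
    for node_index in (start_index..nodes.len()).rev() {
        // Surface an impossible miss as an error instead of `continue`-ing
        // past it, so a refactor that breaks the index math fails loudly.
        let _node = nodes
            .get(node_index)
            .ok_or(Error::InvalidNodeIndex(node_index))?;
        // ... process the node ...
    }
    Ok(())
}

fn main() {
    println!("{:?}", filter_block_tree(&[1, 2, 3], 0));
}
```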
```rust
let children = children_index
    .get(node_index)
    .map(|c| c.as_slice())
    .unwrap_or(&[]);
```
This should also be unreachable, right? Could be an error.
```rust
        .collect()
    })
    .collect())
    .unwrap_or_default())
```
Another Claude-ism, should just be an error instead of a silent failure
```rust
// Discard children that pointed into the pruned prefix (underflow).
*child_index < self.nodes.len()
```
I don't think this makes sense or is necessary. I would expect that doing `self.children = self.children.split_off(finalized_index);` already removes all of the nodes in the pruned prefix, and it's impossible for nodes after the finalized_index to nominate nodes prior to finalized_index as children.
I think there's an invariant that all child indices are > their parent's index, right? And here we know that the parent index is >= finalized_index, so all of the child indices are also >= finalized_index.
Furthermore, even if we were trying to check that the children are in bounds, I don't think a saturating sub plus checking `< self.nodes.len()` does anything useful.
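The invariant being appealed to can be stated as a tiny check (illustrative helper name, not real Lighthouse code):

```rust
// Illustrative check of the invariant: every node's parent index is strictly
// less than its own index, because nodes are stored in insertion order and a
// block's parent is always inserted first. Given parent >= finalized_index
// for every surviving node, child > parent >= finalized_index follows, so no
// surviving child index can point into the pruned prefix.
fn parents_precede_children(parents: &[Option<usize>]) -> bool {
    parents
        .iter()
        .enumerate()
        .all(|(index, parent)| parent.map_or(true, |p| p < index))
}

fn main() {
    // A root with a linear chain plus one fork off node 1 satisfies it.
    println!("{}", parents_precede_children(&[None, Some(0), Some(1), Some(1)]));
}
```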
michaelsproul left a comment
Looks good on the whole, just had some questions about style
There are a couple of failing tests too that I haven't yet looked into
Follow-up from #9025 which removed `best_child`/`best_descendant` caching for spec clarity, introducing O(n²) patterns in `find_head`. Both `filter_block_tree` and `get_node_children` were scanning all nodes to find children at each step.
Adds a `build_children_index()` that creates a parent→children mapping in O(n) once per `find_head` call, then passes it through so child lookups are O(1).
Also converts `filter_block_tree` from recursive to an iterative reverse pass: proto_array nodes are stored in insertion order, so a reverse iteration naturally processes children before parents, matching the spec's post-order semantics. The previous recursive version would stack overflow around ~30k blocks, which isn't enough to survive extended non-finality.
Includes criterion benchmarks for `find_head` with chain lengths up to 518k blocks.