
feat: insertBatch — sorted bulk insertion with single-pass leaf merge and multi-leaf overflow #11

Draft
Copilot wants to merge 13 commits into main from copilot/vscode-mmjtxf5f-shsn

Conversation


Copilot AI commented Mar 9, 2026

Calling insert N times is O(N · m) in leaf shifts and O(N log N) in tree traversals. For batches of ~100 entries, especially when several land in the same leaf, this is wasteful. This PR adds insertBatch that sorts the batch once, groups entries by target leaf, and processes each group in a single O(m+n) merge.

New API

// Base module
public func insertBatch<K, V>(btree, btree_utils, entries : [(K, V)]) : [?V]

// MemoryBTree class
btree.insertBatch(entries) : [?V]

Returns previous values in input order (null = key was new). For duplicate keys within the batch, the last occurrence wins.
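
To illustrate the return semantics, a minimal usage sketch against the class API (the key/value types and the surrounding setup are assumptions for illustration, not taken from this PR):

```motoko
// Assuming `btree` is an existing MemoryBTree with Nat keys and Text
// values (construction elided). One result slot per input entry, in
// input order; `null` marks a key that did not exist before the call.
let prev = btree.insertBatch([(10, "a"), (20, "b"), (10, "c")]);
// For the duplicate key 10, the last occurrence ("c") is the value
// that ends up in the tree.
```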

Algorithm

  • Sort + group: entries are sorted by key blob (O(n log n)); get_right_separator_of_leaf locates each leaf boundary by walking the parent chain once per leaf transition instead of re-traversing from the root.
  • Single-pass merge per leaf: two-pointer merge of existing leaf entries and the sorted batch slice — one bulk write replaces N individual shift operations.
  • Multi-leaf overflow: when the merged count exceeds node_capacity, all required leaves (⌈merged/capacity⌉) are allocated, filled, linked into the doubly-linked leaf chain, and propagated to parent branches in one bottom-up loop — no cascading single-split side effects.
  • Subtree size correctness: after all structural changes, recompute_subtree_sizes_on_path walks every affected leaf's path to the root and recomputes each branch's subtree_size as the sum of its immediate children. This handles branch splits that redistribute leaves into different subtrees without double-counting replacements.
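
The single-pass merge in the second step can be sketched as a standard two-pointer merge over key blobs. This is an illustrative sketch, not the PR's implementation; in the real code the leaf entries live in stable memory and the merge writes back in bulk rather than building an array:

```motoko
import Blob "mo:base/Blob";
import Buffer "mo:base/Buffer";

// Sketch: merge a leaf's sorted entries with a sorted batch slice in
// one O(m + n) pass. On equal keys the batch entry replaces the leaf
// entry, matching the "last occurrence wins" rule.
func merge_sorted(leaf : [(Blob, Blob)], batch : [(Blob, Blob)]) : [(Blob, Blob)] {
    let merged = Buffer.Buffer<(Blob, Blob)>(leaf.size() + batch.size());
    var i = 0;
    var j = 0;
    while (i < leaf.size() and j < batch.size()) {
        switch (Blob.compare(leaf[i].0, batch[j].0)) {
            case (#less) { merged.add(leaf[i]); i += 1 };
            case (#greater) { merged.add(batch[j]); j += 1 };
            case (#equal) { merged.add(batch[j]); i += 1; j += 1 }; // replacement
        };
    };
    while (i < leaf.size()) { merged.add(leaf[i]); i += 1 };
    while (j < batch.size()) { merged.add(batch[j]); j += 1 };
    Buffer.toArray(merged);
};
```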

Refactoring

The split-propagation loop inside insert (~40 lines) is extracted into propagate_node_split, now shared by both insert and insertBatch.

New private helpers

| Helper | Purpose |
| --- | --- |
| `propagate_node_split` | Propagates a leaf/branch split up the parent chain; creates a new root if needed |
| `get_right_separator_of_leaf` | Returns the branch key that upper-bounds a leaf (walks the parent chain) |
| `recompute_subtree_sizes_on_path` | Recomputes subtree sizes bottom-up from a leaf to the root |
| `bulk_insert_into_leaf` | Core merge + overflow logic for a single leaf group |
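
To make the overflow arithmetic behind `bulk_insert_into_leaf` concrete, here is a hedged sketch of the leaf-count computation and one plausible even distribution of entries; the PR's actual fill strategy may differ:

```motoko
import Array "mo:base/Array";

// Sketch: when a merged group of `merged` entries exceeds `capacity`,
// ⌈merged / capacity⌉ leaves are required. Spreading entries as evenly
// as possible keeps every new leaf at or under capacity.
func distribute(merged : Nat, capacity : Nat) : [Nat] {
    let n = (merged + capacity - 1) / capacity; // ⌈merged / capacity⌉
    Array.tabulate<Nat>(n, func(i : Nat) : Nat {
        let base = merged / n;
        if (i < merged % n) { base + 1 } else { base };
    });
};
```

For example, 10 merged entries with capacity 4 need 3 leaves, filled as [4, 3, 3] rather than [4, 4, 2], which avoids leaving a nearly empty trailing leaf.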

Tests

7 new test cases run under all existing node-capacity / merge-threshold / tail-compression configurations:

  • empty batch, all-new keys, all-replacements, mixed new+replacement
  • leaf overflow triggering multi-leaf creation
  • batch-internal duplicate keys
  • reference comparison: results match sequential insert output
  • large 500-entry batch spanning many leaves

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • icp-api.io
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/_temp/ghcca-node/node/bin/mops install (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node node /home/REDACTED/work/_temp/ghcca-node/node/bin/mops test (dns block)


Original prompt

looking at the base code, can we implement something like batch inserts where, for about 100 entries, we sort them and insert them into the tree, caching the branch keys we've visited along the way so we don't have to read them from stable memory, and inserting them in sorted order such that if multiple entries belong in a single leaf we insert them at once, needing only one shift. In some cases these might require creating multiple leaves from the bottom up instead of a single leaf split, which could cause cascading effects upwards. As we do this we also have to carefully update the subtree size, keeping track of entries that already exist and are replacements, in order to not increment the subtree size for those entries.
Create a plan. Map out edge cases and testing strategies, and see if it would improve performance.

Created from VS Code.



…servative and balance

- Balanced - check if the node's items are below the threshold and if there are any neighbors that have enough space to merge

- Conservative - checks if the current and neighboring nodes items are both below the merge threshold, otherwise no merge occurs
… instead of referencing the key stored in the leaf
…eshold

- upgrade memory btree stat to v1_0_0 as the new version is no longer compatible with the previous state as branches now store their keys in independent memory blocks instead of referencing the keys stored in the leaf nodes
…provements.

 - Fix edge case in the remove() fn causing a double-free memory-block error on the last separator key in a branch when index = - and size <= 2
 - Update the clear() method to retain the header of each memory-region used by calling the new library's function: clearAndRetainHeader()
Copilot AI and others added 2 commits March 10, 2026 00:17
Co-authored-by: tomijaga <50385769+tomijaga@users.noreply.github.com>
…inal index

Co-authored-by: tomijaga <50385769+tomijaga@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement batch inserts with sorted order and subtree size management" to "feat: insertBatch — sorted bulk insertion with single-pass leaf merge and multi-leaf overflow" Mar 10, 2026
@tomijaga (Member) commented:

@copilot why does it say 30 files were changed?

Copilot AI (Author) commented Mar 10, 2026

> @copilot why does it say 30 files were changed?

The 30 changed files are not from my commits. My two commits (4317cd9 and 9bab59c) only modified 3 files:

  • src/MemoryBTree/Base.mo
  • src/MemoryBTree/lib.mo
  • tests/MemoryBTree/MemoryBTree.Test.mo

The remaining ~27 files were already modified in the 11 pre-existing commits that were on this branch before my work started (commits b539592 through 00a9666 — dev-branch changes like tail compression, branch key memory-block refactoring, MemoryBuffer/MemoryQueue updates, etc.). The high file count in the PR is because those pre-existing commits are all included in the PR diff relative to main.
