Optimize hierarchy building and metadata folding performance #161
Summary
- Optimize add_hierarchy() function with memoized path tracing instead of an iterative while loop
- Optimize fold_in_metadata_for_columns() with a pre-split coordinate matrix
Performance Benchmarks
add_hierarchy() - Hierarchy Building
The optimization replaces an iterative while loop (up to 100 iterations) with single-pass memoized path tracing:
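To illustrate the general idea of memoized path tracing, here is a minimal sketch. It is not the package's actual code: the helper name build_hierarchy_paths(), its arguments, and the "." separator are assumptions made for illustration, and the sketch assumes the parent links contain no cycles.

```r
# Hypothetical helper (not from the package): build a "root.child.leaf" path
# for every member, computing each member's path only once.
build_hierarchy_paths <- function(ids, parent_ids) {
  ids <- as.character(ids)
  # Resolve every parent to a row index once; NA marks a root or unknown parent.
  parent_idx <- match(as.character(parent_ids), ids)
  # Memo table: NA means "path not computed yet".
  paths <- rep(NA_character_, length(ids))

  trace_path <- function(i) {
    if (!is.na(paths[i])) return(paths[i])  # reuse a cached path
    p <- parent_idx[i]
    path <- if (is.na(p) || p == i) {
      ids[i]                                # root: the path is the id itself
    } else {
      paste0(trace_path(p), ".", ids[i])    # parent's path, then own id
    }
    paths[i] <<- path                       # store in the memo table
    path
  }

  for (i in seq_along(ids)) trace_path(i)
  paths
}

# Members may appear in any order; parents are resolved on demand.
ids        <- c("3", "2", "1", "4")
parent_ids <- c("2", "1", NA, "1")
build_hierarchy_paths(ids, parent_ids)
```

Each member's path is built at most once, so the total work grows with the number of members rather than with the number of members times the hierarchy depth.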
- strsplit() on all rows to find current top, paste0() to prepend parent, string comparison to check changes
fold_in_metadata_for_columns() - Coordinate Parsing
The matrix-based approach has similar overhead to lapply for typical table sizes.
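A pre-split coordinate matrix could look roughly like the sketch below. The helper name split_coordinates() and the dot-separated coordinate format are assumptions for illustration, not the package's actual implementation. The coordinate strings are split once, and each dimension is then read as a matrix column.

```r
# Hypothetical helper (not from the package): split coordinate strings such as
# "1.3.2" once into a character matrix with one column per dimension.
split_coordinates <- function(coordinates) {
  parts <- strsplit(coordinates, ".", fixed = TRUE)
  n_dims <- max(lengths(parts))
  # Indexing past the end of a vector yields NA, which pads short coordinates.
  padded <- lapply(parts, `[`, seq_len(n_dims))
  matrix(unlist(padded), ncol = n_dims, byrow = TRUE)
}

coords <- c("1.2.1", "1.3.2", "2.1.1")
coord_matrix <- split_coordinates(coords)

# Member ids for the second dimension, read without splitting again.
second_dim <- coord_matrix[, 2]
```

Whether this beats an lapply over the split pieces depends on table size, which fits the note above that the two approaches have similar overhead for typical tables.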
Correctness Verification
Verified exact output equivalence by installing both package versions and comparing results:
Table 20-10-0085 (169,108 rows, 33 columns, 6 hierarchy columns):
Table 20-10-0056 (73,530 rows, 27 columns, 4 hierarchy columns):
Test plan
- devtools::test() - 20 tests
- devtools::check() passes with 0 errors, 0 warnings
🤖 Generated with Claude Code