feat(rocksdb): migrate SQLite indexing to RocksDB #64
Merged
Pull request overview
Migrates DFTracer indexing/provenance storage from SQLite sidecars to a RocksDB-backed .dftindex store, updating APIs, utilities, tests, build tooling, and docs to match the new storage model.
Changes:
- Replaced .idx/.pidx sidecar concepts with root-local .dftindex store semantics across C/C++/Python APIs and tests.
- Introduced RocksDB core utilities (async helpers, key codec, DB manager) and updated pipeline/executor “db pool” plumbing.
- Hardened I/O + runtime behavior (ScopedFd RAII, iterator/status handling helpers, rpath/test runner adjustments, docs/CI updates).
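As an illustration of the ScopedFd RAII item above, here is a minimal sketch of a file-descriptor wrapper that closes the descriptor on every exit path. The class body and the `open_read_only` helper are assumptions written for illustration; the actual `scoped_fd.h` in this PR may expose a different interface.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical sketch of an RAII file-descriptor owner (not the PR's code).
class ScopedFd {
public:
    ScopedFd() noexcept = default;
    explicit ScopedFd(int fd) noexcept : fd_(fd) {}

    // Owning a descriptor means copies would double-close, so forbid them.
    ScopedFd(const ScopedFd&) = delete;
    ScopedFd& operator=(const ScopedFd&) = delete;

    ScopedFd(ScopedFd&& other) noexcept : fd_(other.release()) {}
    ScopedFd& operator=(ScopedFd&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }

    ~ScopedFd() { reset(); }

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }

    // Gives up ownership without closing.
    int release() noexcept { return std::exchange(fd_, -1); }

    // Closes the current descriptor (if any) and adopts `fd`.
    void reset(int fd = -1) noexcept {
        if (fd_ >= 0) ::close(fd_);
        fd_ = fd;
    }

private:
    int fd_ = -1;
};

// Illustrative helper: returns an invalid ScopedFd if open() fails.
inline ScopedFd open_read_only(const char* path) {
    return ScopedFd(::open(path, O_RDONLY));
}
```

With a wrapper like this, early returns and exceptions in a line generator cannot leak the descriptor, because the destructor runs on every path.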
Reviewed changes
Copilot reviewed 250 out of 281 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/reader/test_reader_stream.cpp | Updates reader API usage to get_index_path() |
| tests/reader/test_reader_formats.cpp | Adjusts indexer expectations for RocksDB store lifecycle (exists/need_rebuild) |
| tests/reader/test_reader.cpp | Updates indexer/reader getters to get_index_path() |
| tests/reader/test_reader.c | Renames test comments for index_path terminology |
| tests/reader/test_basic_factory.cpp | Aligns factory tests with .dftindex root and determine_index_path() |
| tests/python/test_trace_reader.py | Updates Python tests to expect .dftindex store behavior |
| tests/python/test_reorganization_planner.py | Switches index path inputs from .idx to environment index path |
| tests/binaries/test_dftracer_tar.cpp | Updates tar binary tests to check .dftindex directory existence |
| tests/binaries/test_dftracer_server.cpp | Improves environment skip check by binding to loopback |
| tests/binaries/test_dftracer_organize.cpp | Updates organize tests to verify .dftindex directories exist |
| tests/binaries/test_dftracer_info.cpp | Ensures binary tests set runtime library path for RocksDB deps |
| tests/CMakeLists.txt | Adds rpath helper for tests to locate shared libs at runtime |
| src/dftracer/utils/utilities/replay/replay.cpp | Renames idx_path to index_path for reader creation |
| src/dftracer/utils/utilities/reader/trace_reader.cpp | Switches probing/usage to .dftindex path and index_path_ |
| src/dftracer/utils/utilities/reader/internal/tar_reader.h | Renames member/API from idx_path to index_path |
| src/dftracer/utils/utilities/reader/internal/reader_factory.cpp | Renames factory arg to index_path and forwards to readers |
| src/dftracer/utils/utilities/reader/internal/reader_c.cpp | Renames C API parameter and error message to index_path |
| src/dftracer/utils/utilities/reader/internal/gzip_reader.h | Renames member/API from idx_path to index_path |
| src/dftracer/utils/utilities/reader/internal/gzip_reader.cpp | Updates implementation/logging to use index_path |
| src/dftracer/utils/utilities/indexer/visitors/manifest_visitor.cpp | Routes manifest persistence through IndexDatabase (no raw SQLite) |
| src/dftracer/utils/utilities/indexer/internal/transaction_scope.h | Adds RAII transaction helper for rollback-on-failure |
| src/dftracer/utils/utilities/indexer/internal/tar/tar_indexer.h | Migrates TAR indexer internals off SQLite; hardens caching with optionals/mutex |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/query_archive_id.cpp | Removes SQLite TAR query implementation |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/insert_tar_file_record.cpp | Removes SQLite TAR insert implementation |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/insert_tar_checkpoint_record.cpp | Removes SQLite TAR checkpoint insert implementation |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/insert_file_record.cpp | Removes SQLite file record insert implementation |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/insert_archive_record.cpp | Removes SQLite archive insert implementation |
| src/dftracer/utils/utilities/indexer/internal/tar/queries/insert_archive_metadata_record.cpp | Removes SQLite archive metadata insert implementation |
| src/dftracer/utils/utilities/indexer/internal/sqlite/statement.h | Removes deprecated forwarding header |
| src/dftracer/utils/utilities/indexer/internal/sqlite/database.h | Removes deprecated forwarding header |
| src/dftracer/utils/utilities/indexer/internal/indexer_factory.cpp | Generates .dftindex roots (via determine_index_path) |
| src/dftracer/utils/utilities/indexer/internal/indexer_c.cpp | Renames C API arg to index_path |
| src/dftracer/utils/utilities/indexer/internal/helpers.h | Adds normalize_index_root, renames validity check to directory-based |
| src/dftracer/utils/utilities/indexer/internal/helpers.cpp | Implements .dftindex normalization + directory validity check |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_stored_file_info.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_schema_validity.cpp | Removes SQLite gzip schema check |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_num_lines.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_max_bytes.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_file_id.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_checkpoint_size.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/query_checkpoint.cpp | Removes SQLite gzip query implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/queries.h | Removes SQLite gzip query header |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/insert_file_record.cpp | Removes SQLite gzip insert implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/insert_file_metadata_record.cpp | Removes SQLite gzip insert implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/insert_checkpoint_record.cpp | Removes SQLite gzip insert implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/queries/delete_file_record.cpp | Removes SQLite gzip delete implementation |
| src/dftracer/utils/utilities/indexer/internal/gzip/gzip_indexer.h | Migrates gzip indexer off SQLite; adds explicit cache readiness flags |
| src/dftracer/utils/utilities/indexer/internal/checkpoint_size.h | Reorders parameters in signature for checkpoint sizing |
| src/dftracer/utils/utilities/composites/file_merger_utility.cpp | Renames effective index var; uses index_path for readers/line inputs |
| src/dftracer/utils/utilities/composites/dft/views/view_reader_utility.cpp | Renames fluent builder with_idx_path → with_index_path |
| src/dftracer/utils/utilities/composites/dft/views/view_builder_utility.cpp | Updates pruner + statistics queries to use IndexDatabase methods |
| src/dftracer/utils/utilities/composites/dft/statistics/trace_statistics.cpp | Renames JSON field idx_path → index_path |
| src/dftracer/utils/utilities/composites/dft/statistics/statistics_aggregator_utility.cpp | Switches async runner from sqlite to rocksdb; normalizes index roots |
| src/dftracer/utils/utilities/composites/dft/statistics/chunk_detail_scanner_utility.cpp | Uses index_path when creating indexed read inputs |
| src/dftracer/utils/utilities/composites/dft/reorganize/reconstruction_planner.cpp | Uses RocksDB provenance DB read-only mode + fid-scoped queries |
| src/dftracer/utils/utilities/composites/dft/reorganize/event_router.cpp | Awaits async provenance flush; avoids capturing ref in spawned tasks |
| src/dftracer/utils/utilities/composites/dft/internal/utils.cpp | Changes determine_index_path() to return root-local .dftindex dir |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_time_bounds.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_resolved_by_hash.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_metadata_lines.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_index_dimensions.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_hash_by_resolved.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_file_bloom_filters_batch.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_file_bloom_filter.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_event_ranges.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_chunk_bloom_filters_batch.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/query_chunk_bloom_filters.cpp | Removes SQLite query implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_metadata_lines.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_index_dimension.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_hash_resolution.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_file_bloom_filter.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_event_range.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/insert_chunk_bloom_filter.cpp | Removes SQLite insert implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_metadata_lines.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_hash_resolutions.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_file_bloom_filter.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_event_ranges.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_chunk_statistics.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_chunk_dimension_stats.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/queries/delete_chunk_bloom_filters.cpp | Removes SQLite delete implementation |
| src/dftracer/utils/utilities/composites/dft/indexing/chunk_statistics.cpp | Makes pid/tid formatting bounded/checkable with to_chars |
| src/dftracer/utils/utilities/composites/dft/indexing/chunk_indexer_utility.cpp | Uses index_path for indexed reader input |
| src/dftracer/utils/utilities/composites/dft/event_collector_utility.cpp | Uses index_path when creating readers |
| src/dftracer/utils/utilities/composites/dft/chunk_manifest_mapper_utility.cpp | Uses index_path in chunk spec mapping |
| src/dftracer/utils/utilities/composites/dft/chunk_extractor_utility.cpp | Uses index_path for indexed reader path |
| src/dftracer/utils/utilities/composites/dft/aggregators/chunk_mapper_utility.cpp | Renames fluent builder to with_index_path |
| src/dftracer/utils/utilities/composites/dft/aggregators/chunk_aggregator_utility.cpp | Updates index_dir derivation from index_path |
| src/dftracer/utils/utilities/call_tree/call_tree_mpi.cpp | Uses .dftindex root instead of .idx |
| src/dftracer/utils/utilities/call_tree/call_tree_internal.cpp | Updates comment + uses .dftindex root |
| src/dftracer/utils/server/viz_api.cpp | Updates index path references to .dftindex store |
| src/dftracer/utils/server/trace_api.cpp | Updates index path references + stats aggregator inputs |
| src/dftracer/utils/python/utilities/statistics_query.cpp | Uses determine_index_path() for index lookup |
| src/dftracer/utils/python/utilities/statistics_aggregator.cpp | Uses determine_index_path() for index lookup |
| src/dftracer/utils/python/utilities/reorganization_planner.cpp | Renames output field idx_path → index_path |
| src/dftracer/utils/python/utilities/reconstruction_planner.cpp | Updates docs to .dftindex terminology |
| src/dftracer/utils/python/utilities/metadata_collector.cpp | Updates to .dftindex terminology + index path computation |
| src/dftracer/utils/python/utilities/comparator.cpp | Uses .dftindex index paths in aggregation |
| src/dftracer/utils/python/utilities/aggregator.cpp | Updates docs to .dftindex terminology |
| src/dftracer/utils/python/trace_reader_iterator.h | Tracks background task future for safe iterator teardown |
| src/dftracer/utils/python/trace_reader_iterator.cpp | Waits for producer completion on dealloc to avoid use-after-free |
| src/dftracer/utils/python/indexer.h | Renames python binding member idx_path → index_path |
| src/dftracer/utils/core/sqlite/error.cpp | Removes SQLite error implementation |
| src/dftracer/utils/core/sqlite/database.cpp | Removes SQLite database implementation |
| src/dftracer/utils/core/sqlite/async.cpp | Removes SQLite async helpers (replaced by RocksDB async) |
| src/dftracer/utils/core/runtime.cpp | Clears moved-from tasks post-await to reduce resource retention |
| src/dftracer/utils/core/rocksdb/key_codec.cpp | Adds RocksDB key encoding/decoding helpers |
| src/dftracer/utils/core/rocksdb/async.cpp | Adds RocksDB async helpers using executor db pool |
| src/dftracer/utils/core/pipeline/pipeline.cpp | Renames sqlite pool config to db pool config |
| src/dftracer/utils/core/pipeline/executor.cpp | Renames sqlite pool to db pool; adjusts shutdown/drain behavior |
| src/dftracer/utils/core/io/thread_pool_backend.h | Adds pread callback submission API |
| src/dftracer/utils/core/io/thread_pool_backend.cpp | Implements callback-based pread and completion dispatch |
| src/dftracer/utils/core/io/kqueue_thread_pool_backend.h | Adds pread callback submission API |
| src/dftracer/utils/core/io/kqueue_thread_pool_backend.cpp | Implements callback-based pread and completion dispatch |
| src/dftracer/utils/core/io/io_uring_backend.h | Adds completion callback plumbing to io_uring backend |
| src/dftracer/utils/core/io/io_backend_sync.cpp | Updates sync I/O doc comment (no SQLite VFS assumption) |
| src/dftracer/utils/core/io/epoll_thread_pool_backend.h | Adds pread callback submission API |
| src/dftracer/utils/core/io/epoll_thread_pool_backend.cpp | Implements callback-based pread and completion dispatch |
| src/dftracer/utils/core/env.cpp | Adds environment utility and RocksDB open-files config |
| src/dftracer/utils/binaries/dftracer_tar.cpp | Updates CLI wording/logging for .dftindex store |
| src/dftracer/utils/binaries/dftracer_split.cpp | Awaits index builder and uses .dftindex root paths |
| src/dftracer/utils/binaries/dftracer_server.cpp | Updates index dir help/comment to .dftindex stores |
| src/dftracer/utils/binaries/dftracer_reconstruct.cpp | Uses .dftindex root for metadata/reader inputs |
| src/dftracer/utils/binaries/dftracer_organize.cpp | Renames “sidecar” step to index store; awaits build tasks |
| src/dftracer/utils/binaries/dftracer_index.cpp | Updates CLI docs to .dftindex terminology |
| src/dftracer/utils/binaries/dftracer_gen_fake_trace.cpp | Updates verify flow to use .dftindex root paths |
| src/dftracer/utils/binaries/dftracer_event_count.cpp | Updates index path variable naming/references |
| src/dftracer/utils/binaries/dftracer_comparator.cpp | Updates comment + .dftindex path usage |
| src/dftracer/utils/binaries/dftracer_aggregator.cpp | Awaits index builder and uses .dftindex root paths |
| setup.py | Minor formatting cleanup |
| include/dftracer/utils/utilities/reader/trace_reader.h | Updates docs and member name to index_path_ |
| include/dftracer/utils/utilities/reader/internal/reader_factory.h | Renames idx_path param to index_path |
| include/dftracer/utils/utilities/reader/internal/reader.h | Renames API get_idx_path() → get_index_path() + C API signature |
| include/dftracer/utils/utilities/indexer/internal/scan_prefix.h | Adds shared iterator prefix scan helper with status checking |
| include/dftracer/utils/utilities/indexer/internal/indexer_factory.h | Updates docs/signature for .dftindex path |
| include/dftracer/utils/utilities/indexer/internal/indexer.h | Renames index path getter + C API signature |
| include/dftracer/utils/utilities/indexer/index_builder_utility.h | Renames build result idx_path → index_path |
| include/dftracer/utils/utilities/fileio/types/chunk_spec.h | Renames chunk spec field to index_path |
| include/dftracer/utils/utilities/fileio/lines/sources/async_streaming_gz_line_generator.h | Uses ScopedFd for fd lifetime safety |
| include/dftracer/utils/utilities/fileio/lines/sources/async_plain_file_line_generator.h | Uses ScopedFd for fd lifetime safety |
| include/dftracer/utils/utilities/fileio/lines/sources/async_plain_file_bytes_generator.h | Uses ScopedFd for fd lifetime safety |
| include/dftracer/utils/utilities/fileio/lines/line_types.h | Renames idx_path → index_path in line read input |
| include/dftracer/utils/utilities/fileio/lines/line_bytes_range.h | Updates docs to .dftindex example |
| include/dftracer/utils/utilities/composites/types.h | Renames various composite inputs to index_path |
| include/dftracer/utils/utilities/composites/line_batch_processor_utility.h | Updates inputs and iterator binding to index_path |
| include/dftracer/utils/utilities/composites/file_merger_utility.h | Renames fluent builder parameter to match index_path |
| include/dftracer/utils/utilities/composites/dft/views/view_reader_utility.h | Renames view reader input and builder method |
| include/dftracer/utils/utilities/composites/dft/views/view_builder_utility.h | Renames view builder input and builder method |
| include/dftracer/utils/utilities/composites/dft/statistics/trace_statistics.h | Renames stats field to index_path |
| include/dftracer/utils/utilities/composites/dft/statistics/statistics_aggregator_utility.h | Renames stats input to index_path |
| include/dftracer/utils/utilities/composites/dft/statistics/statistics.h | Updates docs to .dftindex terminology |
| include/dftracer/utils/utilities/composites/dft/statistics/chunk_detail_scanner_utility.h | Renames scan input to index_path |
| include/dftracer/utils/utilities/composites/dft/reorganize/reorganization_planner.h | Renames source file info to index_path |
| include/dftracer/utils/utilities/composites/dft/reorganize/provenance_tracker.h | Makes flush async (CoroTask<void>) |
| include/dftracer/utils/utilities/composites/dft/metadata_collector_utility.h | Renames inputs/outputs to index_path and updates docs |
| include/dftracer/utils/utilities/composites/dft/internal/utils.h | Updates docs and signatures for .dftindex roots |
| include/dftracer/utils/utilities/composites/dft/internal/chunk_spec.h | Updates spec conversion to index_path |
| include/dftracer/utils/utilities/composites/dft/indexing/chunk_statistics.h | Updates storage docs (no SQLite assumption) |
| include/dftracer/utils/utilities/composites/dft/indexing/chunk_pruner_utility.h | Renames pruner input field to index_path |
| include/dftracer/utils/utilities/composites/dft/indexing/chunk_indexer_utility.h | Renames builder method to with_index_path |
| include/dftracer/utils/utilities/composites/dft/indexing/chunk_dimension_stats.h | Updates docs (no SQLite assumption) |
| include/dftracer/utils/utilities/composites/dft/indexing/bloom_filter_cache.h | Renames cache keying from idx_path to index_path |
| include/dftracer/utils/utilities/composites/dft/indexing/bloom_filter.h | Updates docs to RocksDB blob storage terminology |
| include/dftracer/utils/utilities/composites/dft/comparator/comparison_config.h | Updates docs to .dftindex terminology |
| include/dftracer/utils/utilities/composites/dft/chunk_extractor_utility.h | Updates chunk spec mapping to index_path |
| include/dftracer/utils/utilities/composites/dft/aggregators/chunk_aggregator_utility.h | Renames builder method to with_index_path |
| include/dftracer/utils/server/trace_index.h | Renames cached index path to .dftindex root terminology |
| include/dftracer/utils/core/sqlite/vfs.h | Removes SQLite VFS header |
| include/dftracer/utils/core/sqlite/statement.h | Removes SQLite statement wrapper header |
| include/dftracer/utils/core/sqlite/error.h | Removes SQLite error header |
| include/dftracer/utils/core/sqlite/database.h | Removes SQLite database header |
| include/dftracer/utils/core/runtime.h | Clears moved-from tasks post-await to reduce resource retention |
| include/dftracer/utils/core/rocksdb/key_codec.h | Declares RocksDB key codec + builder |
| include/dftracer/utils/core/rocksdb/filesystem.h | Declares DFTracer RocksDB filesystem/env helpers |
| include/dftracer/utils/core/rocksdb/db_manager.h | Adds process-wide RocksDB instance manager |
| include/dftracer/utils/core/rocksdb/database.h | Adds RocksDB DB wrapper API |
| include/dftracer/utils/core/pipeline/pipeline_config.h | Renames config sqlite_pool_size → db_pool_size |
| include/dftracer/utils/core/pipeline/executor.h | Renames sqlite pool to db pool; updates API name |
| include/dftracer/utils/core/io/io_backend.h | Adds callback-based pread API to backends |
| include/dftracer/utils/core/env.h | Declares Env helper + RocksDB tuning accessor |
| include/dftracer/utils/core/common/scoped_fd.h | Adds RAII fd wrapper for safe close on all paths |
| include/dftracer/utils/core/common/constants.h | Changes index extension constant to .dftindex |
| docs/source/utilities/indexer.rst | Updates Python examples for .dftindex persistence |
| docs/source/quickstart.rst | Updates quickstart to omit explicit .idx path |
| docs/source/installation.rst | Removes SQLite dependency from install steps |
| docs/source/cpp_api/index.rst | Removes sqlite from C++ API docs overview |
| docs/source/conf.py | Removes unused import |
| docs/source/api/indexer.rst | Updates Python Indexer signature docs to index_path |
| docs/scripts/generate_api_index.py | Formatting cleanup and minor refactors |
| cmake/modules/LibraryHelpers.cmake | Adds rpath emission for non-interface libraries |
| cmake/modules/InstallHelpers.cmake | Removes SQLite dependency handling |
| cmake/modules/CPM.cmake | Vendors CPM fallback and improves download error reporting |
| Makefile | Adds optional ty check in python test target |
| CMakeLists.txt | Adds ccache compiler launcher auto-detection |
| .github/workflows/python-publish.yaml | Updates action versions and cibuildwheel version |
| .github/workflows/format-check.yaml | Updates action versions and uv setup action |
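
The table lists RocksDB key encoding/decoding helpers (`key_codec.h` / `key_codec.cpp`) without showing their layout. A common approach for ordered key-value stores is to encode integers big-endian so that bytewise key order matches numeric order. The sketch below assumes that approach; the tag byte, field order, and function names are hypothetical and may not match the PR's codec.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Appends `value` as 8 big-endian bytes, so bytewise comparison of the
// encoded form agrees with numeric comparison of the original values.
inline void append_u64_be(std::string& out, std::uint64_t value) {
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Reads 8 big-endian bytes at `offset`; returns nullopt if the input is short.
inline std::optional<std::uint64_t> read_u64_be(std::string_view in,
                                                std::size_t offset) {
    if (offset > in.size() || in.size() - offset < 8) return std::nullopt;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        value = (value << 8) | static_cast<unsigned char>(in[offset + i]);
    }
    return value;
}

// Hypothetical layout: 1 tag byte, then file id, then checkpoint index.
inline std::string make_checkpoint_key(std::uint64_t file_id,
                                       std::uint64_t checkpoint) {
    std::string key;
    key.reserve(17);
    key.push_back('c');
    append_u64_be(key, file_id);
    append_u64_be(key, checkpoint);
    return key;
}
```

Because all keys for one file id share a 9-byte prefix under this layout, a prefix scan over that prefix would visit the file's checkpoints in ascending order.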
17c8cb6 to
41061bb
Compare
Replace SQLite-backed indexing and provenance storage with RocksDB-backed stores.
Key changes:
- add RocksDB async/database/db-manager/filesystem/key-codec layers
- migrate index and provenance databases from SQLite to RocksDB
- update index builder, trace reader, reorganize, view, stats, and comparator paths for RocksDB
- harden transaction atomicity and rollback behavior with TransactionScope
- add iterator status checking for prefix scans
- harden gzip/tar indexer cache state and metadata handling
- capture executor context in RocksDB awaitables
- clean up failed RocksDB open paths and manager lifecycle behavior
- vendor CPM 0.42.1 and update CI/build integration
- refresh docs, Python bindings, and C++/Python test coverage for the new backend
Validation:
- full test suite passed
- Ubuntu 22.04 Docker run passed
- focused RocksDB/indexer regression tests passed.
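
To make the "rollback behavior with TransactionScope" item concrete, here is a minimal sketch of a rollback-on-failure guard. `FakeIndexDb` is a hypothetical in-memory stand-in for a database wrapper with batched writes, and the `TransactionScope` shown is an assumption about the general shape of such a helper, not the PR's actual class.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: writes go to a pending batch while a transaction
// is active and only reach `data_` on commit.
class FakeIndexDb {
public:
    void begin_transaction() { batch_.emplace(); }
    void put(const std::string& key, const std::string& value) {
        if (batch_) batch_->emplace_back(key, value);
        else data_[key] = value;
    }
    void commit_transaction() {
        if (!batch_) return;
        for (auto& [key, value] : *batch_) data_[key] = std::move(value);
        batch_.reset();
    }
    void rollback_transaction() noexcept { batch_.reset(); }
    bool contains(const std::string& key) const { return data_.count(key) != 0; }

private:
    std::map<std::string, std::string> data_;
    std::optional<std::vector<std::pair<std::string, std::string>>> batch_;
};

// Rolls back in the destructor unless commit() was called, so an exception
// thrown between begin and commit discards the pending batch.
class TransactionScope {
public:
    explicit TransactionScope(FakeIndexDb& db) : db_(db) { db_.begin_transaction(); }
    TransactionScope(const TransactionScope&) = delete;
    TransactionScope& operator=(const TransactionScope&) = delete;
    ~TransactionScope() {
        if (!committed_) db_.rollback_transaction();
    }
    void commit() {
        db_.commit_transaction();
        committed_ = true;
    }

private:
    FakeIndexDb& db_;
    bool committed_ = false;
};

// Example writer that fails midway when `fail` is true.
inline void write_file_records(FakeIndexDb& db, bool fail) {
    TransactionScope tx(db);
    db.put("file/1", "meta");
    if (fail) throw std::runtime_error("simulated indexing failure");
    db.put("file/1/checkpoint/0", "offset=0");
    tx.commit();
}
```

The point of the guard is that a partially written file record never becomes visible: either both puts land on commit, or neither does.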
hariharan-devarajan
approved these changes
Apr 6, 2026
This was referenced Apr 7, 2026
Migrate SQLite Indexing to RocksDB
Summary
This PR migrates DFTracer indexing and provenance storage from SQLite sidecars to RocksDB-backed stores.
It also includes the follow-up correctness and recovery work needed to make the RocksDB path production-safe:
- […] status() after prefix scans and covers that path with tests
- […] DB::Open() paths
Key Changes
Transaction correctness
- IndexDatabase::delete_file_data() now respects active transaction batches for all deletes.
- IndexDatabase::get_or_create_file_info() no longer mixes immediate deletes with batched replacement writes.
- […] rollback_transaction() to index/provenance database wrappers.
- […] TransactionScope RAII helper for rollback-on-exception behavior in: […]
Iterator error handling
- IndexDatabase and ProvenanceDatabase now check iterator status() after iteration.
Cache/state hardening
- GzipIndexer no longer uses 0 as an implicit “not cached” sentinel for metadata fields.
- TarIndexer cache fields now use explicit state and synchronization instead of unsynchronized primitive members.
RocksDB/runtime cleanup
- RocksDatabase::open() now cleans up partially created handles.
- RocksDatabase now uses immutable internal read/write options helpers rather than mutable per-instance options state.
- RocksDBManager lifecycle coverage was expanded for reset/shutdown/upgrade semantics.
Build/test cleanup
- […] CPM.cmake to avoid bootstrap download failures in CI.
Testing
Passed locally:
Focused regression checks included:
- utilities/indexer/test_rocksdb_storage
- utilities/indexer/test_scan_prefix
- utilities/indexer/test_index_database
- utilities/indexer/test_provenance_database
- utilities/indexer/test_index_builder
- binaries/test_dftracer_info
- binaries/test_dftracer_index
- binaries/test_dftracer_organize
- binaries/test_dftracer_tar
Notes
- […] RocksDBManager::get_or_open() and some async capture rewrites was explicitly discarded because it caused broad runtime regressions. Those changes are not part of this PR.
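
The iterator error handling item above matters because, in RocksDB, an iterator's `Valid()` returns false both at the end of the data and after a read error, so a loop that only checks `Valid()` can silently return truncated results. The sketch below shows the pattern with a hypothetical `FakeIterator` over a `std::map`; its method names loosely mirror the RocksDB iterator shape, but it is not the RocksDB API, and `scan_prefix` here is an illustration rather than the PR's helper.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical iterator that can simulate a read error after `fail_at` steps.
class FakeIterator {
public:
    static constexpr std::size_t kNever = static_cast<std::size_t>(-1);

    FakeIterator(const std::map<std::string, std::string>& data,
                 std::size_t fail_at = kNever)
        : data_(data), pos_(data.end()), fail_at_(fail_at) {}

    void Seek(std::string_view target) {
        pos_ = data_.lower_bound(std::string(target));
        steps_ = 0;
        check();
    }
    void Next() {
        ++pos_;
        ++steps_;
        check();
    }
    // False at end of data AND after an error, like the real thing.
    bool Valid() const { return ok_ && pos_ != data_.end(); }
    std::string_view key() const { return pos_->first; }
    bool ok() const { return ok_; }

private:
    void check() {
        if (steps_ == fail_at_) ok_ = false;
    }

    const std::map<std::string, std::string>& data_;
    std::map<std::string, std::string>::const_iterator pos_;
    std::size_t steps_ = 0;
    std::size_t fail_at_;
    bool ok_ = true;
};

struct ScanResult {
    std::vector<std::string> keys;
    bool ok = true;  // false means `keys` may be incomplete
};

// Collects keys that start with `prefix`, then reports the iterator status
// so callers can tell "no more entries" apart from "scan failed".
inline ScanResult scan_prefix(FakeIterator& it, std::string_view prefix) {
    ScanResult result;
    for (it.Seek(prefix); it.Valid(); it.Next()) {
        if (it.key().substr(0, prefix.size()) != prefix) break;
        result.keys.emplace_back(it.key());
    }
    result.ok = it.ok();
    return result;
}
```

A caller that ignored `ok` would treat a failed scan as a short but complete result, which is the class of bug the status check guards against.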