feat: add big_key #3210

Open
chejinge wants to merge 10 commits into OpenAtomFoundation:unstable from chejinge:4.0_big_key

Conversation


@chejinge chejinge commented Dec 25, 2025

Implementation

Storage architecture

The PikiwiDB unstable branch uses a new unified storage architecture:

  1. Single RocksDB instance: all data types are stored in one RocksDB instance
  2. Multiple Column Families:
    • default (MetaCF): stores the metadata for every key
    • hashes_data_cf: stores hash field data
    • sets_data_cf: stores set member data
    • lists_data_cf: stores list element data
    • zsets_data_cf: stores zset member data
    • zsets_score_cf: stores the zset score index
    • streams_data_cf: stores stream data

Data encoding formats

1. Key encoding

  • EncodeUserKey encodes the user key into an internal key
  • The encoded key is what RocksDB stores, guaranteeing key uniqueness and ordering

2. Value format in MetaCF

Each data type has corresponding metadata in MetaCF:

Strings:

  • Parsed with ParsedStringsValue
  • Contains: data-type tag + the actual value + expiration time (etime)

Complex types (Hash/Set/List/ZSet):

  • Parsed with ParsedXxxMetaValue
  • Contains: data-type tag + element count (count) + version number (version) + expiration time (etime)

3. Key format in DataCF

Data keys for complex types have the format:

encoded_user_key + version + field/member/index

The SeekUserkeyDelim function extracts the encoded_user_key part.

Analysis flow

String analysis

  1. Iterate over all records in MetaCF
  2. Use the first byte of the value to decide whether the record is a String
  3. Parse with ParsedStringsValue to obtain the TTL and other info
  4. Compute size = encoded_key.size() + value.size()
  5. Filter out keys that have already expired

Complex-type analysis (Hash/Set/List/ZSet)

A two-pass scan strategy is used:

First pass - scan MetaCF:

  1. Identify metadata records of the matching type
  2. Parse with ParsedXxxMetaValue
  3. Extract the version, count, TTL, and other info
  4. Record the metadata size
  5. Filter out expired or empty collections

Second pass - scan DataCF:

  1. Iterate over the corresponding DataCF (e.g. hashes_data_cf)
  2. Use SeekUserkeyDelim to extract encoded_user_key from each data_key
  3. Match it against the metadata collected in the first pass
  4. Add the size of each field/member to the running total for its key

ZSet special handling:

  • Both zsets_data_cf and zsets_score_cf must be scanned
  • ZSet maintains two indexes: member->score and score->member

TTL calculation

  1. Get the current time in milliseconds: db->GetEnv()->GetCurrentTime(&curtime) returns seconds, then curtime *= 1000
  2. Read the expiration time (etime) from the ParsedValue
  3. If etime > 0:
    • TTL (seconds) = (etime - curtime) / 1000
    • If TTL <= 0, the key is marked expired and skipped
  4. If etime == 0: TTL = -1 (the key never expires)

Multi-DB-instance support

The tool supports two deployment modes:

Single-instance mode:

  • db_path points directly at the RocksDB directory
  • Detection: a CURRENT file exists in that directory

Multi-instance mode (db_instance_num > 1):

  • db_path/db/0, db_path/db/1, db_path/db/2...
  • All db/N subdirectories are detected and traversed automatically
  • Results from all instances are aggregated

Size calculation

The size of each key includes:

  1. String: total size of encoded_key + value
  2. Complex types: metadata_size + the key and value sizes of all fields/members

This reflects the actual storage footprint.

Performance optimizations

  1. Sequential scans with iterators: avoids random reads
  2. unordered_map metadata cache: metadata cached after the first pass for fast lookup in the second
  3. Read-only open: OpenForReadOnly avoids locking overhead
  4. Early filtering: expired keys are filtered during the scan, reducing later processing

Notes

  • The tool only reads the database; it never writes
  • A big key's reported size includes both the key and the value
  • Expired keys are excluded from the results
  • Analysis does not affect a running PikiwiDB instance and can be performed online

Summary by CodeRabbit

  • New Features

    • Added a Big Key Analyzer CLI tool to scan RocksDB databases, report largest keys by type, compute prefix statistics, support min-size/top-N filters, show progress, and optionally export results.
  • Documentation

    • Added a README describing installation, CLI options, examples, output formats, and usage notes.
  • Chores

    • Integrated the analyzer into the project build so it's produced by standard builds.


@github-actions github-actions bot added ✏️ Feature New feature or request 📒 Documentation Improvements or additions to documentation labels Dec 25, 2025
@coderabbitai

coderabbitai bot commented Dec 25, 2025

📝 Walkthrough

Walkthrough

Adds a new bigkey_analyzer tool: build integration, CMake target, Chinese README, and a C++ CLI that scans RocksDB instance(s) to report large keys, per-type sizes/TTLs, optional prefix statistics, and console/file output.

Changes

Cohort / File(s) Summary
Build graph
tools/CMakeLists.txt
Added bigkey_analyzer subdirectory to the tools build.
Tool build config
tools/bigkey_analyzer/CMakeLists.txt
New CMake target bigkey_analyzer (GNU C++), sets -std=c++17 and common warning flags, collects sources, adds PRIVATE include dirs, links storage and pthread, and sets runtime output dir.
Documentation
tools/bigkey_analyzer/README.md
New Chinese README describing features, CLI flags, examples, output format (big-key list, prefix stats, summary), single/multi‑DB behavior, and caveats.
Implementation
tools/bigkey_analyzer/bigkey_analyzer.cc
New executable and types: CLI parsing, DB discovery/opening, per‑type traversals (strings/hashes/lists/sets/zsets), KeyInfo/PrefixStat/Config structs, size & TTL computation, filtering (min-size, type), sorting/top‑N, prefix aggregation, and console/file output.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI
  participant Analyzer
  participant RocksDB as "RocksDB(s)"
  participant Output as "Console / File"

  Note over CLI,Analyzer: Parse flags (path, min-size, top, prefix, type, out)
  User->>CLI: run bigkey_analyzer
  CLI->>Analyzer: pass Config
  Analyzer->>RocksDB: open instance(s) & obtain CF handles
  RocksDB-->>Analyzer: iterate entries (key, value, TTL/meta)
  Analyzer->>Analyzer: decode keys, compute size/TTL, filter, collect KeyInfo
  Analyzer->>Analyzer: sort results, truncate to top‑N, compute prefix stats
  Analyzer->>Output: emit big-key list, prefix statistics, summary
  Output-->>User: display or write file

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hop through SSTs at dawn,
I sniff the keys till morning's yawn,
I flag the giants, stash their name,
I count their prefixes—what a game!
Tiny paws find bloat, then clap—hooray!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'feat:add big_key' is related to the changeset but is generic and vague; it lacks specificity about what the tool does or its primary purpose. Consider a more descriptive title like 'feat: add BigKey analyzer tool for RocksDB-backed databases' to better convey the feature's purpose.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71a65d6 and fee8036.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/bigkey_analyzer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (7)
src/rsync_client.cc (1)
  • str (432-432)
src/storage/src/util.cc (1)
  • t (52-52)
src/net/src/dispatch_thread.h (1)
  • k (37-39)
src/pstd/src/env.cc (1)
  • move (367-367)
src/storage/include/storage/storage_define.h (1)
  • SeekUserkeyDelim (122-132)
src/storage/src/type_iterator.h (1)
  • status (401-408)
src/storage/tests/lists_filter_test.cc (1)
  • handle (54-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_ubuntu
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_centos
🔇 Additional comments (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)

738-896: LGTM! Database detection logic is thorough.

The implementation correctly uses FileExists to check for the CURRENT file (lines 738, 804, 834, 861, 878), which addresses the previous issue about incorrectly treating CURRENT as a directory. The multi-format database detection logic comprehensively handles various layout patterns (single DB, dbN/M format, direct partition directories, classic db/N format) with helpful debug output.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (4)
tools/bigkey_analyzer/bigkey_analyzer.cc (4)

631-669: Consider using RAII for resource management.

The function uses manual memory management with raw pointers for the database and column family handles. If an exception occurs during analysis (lines 644-663), the cleanup code (lines 666-669) won't execute, leading to memory leaks.

🔎 Suggested approach using RAII

Consider wrapping the database and handles in smart pointers with custom deleters:

// After opening the database successfully
std::unique_ptr<rocksdb::DB, void(*)(rocksdb::DB*)> db_ptr(db, [&handles](rocksdb::DB* d) {
  for (auto handle : handles) {
    delete handle;
  }
  delete d;
});

Alternatively, use a simpler scope guard pattern or ensure exception safety through other RAII mechanisms.


67-90: Consider simplifying KeyInfo constructors.

The struct defines five different constructors to handle various combinations of const references, rvalue references, and const char pointers. Modern C++ typically relies on fewer constructors with aggregate initialization or designated initializers.

💡 Simplified alternative
struct KeyInfo {
  std::string type;
  std::string key;
  int64_t size;
  int64_t ttl;
  
  bool operator<(const KeyInfo& other) const {
    return size > other.size; // Sort in descending order by size
  }
};

// Usage with aggregate initialization:
key_infos.emplace_back(KeyInfo{"string", std::move(display_key), size, ttl});

299-312: Consider extracting repeated key parsing logic.

The pattern of extracting the encoded user key from data keys using SeekUserkeyDelim is repeated six times across the codebase (in AnalyzeHashes, AnalyzeSets, AnalyzeZsets twice, and AnalyzeLists). Extracting this into a helper function would reduce duplication and improve maintainability.

💡 Example helper function
// Returns empty string if extraction fails
std::string ExtractEncodedUserKey(const rocksdb::Slice& data_key) {
  const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
  size_t user_key_len = ptr - data_key.data();
  
  if (user_key_len == 0 || user_key_len > data_key.size()) {
    return "";
  }
  
  return std::string(data_key.data(), user_key_len);
}

Then use it consistently:

std::string encoded_user_key = ExtractEncodedUserKey(data_key);
if (encoded_user_key.empty()) continue;

auto it = hash_info.find(encoded_user_key);

37-44: ReplaceAll function has potential performance considerations.

The current implementation calls str.replace in a loop, which can cause multiple memory reallocations for strings with many occurrences. For the current use case (escaping newlines and spaces in key names), this is likely acceptable.

If performance becomes a concern with very large keys or many replacements, consider a single-pass approach:

💡 Alternative single-pass implementation
std::string ReplaceAll(const std::string& str, const std::string& from, const std::string& to) {
  if (from.empty()) return str;
  
  std::string result;
  result.reserve(str.size()); // Pre-allocate approximate size
  
  size_t start_pos = 0;
  size_t found_pos;
  
  while ((found_pos = str.find(from, start_pos)) != std::string::npos) {
    result.append(str, start_pos, found_pos - start_pos);
    result.append(to);
    start_pos = found_pos + from.length();
  }
  result.append(str, start_pos, str.length() - start_pos);
  
  return result;
}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac6e848 and ed4f523.

📒 Files selected for processing (4)
  • tools/CMakeLists.txt
  • tools/bigkey_analyzer/CMakeLists.txt
  • tools/bigkey_analyzer/README.md
  • tools/bigkey_analyzer/bigkey_analyzer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)
src/storage/include/storage/storage_define.h (1)
  • SeekUserkeyDelim (122-132)
🪛 GitHub Actions: Pika
tools/bigkey_analyzer/CMakeLists.txt

[error] 20-20: CMake error: Target "rocksdb" of type UTILITY may not be linked into another target. One may link only to INTERFACE, OBJECT, STATIC or SHARED libraries, or to executables with the ENABLE_EXPORTS property set.


[warning] 116-116: couldn't find clang-tidy.


[warning] 126-126: couldn't find clang-apply-replacements.


[warning] 797-797: configure_file called with unknown argument(s): src/pika_cache_load_thread.cc

tools/CMakeLists.txt

[warning] 116-116: couldn't find clang-tidy.


[warning] 126-126: couldn't find clang-apply-replacements.


[warning] 797-797: configure_file called with unknown argument(s): src/pika_cache_load_thread.cc

🪛 markdownlint-cli2 (0.18.1)
tools/bigkey_analyzer/README.md

29-29: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
tools/CMakeLists.txt (1)

3-3: LGTM!

The addition of the bigkey_analyzer subdirectory follows the established pattern and correctly integrates the new tool into the build system.

tools/bigkey_analyzer/bigkey_analyzer.cc (1)

692-722: Well-designed database detection logic.

The code properly handles both single database instances and multi-database setups (db/0, db/1, ...) by checking for the standard RocksDB CURRENT marker file. The detection logic is clear and includes appropriate error messages.

Comment on lines +143 to +148
case 'm':
config.min_size = std::stoll(optarg);
break;
case 't':
config.top_n = std::stoi(optarg);
break;

⚠️ Potential issue | 🟠 Major

Add exception handling for numeric conversions.

The calls to std::stoll and std::stoi can throw std::invalid_argument or std::out_of_range exceptions when users provide invalid input (e.g., --min-size=abc or --top=xyz). Without exception handling, the program will terminate ungracefully.

🔎 Proposed fix with exception handling
     switch (opt) {
       case 'm':
-        config.min_size = std::stoll(optarg);
+        try {
+          config.min_size = std::stoll(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --min-size: " << optarg << std::endl;
+          return false;
+        }
         break;
       case 't':
-        config.top_n = std::stoi(optarg);
+        try {
+          config.top_n = std::stoi(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --top: " << optarg << std::endl;
+          return false;
+        }
         break;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
case 'm':
config.min_size = std::stoll(optarg);
break;
case 't':
config.top_n = std::stoi(optarg);
break;
case 'm':
try {
config.min_size = std::stoll(optarg);
} catch (const std::exception& e) {
std::cerr << "Error: Invalid value for --min-size: " << optarg << std::endl;
return false;
}
break;
case 't':
try {
config.top_n = std::stoi(optarg);
} catch (const std::exception& e) {
std::cerr << "Error: Invalid value for --top: " << optarg << std::endl;
return false;
}
break;
🤖 Prompt for AI Agents
In tools/bigkey_analyzer/bigkey_analyzer.cc around lines 143-148, the direct
calls to std::stoll(optarg) and std::stoi(optarg) can throw
std::invalid_argument or std::out_of_range for bad user input; wrap each
conversion in a try-catch that catches these exceptions, print a clear error to
std::cerr (including the offending optarg and which flag failed), and exit with
a non-zero code (or return from main) so the program fails gracefully;
alternatively, validate the string contains only digits (and optional leading
sign) before converting and handle out-of-range by limiting/validating ranges
and reporting the error to the user.

Comment on lines +191 to +193
int64_t curtime;
db->GetEnv()->GetCurrentTime(&curtime).ok();
curtime *= 1000; // Convert to milliseconds

⚠️ Potential issue | 🟠 Major

Handle GetCurrentTime failure to prevent incorrect TTL calculations.

The return value of GetCurrentTime is checked with .ok() but the result is discarded. If the call fails, curtime remains uninitialized, leading to incorrect TTL calculations throughout the function. This same pattern appears in other analyze functions.

🔎 Proposed fix
   int64_t curtime;
-  db->GetEnv()->GetCurrentTime(&curtime).ok();
+  if (!db->GetEnv()->GetCurrentTime(&curtime).ok()) {
+    std::cerr << "Error: Failed to get current time" << std::endl;
+    return;
+  }
   curtime *= 1000; // Convert to milliseconds

Apply similar fixes to AnalyzeHashes (line 251), AnalyzeSets (line 333), AnalyzeZsets (line 408), and AnalyzeLists (line 503).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
int64_t curtime;
db->GetEnv()->GetCurrentTime(&curtime).ok();
curtime *= 1000; // Convert to milliseconds
int64_t curtime;
if (!db->GetEnv()->GetCurrentTime(&curtime).ok()) {
std::cerr << "Error: Failed to get current time" << std::endl;
return;
}
curtime *= 1000; // Convert to milliseconds
🤖 Prompt for AI Agents
In tools/bigkey_analyzer/bigkey_analyzer.cc around lines 191-193,
GetCurrentTime()'s Status is being discarded so curtime may remain uninitialized
on failure; change the code to capture the returned Status, check if
status.ok(), and if not, handle the error (e.g., log or return) and set curtime
to a safe default (0) before converting to milliseconds to avoid incorrect TTL
calculations; apply the same pattern to the corresponding calls in AnalyzeHashes
(around line 251), AnalyzeSets (around line 333), AnalyzeZsets (around line
408), and AnalyzeLists (around line 503).

Comment on lines +29 to +39
```
Usage: bigkey_analyzer [OPTIONS] <db_path>
Options:
--min-size=SIZE Only show keys larger than SIZE bytes
--top=N Only show top N largest keys
--prefix-stat Show statistics by key prefix
--prefix-delimiter=C Character used to delimit prefix (default: ':')
--type=TYPE Only analyze specific type (strings|hashes|lists|sets|zsets|all)
--output=FILE Write output to file instead of stdout
--help Display this help message
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The usage text block should specify a language identifier for proper syntax highlighting and better readability.

🔎 Proposed fix
-```
+```text
 Usage: bigkey_analyzer [OPTIONS] <db_path>
 Options:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
Usage: bigkey_analyzer [OPTIONS] <db_path>
Options:
--min-size=SIZE Only show keys larger than SIZE bytes
--top=N Only show top N largest keys
--prefix-stat Show statistics by key prefix
--prefix-delimiter=C Character used to delimit prefix (default: ':')
--type=TYPE Only analyze specific type (strings|hashes|lists|sets|zsets|all)
--output=FILE Write output to file instead of stdout
--help Display this help message
```
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

29-29: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/README.md around lines 29 to 39, the fenced code block
with the usage text lacks a language identifier; update the opening fence from
``` to ```text so it becomes ```text and leave the block content unchanged,
ensuring the usage snippet is syntax-highlighted as plain text in renders.

Comment on lines +93 to +114
```
===== Big Key Analysis =====
Type Size Key TTL
hash 1048576 user:profile:1001 -1
zset 524288 ranking:global 3600
string 262144 config:settings -1
...

===== Key Prefix Statistics =====
Prefix Count Total Size Avg Size
user 100 10485760 104857.6
ranking 50 2621440 52428.8
config 10 524288 52428.8
...

===== Summary =====
Total keys analyzed: 160
Keys by type:
hash: 50 keys, 25.0 MB total, 524288.0 bytes avg
zset: 30 keys, 15.0 MB total, 524288.0 bytes avg
string: 80 keys, 10.0 MB total, 131072.0 bytes avg
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The example output block should specify a language identifier for proper syntax highlighting and better readability.

🔎 Proposed fix
-```
+```text
 ===== Big Key Analysis =====
 Type    Size    Key     TTL
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

93-93: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/README.md around lines 93 to 114, the fenced code block
lacks a language identifier so it doesn’t get proper syntax highlighting; update
the opening fence to include an appropriate language token (for example "text")
— i.e. change ``` to ```text — and ensure the closing fence remains unchanged so
the block renders with the specified language.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
tools/bigkey_analyzer/CMakeLists.txt (2)

1-4: Consider modern CMake multi-line string syntax.

While the backslash line continuations are valid, modern CMake style prefers unquoted multi-line arguments without backslashes for improved readability:

set(WARNING_FLAGS 
    -W -Wextra -Wall -Wsign-compare
    -Wno-unused-parameter -Wno-redundant-decls -Wwrite-strings
    -Wpointer-arith -Wreorder -Wswitch -Wsign-promo
    -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers
)

8-9: Prefer explicit source file lists over aux_source_directory.

aux_source_directory is discouraged in modern CMake because it prevents automatic reconfiguration when files are added/removed and obscures which files contribute to the build.

🔎 Recommended approach
-set(SRC_DIR .)
-aux_source_directory(${SRC_DIR} BASE_OBJS)
+set(BASE_OBJS
+    bigkey_analyzer.cc
+    # Add other source files explicitly
+)

Alternatively, if you must glob, use CONFIGURE_DEPENDS:

file(GLOB BASE_OBJS CONFIGURE_DEPENDS "*.cc")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed4f523 and 5488d60.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_centos
  • GitHub Check: build_on_ubuntu

Comment on lines +20 to +23
target_link_libraries(bigkey_analyzer
storage
pthread
)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check RocksDB linking in storage library and other tools

echo "=== Check storage library's RocksDB dependency linking ==="
find . -path ./tools -prune -o -name "CMakeLists.txt" -type f -print | grep -E "src/storage|storage" | while read f; do
  if [ -f "$f" ]; then
    echo "File: $f"
    cat -n "$f" | grep -A 10 -B 2 "target_link_libraries"
  fi
done

echo -e "\n=== Check CMakeLists.txt files in tools directory ==="
find tools -name "CMakeLists.txt" -type f | while read f; do
  echo -e "\n--- $f ---"
  cat -n "$f" | grep -A 8 -B 1 "target_link_libraries"
done

Repository: OpenAtomFoundation/pikiwidb

Length of output: 10408


Add explicit ${ROCKSDB_LIBRARY} link for consistency with other tools.

Although storage provides ${ROCKSDB_LIBRARY} as a PUBLIC dependency, all similar tools in the project—including txt_to_pika, pika_to_txt, benchmark_client, and pika_port—explicitly link ${ROCKSDB_LIBRARY} alongside storage. Update to:

target_link_libraries(bigkey_analyzer 
    storage
    ${ROCKSDB_LIBRARY}
    pthread
)
🤖 Prompt for AI Agents
In tools/bigkey_analyzer/CMakeLists.txt around lines 20 to 23,
target_link_libraries only lists storage and pthread but other tools explicitly
link ${ROCKSDB_LIBRARY}; add ${ROCKSDB_LIBRARY} to the target_link_libraries
call so it becomes: target_link_libraries(bigkey_analyzer storage
${ROCKSDB_LIBRARY} pthread) to match project consistency.

Comment on lines +25 to +29
set_target_properties(bigkey_analyzer PROPERTIES
RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}
CMAKE_COMPILER_IS_GNUCXX TRUE
COMPILE_FLAGS ${CXXFLAGS}
)

🛠️ Refactor suggestion | 🟠 Major

Replace deprecated COMPILE_FLAGS and remove incorrect CMAKE_COMPILER_IS_GNUCXX property.

Two issues:

  1. CMAKE_COMPILER_IS_GNUCXX is a CMake-detected variable, not a settable target property. Setting it here has no effect.
  2. COMPILE_FLAGS property has been deprecated since CMake 3.0 in favor of target_compile_options().
🔎 Recommended fix
+target_compile_options(bigkey_analyzer PRIVATE ${WARNING_FLAGS})
+target_compile_features(bigkey_analyzer PRIVATE cxx_std_17)
+
 set_target_properties(bigkey_analyzer PROPERTIES
     RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}
-    CMAKE_COMPILER_IS_GNUCXX TRUE
-    COMPILE_FLAGS ${CXXFLAGS}
 )

Also remove or simplify the CXXFLAGS variable at line 6 since flags are now applied via target_compile_options and C++17 via target_compile_features.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tools/bigkey_analyzer/CMakeLists.txt around lines 25-29, the target
properties block incorrectly sets CMAKE_COMPILER_IS_GNUCXX (a CMake-detected
variable that should not be set) and uses the deprecated COMPILE_FLAGS property;
remove the CMAKE_COMPILER_IS_GNUCXX line entirely, drop the COMPILE_FLAGS
property, and instead apply any desired flags with
target_compile_options(bigkey_analyzer PRIVATE ${CXXFLAGS}) and declare required
C++ standard via target_compile_features(bigkey_analyzer PRIVATE cxx_std_17);
also remove or simplify the CXXFLAGS variable at line 6 since flags are now
applied via target_compile_options and C++17 via target_compile_features.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (6)
tools/bigkey_analyzer/bigkey_analyzer.cc (6)

144-149: Add exception handling for numeric conversions.

This is a duplicate of a previously raised issue. The calls to std::stoll and std::stoi can throw std::invalid_argument or std::out_of_range exceptions for invalid input. Without exception handling, the program will terminate ungracefully.


192-194: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

This is a duplicate of a previously raised issue. If GetCurrentTime fails, curtime remains uninitialized, leading to incorrect TTL calculations.


251-253: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

This is a duplicate of a previously raised issue. The same GetCurrentTime handling problem exists here.


333-335: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

This is a duplicate of a previously raised issue.


408-410: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

This is a duplicate of a previously raised issue.


504-506: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

This is a duplicate of a previously raised issue.

🧹 Nitpick comments (2)
tools/bigkey_analyzer/bigkey_analyzer.cc (2)

136-140: Remove redundant initialization.

These assignments duplicate the default member initializers already defined in the Config struct (lines 107-111). While not harmful, removing them reduces redundancy.

🔎 Proposed cleanup
   int opt;
   int option_index = 0;
   
-  config.min_size = 0;
-  config.top_n = -1;
-  config.prefix_stat = false;
-  config.prefix_delimiter = ":";
-  config.type_filter = "all";
-  
   while ((opt = getopt_long(argc, argv, "m:t:pd:y:o:h", long_options, &option_index)) != -1) {

187-571: Consider refactoring duplicate analysis logic.

The five Analyze* functions (AnalyzeStrings, AnalyzeHashes, AnalyzeSets, AnalyzeZsets, AnalyzeLists) share substantial structural similarity:

  • GetCurrentTime call
  • Metadata scanning with type filtering
  • Staleness and count checks
  • TTL calculation
  • Size accumulation

While each type requires slightly different parsing logic and column family access, extracting common patterns into helper functions or templates could reduce duplication and improve maintainability. Given this is a diagnostic tool where clarity is valued, this refactoring is recommended but not essential.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5488d60 and 10b8a9a.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/bigkey_analyzer.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_centos
  • GitHub Check: build_on_ubuntu

Comment on lines +302 to +305
const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
size_t user_key_len = ptr - data_key.data();

if (user_key_len == 0 || user_key_len > data_key.size()) continue;

⚠️ Potential issue | 🟠 Major

Strengthen validation for SeekUserkeyDelim result.

SeekUserkeyDelim has a TODO comment in its implementation about handling invalid formats, and it may return an invalid pointer if the key format is malformed. Line 303 performs pointer arithmetic (ptr - data_key.data()) that could produce undefined behavior if ptr points outside the valid buffer range. While Line 305 checks user_key_len > data_key.size(), this doesn't fully protect against ptr being arbitrary.

Consider adding a check that ptr is within bounds before computing user_key_len:

🔎 Proposed fix with bounds checking
     const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
+    if (ptr < data_key.data() || ptr > data_key.data() + data_key.size()) {
+      continue; // Invalid pointer, skip this key
+    }
     size_t user_key_len = ptr - data_key.data();
     
     if (user_key_len == 0 || user_key_len > data_key.size()) continue;

Apply the same fix to AnalyzeSets (line 377), AnalyzeZsets (lines 454, 474), and AnalyzeLists (line 549).


Comment on lines +636 to +648
rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path,
column_families, &handles, &db);

if (!status.ok()) {
std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
return;
}

std::cout << "Analyzing database at " << db_path << std::endl;

// Analyze each type
if (config.type_filter == "all" || config.type_filter == "strings") {
AnalyzeStrings(db, handles[storage::kMetaCF], key_infos, config);

⚠️ Potential issue | 🟠 Major

Verify column family handles before access.

If the database is missing expected column families, OpenForReadOnly may succeed but return fewer handles than expected. Accessing handles[storage::kMetaCF] or other indexes without verifying handles.size() could cause out-of-bounds access.

🔎 Proposed fix with handle count verification
   rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path, 
                                                          column_families, &handles, &db);
   
   if (!status.ok()) {
     std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
     return;
   }
+  
+  if (handles.size() != cf_names.size()) {
+    std::cerr << "Error: Expected " << cf_names.size() << " column families but got " 
+              << handles.size() << std::endl;
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+    return;
+  }
   
   std::cout << "Analyzing database at " << db_path << std::endl;

@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (6)
tools/bigkey_analyzer/bigkey_analyzer.cc (6)

48-52: Add error handling or validation for DecodeUserKey to handle malformed encoded keys.

The storage::DecodeUserKey function has no error indication mechanism, and this wrapper at lines 48–52 ignores any potential corruption for malformed input (e.g., missing delimiter). This can lead to silent data corruption if encoded keys lack the expected format. The function is called at lines 212, 321, 396, 493, and 568.
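One hedged option is to validate the encoded key before handing it to storage::DecodeUserKey, so malformed keys can be skipped instead of silently mis-decoded. This standalone sketch is not the storage library's API; the two-byte `0x00 0x00` terminator and the `0x00 0x01` escape are assumptions about EncodeUserKey's format and should be checked against storage/storage_define.h.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Return the offset of the unescaped 0x00 0x00 terminator, or std::nullopt
// if the encoded key is malformed (no terminator found). Callers can skip
// the key on nullopt instead of decoding garbage.
std::optional<size_t> FindUserKeyTerminator(const std::string& encoded) {
  for (size_t i = 0; i + 1 < encoded.size(); ++i) {
    if (encoded[i] != '\0') continue;
    if (encoded[i + 1] == '\0') return i;  // unescaped terminator
    ++i;                                   // 0x00 0x01 escape pair: skip both
  }
  return std::nullopt;  // no terminator: malformed key
}
```

A caller would do `if (!FindUserKeyTerminator(encoded_key)) continue;` before invoking the real decoder.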


144-149: Add exception handling for numeric conversions.

The calls to std::stoll (line 145) and std::stoi (line 148) can throw std::invalid_argument or std::out_of_range exceptions when users provide invalid input (e.g., --min-size=abc or --top=xyz). Without exception handling, the program will terminate ungracefully.

🔎 Proposed fix with exception handling
     switch (opt) {
       case 'm':
-        config.min_size = std::stoll(optarg);
+        try {
+          config.min_size = std::stoll(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --min-size: " << optarg << std::endl;
+          return false;
+        }
         break;
       case 't':
-        config.top_n = std::stoi(optarg);
+        try {
+          config.top_n = std::stoi(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --top: " << optarg << std::endl;
+          return false;
+        }
         break;

192-194: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

The return value of GetCurrentTime is checked with .ok() but the result is discarded. If the call fails, curtime remains uninitialized, leading to incorrect TTL calculations. This same pattern appears in AnalyzeHashes (line 253), AnalyzeSets (line 336), AnalyzeZsets (line 412), and AnalyzeLists (line 508).

🔎 Proposed fix
   int64_t curtime;
-  db->GetEnv()->GetCurrentTime(&curtime).ok();
+  if (!db->GetEnv()->GetCurrentTime(&curtime).ok()) {
+    std::cerr << "Error: Failed to get current time" << std::endl;
+    return;
+  }
   curtime *= 1000; // Convert to milliseconds

Apply similar fixes to AnalyzeHashes (line 253), AnalyzeSets (line 336), AnalyzeZsets (line 412), and AnalyzeLists (line 508).


304-307: Strengthen validation for SeekUserkeyDelim result.

SeekUserkeyDelim may return an invalid pointer if the key format is malformed (the implementation has a TODO about this). Line 305 performs pointer arithmetic (ptr - data_key.data()) that could produce undefined behavior if ptr points outside the valid buffer range. While line 307 checks user_key_len > data_key.size(), this doesn't fully protect against ptr being arbitrary.

🔎 Proposed fix with bounds checking
     const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
+    if (ptr < data_key.data() || ptr > data_key.data() + data_key.size()) {
+      continue; // Invalid pointer, skip this key
+    }
     size_t user_key_len = ptr - data_key.data();
     
     if (user_key_len == 0 || user_key_len > data_key.size()) continue;

Apply the same fix to AnalyzeSets (line 380), AnalyzeZsets (lines 457, 477), and AnalyzeLists (line 552).


639-651: Verify column family handles before access.

If the database is missing expected column families, OpenForReadOnly may succeed but return fewer handles than expected. Accessing handles[storage::kMetaCF] (line 651) or other indexes without verifying handles.size() could cause out-of-bounds access.

🔎 Proposed fix with handle count verification
   rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path, 
                                                          column_families, &handles, &db);
   
   if (!status.ok()) {
     std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
     return;
   }
+  
+  if (handles.size() != cf_names.size()) {
+    std::cerr << "Error: Expected " << cf_names.size() << " column families but got " 
+              << handles.size() << std::endl;
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+    return;
+  }
   
   std::cout << "Analyzing database at " << db_path << std::endl;

704-704: CURRENT is a file, not a directory.

Line 704 checks DirectoryExists(test_path + "/CURRENT"), but CURRENT is a RocksDB metadata file, not a directory. This check will always fail even when a valid single database exists. The same issue occurs at line 713.

🔎 Proposed fix to check for file existence

First, add a helper function to check file existence:

bool FileExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

Then fix lines 704 and 713:

-  if (DirectoryExists(test_path + "/CURRENT")) {
+  if (FileExists(test_path + "/CURRENT")) {
     // This is a single database instance
     db_paths.push_back(test_path);
     std::cout << "Detected single database instance" << std::endl;
-      if (DirectoryExists(db_inst_path) && DirectoryExists(db_inst_path + "/CURRENT")) {
+      if (DirectoryExists(db_inst_path) && FileExists(db_inst_path + "/CURRENT")) {
         db_paths.push_back(db_inst_path);
         db_index++;
🧹 Nitpick comments (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)

296-315: Add iterator status checks after data scans.

While AnalyzeStrings checks iterator status at lines 241-243, the other analyze functions (AnalyzeHashes, AnalyzeSets, AnalyzeZsets, AnalyzeLists) don't verify iterator status after scanning data column families. Missing status checks could hide corruption or I/O errors during iteration.

🔎 Proposed fix pattern

Add status checks after each data iterator loop. For AnalyzeHashes (after line 315):

   }
+  
+  if (!data_iter->status().ok()) {
+    std::cerr << "Error iterating hash data: " << data_iter->status().ToString() << std::endl;
+  }
   
   // Add results

Apply similar checks to:

  • AnalyzeSets after line 391
  • AnalyzeZsets after line 468 (data_iter) and line 488 (score_iter)
  • AnalyzeLists after line 563
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 10b8a9a and 23668f3.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/bigkey_analyzer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)
src/storage/include/storage/storage_define.h (1)
  • SeekUserkeyDelim (122-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_ubuntu
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_centos

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (6)
tools/bigkey_analyzer/bigkey_analyzer.cc (6)

146-151: Add exception handling for numeric conversions.

The calls to std::stoll and std::stoi can throw std::invalid_argument or std::out_of_range exceptions for invalid user input (e.g., --min-size=abc). Without try-catch blocks, the program will terminate ungracefully.

🔎 Proposed fix with exception handling
     switch (opt) {
       case 'm':
-        config.min_size = std::stoll(optarg);
+        try {
+          config.min_size = std::stoll(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --min-size: " << optarg << std::endl;
+          return false;
+        }
         break;
       case 't':
-        config.top_n = std::stoi(optarg);
+        try {
+          config.top_n = std::stoi(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --top: " << optarg << std::endl;
+          return false;
+        }
         break;

194-196: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

GetCurrentTime's return status is checked with .ok() but the result is discarded. If the call fails, curtime remains uninitialized, leading to incorrect TTL calculations. This pattern appears in all five analyze functions.

🔎 Proposed fix
   int64_t curtime;
-  db->GetEnv()->GetCurrentTime(&curtime).ok();
+  if (!db->GetEnv()->GetCurrentTime(&curtime).ok()) {
+    std::cerr << "Error: Failed to get current time" << std::endl;
+    return;
+  }
   curtime *= 1000; // Convert to milliseconds

Apply this fix to:

  • AnalyzeStrings (lines 194-196)
  • AnalyzeHashes (lines 254-256)
  • AnalyzeSets (lines 337-339)
  • AnalyzeZsets (lines 413-415)
  • AnalyzeLists (lines 511-513)

Also applies to: 254-256, 337-339, 413-415, 511-513


50-54: DecodeUserKey has no error indication mechanism.

storage::DecodeUserKey silently produces incorrect output for malformed encoded keys (e.g., missing delimiter). The wrapper at lines 50-52 returns the potentially corrupted user_key to all five call sites (lines 214, 323, 398, 497, 574), which can lead to silent data corruption in the output.


306-309: Strengthen validation for SeekUserkeyDelim result.

SeekUserkeyDelim may return an invalid pointer for malformed keys. The pointer arithmetic ptr - data_key.data() at lines 307, 383, and 460 could produce undefined behavior if ptr points outside the valid buffer range. While the code checks user_key_len > data_key.size(), this doesn't protect against ptr being arbitrary.

🔎 Proposed fix with bounds checking
     const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
+    if (ptr < data_key.data() || ptr > data_key.data() + data_key.size()) {
+      continue; // Invalid pointer, skip this key
+    }
     size_t user_key_len = ptr - data_key.data();
     
     if (user_key_len == 0 || user_key_len > data_key.size()) continue;

Apply to AnalyzeHashes (line 306), AnalyzeSets (line 382), and AnalyzeZsets (line 459).

Also applies to: 382-385, 459-462


645-651: Verify column family handles before access.

If the database is missing expected column families, OpenForReadOnly may succeed but return fewer handles than expected. Accessing handles[storage::kMetaCF] or other indexes at lines 657, 661, 665, 669-670, and 674 without verifying handles.size() could cause out-of-bounds access.

🔎 Proposed fix with handle count verification
   rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path, 
                                                          column_families, &handles, &db);
   
   if (!status.ok()) {
     std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
     return;
   }
+  
+  if (handles.size() != cf_names.size()) {
+    std::cerr << "Error: Expected " << cf_names.size() << " column families but got " 
+              << handles.size() << std::endl;
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+    return;
+  }
   
   std::cout << "Analyzing database at " << db_path << std::endl;

710-710: CURRENT is a file, not a directory.

Lines 710 and 719 check DirectoryExists(path + "/CURRENT"), but CURRENT is a RocksDB metadata file, not a directory. These checks will always fail even when a valid database exists.

🔎 Proposed fix to check for file existence

First, add a helper function after DirectoryExists:

bool FileExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

Then fix lines 710 and 719:

-  if (DirectoryExists(test_path + "/CURRENT")) {
+  if (FileExists(test_path + "/CURRENT")) {
     // This is a single database instance
     db_paths.push_back(test_path);
     std::cout << "Detected single database instance" << std::endl;
-      if (DirectoryExists(db_inst_path) && DirectoryExists(db_inst_path + "/CURRENT")) {
+      if (DirectoryExists(db_inst_path) && FileExists(db_inst_path + "/CURRENT")) {
         db_paths.push_back(db_inst_path);
         db_index++;

Also applies to: 719-719

🧹 Nitpick comments (4)
tools/bigkey_analyzer/bigkey_analyzer.cc (4)

155-157: Consider validating prefix_delimiter.

Users can specify an empty string as the prefix delimiter via --prefix-delimiter="". At line 584, GetKeyPrefix uses key.find(delimiter). An empty delimiter would cause find to return 0, making all prefixes empty strings. This might not be the intended behavior.

🔎 Proposed validation
       case 'd':
+        if (std::string(optarg).empty()) {
+          std::cerr << "Error: prefix-delimiter cannot be empty" << std::endl;
+          return false;
+        }
         config.prefix_delimiter = optarg;
         break;

158-160: Validate type_filter values.

Users can specify any string for --type, but only "all", "strings", "hashes", "lists", "sets", and "zsets" are valid. An invalid value like --type=invalid will silently skip all analysis, producing empty results without any error message.

🔎 Proposed validation
       case 'y':
+        {
+          const std::vector<std::string> valid_types = {"all", "strings", "hashes", "lists", "sets", "zsets"};
+          if (std::find(valid_types.begin(), valid_types.end(), optarg) == valid_types.end()) {
+            std::cerr << "Error: Invalid type. Must be one of: all, strings, hashes, lists, sets, zsets" << std::endl;
+            return false;
+          }
+        }
         config.type_filter = optarg;
         break;

749-782: Check for output stream write failures.

The code writes to *out at lines 750-782 but never checks if the writes succeed. If the output file runs out of disk space or encounters an I/O error, writes will silently fail and the stream's error state will be set.

🔎 Proposed error checking

Add a check before returning:

   *out << "  " << entry.first << ": " << entry.second << " keys, "
        << mb_size << " MB total, "
        << avg_size << " bytes avg\n";
   }
   
+  if (out->fail()) {
+    std::cerr << "Error: Failed to write output" << std::endl;
+    return 1;
+  }
+  
   return 0;
 }

138-142: Redundant initialization of Config fields.

Lines 138-142 initialize config fields to the same values already specified as default member initializers in the Config struct at lines 109-113. This initialization is redundant.

🔎 Suggested simplification

Remove the redundant initialization:

   int opt;
   int option_index = 0;
   
-  config.min_size = 0;
-  config.top_n = -1;
-  config.prefix_stat = false;
-  config.prefix_delimiter = ":";
-  config.type_filter = "all";
-  
   while ((opt = getopt_long(argc, argv, "m:t:pd:y:o:h", long_options, &option_index)) != -1) {
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 23668f3 and be1e4bd.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/bigkey_analyzer.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build_on_centos
  • GitHub Check: build_on_ubuntu
  • GitHub Check: build_on_macos
  • GitHub Check: Analyze (go)

Comment on lines +318 to +329

// Add results
for (const auto& entry : hash_info) {
int64_t size = std::get<0>(entry.second);
if (size >= config.min_size) {
std::string user_key = DecodeUserKey(entry.first);
std::string display_key = ReplaceAll(user_key, "\n", "\\n");
display_key = ReplaceAll(display_key, " ", "\\x20");
key_infos.emplace_back("hash", std::move(display_key), size, std::get<1>(entry.second));
}
}
}

⚠️ Potential issue | 🟠 Major

Add iterator status checks after iteration.

AnalyzeHashes, AnalyzeSets, AnalyzeZsets, and AnalyzeLists don't check iterator status after completing their loops. If an iteration error occurs (e.g., I/O error reading from RocksDB), it will be silently ignored. AnalyzeStrings correctly checks status at lines 243-245.

🔎 Proposed fix

Add status checks after each iterator completes. For AnalyzeHashes:

   }
   
+  if (!meta_iter->status().ok()) {
+    std::cerr << "Error iterating hashes metadata: " << meta_iter->status().ToString() << std::endl;
+  }
+  
+  if (!data_iter->status().ok()) {
+    std::cerr << "Error iterating hashes data: " << data_iter->status().ToString() << std::endl;
+  }
+  
   // Add results
   for (const auto& entry : hash_info) {

Apply similar checks in:

  • AnalyzeSets after lines 393 (both iterators)
  • AnalyzeZsets after lines 470, 492 (all three iterators)
  • AnalyzeLists after lines 569 (both iterators)

Also applies to: 393-404, 492-503, 569-580


Comment on lines +643 to +681
std::vector<rocksdb::ColumnFamilyHandle*> handles;
rocksdb::DB* db;
rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path,
column_families, &handles, &db);

if (!status.ok()) {
std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
return;
}

std::cout << "Analyzing database at " << db_path << std::endl;

// Analyze each type
if (config.type_filter == "all" || config.type_filter == "strings") {
AnalyzeStrings(db, handles[storage::kMetaCF], key_infos, config);
}

if (config.type_filter == "all" || config.type_filter == "hashes") {
AnalyzeHashes(db, handles[storage::kMetaCF], handles[storage::kHashesDataCF], key_infos, config);
}

if (config.type_filter == "all" || config.type_filter == "sets") {
AnalyzeSets(db, handles[storage::kMetaCF], handles[storage::kSetsDataCF], key_infos, config);
}

if (config.type_filter == "all" || config.type_filter == "zsets") {
AnalyzeZsets(db, handles[storage::kMetaCF], handles[storage::kZsetsDataCF],
handles[storage::kZsetsScoreCF], key_infos, config);
}

if (config.type_filter == "all" || config.type_filter == "lists") {
AnalyzeLists(db, handles[storage::kMetaCF], handles[storage::kListsDataCF], key_infos, config);
}

// Cleanup
for (auto handle : handles) {
delete handle;
}
delete db;

🛠️ Refactor suggestion | 🟠 Major

Use RAII for resource cleanup to prevent leaks.

Lines 678-681 manually delete column family handles and the database. If any of the Analyze* functions (called at lines 657-674) throw an exception, these resources will leak. The try-catch blocks at lines 480-491 and 557-568 show that exceptions are possible.

🔎 Proposed RAII-based fix

Use std::unique_ptr with custom deleters:

   std::vector<rocksdb::ColumnFamilyHandle*> handles;
   rocksdb::DB* db;
   rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path, 
                                                          column_families, &handles, &db);
   
   if (!status.ok()) {
     std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
     return;
   }
   
+  // RAII wrapper for cleanup
+  auto cleanup = [&handles, &db](void*) {  // deleter must accept void*
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+  };
+  std::unique_ptr<void, decltype(cleanup)> guard(reinterpret_cast<void*>(1), cleanup);
+  
   std::cout << "Analyzing database at " << db_path << std::endl;
   
   // Analyze each type
   ...
   
-  // Cleanup
-  for (auto handle : handles) {
-    delete handle;
-  }
-  delete db;
+  // Cleanup happens automatically via guard destructor
 }

Alternatively, create a RAII wrapper class for RocksDB resources.
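A minimal sketch of that wrapper-class alternative (the `DbGuard` name is invented here; it is templated on the handle/DB types only so the sketch compiles without the RocksDB headers — in the tool it would be instantiated as `DbGuard<rocksdb::ColumnFamilyHandle, rocksdb::DB>`):

```cpp
#include <utility>
#include <vector>

// Owns the column-family handles and the DB pointer, deleting them in the
// destructor so early returns and exceptions in the Analyze* calls cannot
// leak them.
template <typename Handle, typename Db>
class DbGuard {
 public:
  DbGuard(std::vector<Handle*> handles, Db* db)
      : handles_(std::move(handles)), db_(db) {}
  ~DbGuard() {
    for (auto* h : handles_) delete h;
    delete db_;
  }
  DbGuard(const DbGuard&) = delete;             // sole owner: no copies
  DbGuard& operator=(const DbGuard&) = delete;

  Db* db() const { return db_; }
  const std::vector<Handle*>& handles() const { return handles_; }

 private:
  std::vector<Handle*> handles_;
  Db* db_;
};
```

Constructed right after a successful OpenForReadOnly, the guard replaces the manual cleanup loop entirely.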

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
tools/bigkey_analyzer/bigkey_analyzer.cc (5)

146-151: Add exception handling for numeric conversions.

The calls to std::stoll and std::stoi can throw std::invalid_argument or std::out_of_range exceptions when users provide invalid input. This issue was flagged in a previous review and should be addressed.


194-196: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

The return value of GetCurrentTime is checked but discarded. If the call fails, curtime remains uninitialized. This was flagged in a previous review and applies to all Analyze* functions.


317-329: Add iterator status checks after iteration loops.

Both meta_iter and data_iter lack status checks after their loops complete. I/O errors during iteration would be silently ignored. This was flagged in a previous review.


643-651: Verify column family handles before access and use RAII for cleanup.

Two issues previously flagged:

  1. If the database is missing expected column families, handles.size() may be less than expected, causing out-of-bounds access.
  2. Manual cleanup at lines 678-681 will leak resources if any Analyze* function throws.

709-709: CURRENT is a file, not a directory.

DirectoryExists is used to check for the CURRENT file, but CURRENT is a RocksDB metadata file, not a directory. This check will always fail even when a valid database exists. This was flagged in a previous review.

Also applies to: 717-717, 730-730

🧹 Nitpick comments (3)
tools/bigkey_analyzer/bigkey_analyzer.cc (3)

5-31: Consider grouping includes by category for better organization.

Standard practice is to group includes: standard library first, then third-party libraries (RocksDB), then project headers. This improves readability and helps identify dependencies at a glance.

🔎 Suggested organization
-#include <iostream>
-#include <fstream>
-#include <string>
-#include <vector>
-#include <algorithm>
-#include <unordered_map>
-#include <memory>
-#include <ctime>
-#include <sys/stat.h>
-#include <getopt.h>
-
-#include "rocksdb/options.h"
-#include "rocksdb/db.h"
-#include "rocksdb/env.h"
-#include "rocksdb/iterator.h"
-#include "rocksdb/slice.h"
-#include "rocksdb/status.h"
-
-#include "storage/storage_define.h"
-#include "src/storage/src/base_value_format.h"
+// Standard library
+#include <algorithm>
+#include <ctime>
+#include <fstream>
+#include <iostream>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+// System headers
+#include <getopt.h>
+#include <sys/stat.h>
+
+// Third-party (RocksDB)
+#include "rocksdb/db.h"
+#include "rocksdb/env.h"
+#include "rocksdb/iterator.h"
+#include "rocksdb/options.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/status.h"
+
+// Project headers
+#include "storage/storage_define.h"
+#include "src/storage/src/base_value_format.h"

488-491: Avoid catching all exceptions with catch (...).

Using catch (...) silently swallows all exceptions including system errors. Consider catching specific exception types or at minimum logging that an exception occurred for debugging purposes.

🔎 Proposed improvement
     try {
       storage::ParsedZSetsScoreKey parsed_key(score_key);
       std::string encoded_user_key = parsed_key.key().ToString();
       
       auto it = zset_info.find(encoded_user_key);
       if (it != zset_info.end()) {
         std::get<0>(it->second) += score_key.size() + score_value.size();
       }
-    } catch (...) {
-      // Skip malformed keys
+    } catch (const std::exception& e) {
+      // Skip malformed keys - could add debug logging here
       continue;
     }

Same applies to the catch block in AnalyzeLists (lines 565-568).


684-684: Inconsistent pointer spacing in main signature.

Minor style: int main(int argc, char *argv[]) attaches the * to the variable name. Consider int main(int argc, char* argv[]) for consistency with other pointer declarations in the codebase.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between be1e4bd and fdc4e5d.

📒 Files selected for processing (1)
  • tools/bigkey_analyzer/bigkey_analyzer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)
src/storage/include/storage/storage_define.h (1)
  • SeekUserkeyDelim (122-132)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build_on_ubuntu
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_centos

Comment on lines +774 to +776
// Output summary
*out << "\n===== Summary =====\n";
*out << "Total keys analyzed: " << key_infos.size() << "\n";

⚠️ Potential issue | 🟡 Minor

Summary may be misleading after top_n filtering.

After resize(config.top_n) at line 758, the summary reports "Total keys analyzed" based on the truncated vector. If a user has 10,000 keys but uses --top=100, the summary will say "Total keys analyzed: 100", which doesn't reflect the actual number of keys scanned.

🔎 Proposed fix

Track the original count before truncation:

+  size_t total_keys_found = key_infos.size();
+  
   // Limit to top N if requested
   if (config.top_n > 0 && config.top_n < static_cast<int>(key_infos.size())) {
     key_infos.resize(config.top_n);
   }
   
   // ...
   
   // Output summary
   *out << "\n===== Summary =====\n";
-  *out << "Total keys analyzed: " << key_infos.size() << "\n";
+  *out << "Total keys found: " << total_keys_found << "\n";
+  if (config.top_n > 0) {
+    *out << "Showing top " << key_infos.size() << " keys\n";
+  }

Committable suggestion skipped: line range outside the PR's diff.


@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (11)
tools/bigkey_analyzer/README.md (2)

33-43: Add language identifier to fenced code block.

The usage text block should specify text as the language identifier for proper syntax highlighting.

🔎 Proposed fix
-```
+```text
 Usage: bigkey_analyzer [OPTIONS] <db_path>
 Options:

97-118: Add language identifier to fenced code block.

The example output block should specify text as the language identifier for proper syntax highlighting.

🔎 Proposed fix
-```
+```text
 ===== Big Key Analysis =====
 Type    Size    Key     TTL
tools/bigkey_analyzer/bigkey_analyzer.cc (9)

50-54: DecodeUserKey lacks error handling for malformed input.

The storage::DecodeUserKey function can produce incorrect output for malformed encoded keys, but this wrapper ignores potential issues and returns the result to all call sites.


152-157: Add exception handling for numeric conversions.

The calls to std::stoll and std::stoi can throw exceptions for invalid input (e.g., --min-size=abc), causing ungraceful termination.

🔎 Proposed fix with exception handling
     switch (opt) {
       case 'm':
-        config.min_size = std::stoll(optarg);
+        try {
+          config.min_size = std::stoll(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --min-size: " << optarg << std::endl;
+          return false;
+        }
         break;
       case 't':
-        config.top_n = std::stoi(optarg);
+        try {
+          config.top_n = std::stoi(optarg);
+        } catch (const std::exception& e) {
+          std::cerr << "Error: Invalid value for --top: " << optarg << std::endl;
+          return false;
+        }
         break;

201-203: Handle GetCurrentTime failure to prevent incorrect TTL calculations.

If GetCurrentTime fails, curtime remains uninitialized, leading to incorrect TTL calculations. This pattern appears in all analyze functions.

🔎 Proposed fix
   int64_t curtime;
-  db->GetEnv()->GetCurrentTime(&curtime).ok();
+  if (!db->GetEnv()->GetCurrentTime(&curtime).ok()) {
+    std::cerr << "Error: Failed to get current time" << std::endl;
+    return;
+  }
   curtime *= 1000;

Apply to AnalyzeHashes (line 263), AnalyzeSets (line 347), AnalyzeZsets (line 424), and AnalyzeLists (line 523).


314-324: Strengthen validation for SeekUserkeyDelim result and add version checking.

Two issues: (1) SeekUserkeyDelim may return an invalid pointer for malformed keys, causing undefined behavior in pointer arithmetic. (2) Data from stale key versions is counted without version validation, inflating size calculations.

🔎 Proposed fix
     const char* ptr = storage::SeekUserkeyDelim(data_key.data(), data_key.size());
+    if (ptr < data_key.data() || ptr > data_key.data() + data_key.size()) {
+      continue;
+    }
     size_t user_key_len = ptr - data_key.data();
     
     if (user_key_len == 0 || user_key_len > data_key.size()) continue;
     
     std::string encoded_user_key(data_key.data(), user_key_len);
     
     auto it = hash_info.find(encoded_user_key);
     if (it != hash_info.end()) {
+      // Validate version to exclude stale data
+      if (data_key.size() > user_key_len + sizeof(uint64_t)) {
+        uint64_t data_version;
+        memcpy(&data_version, data_key.data() + user_key_len, sizeof(uint64_t));
+        if (data_version != std::get<2>(it->second)) {
+          continue;
+        }
+      }
       std::get<0>(it->second) += data_key.size() + data_value.size();
     }

Apply similar fixes to AnalyzeSets (line 391), AnalyzeZsets (lines 469, 489), and AnalyzeLists (line 568).


326-337: Add iterator status checks after iteration.

Missing iterator status checks after data iteration. If an I/O error occurs during iteration, it will be silently ignored.

🔎 Proposed fix
   }
   
+  if (!meta_iter->status().ok()) {
+    std::cerr << "Error iterating hashes metadata: " << meta_iter->status().ToString() << std::endl;
+  }
+  
+  if (!data_iter->status().ok()) {
+    std::cerr << "Error iterating hashes data: " << data_iter->status().ToString() << std::endl;
+  }
+  
   // Add results
   for (const auto& entry : hash_info) {

Apply to AnalyzeSets (after line 402), AnalyzeZsets (after lines 480, 502), and AnalyzeLists (after line 580).


655-663: Verify column family handle count before access.

If the database is missing expected column families, OpenForReadOnly may succeed but return fewer handles, causing out-of-bounds access at handles[storage::kMetaCF].

🔎 Proposed fix
   rocksdb::Status status = rocksdb::DB::OpenForReadOnly(db_options, db_path, 
                                                          column_families, &handles, &db);
   
   if (!status.ok()) {
     std::cerr << "Error opening database at " << db_path << ": " << status.ToString() << std::endl;
     return;
   }
+  
+  if (handles.size() != cf_names.size()) {
+    std::cerr << "Error: Expected " << cf_names.size() << " column families but got " 
+              << handles.size() << std::endl;
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+    return;
+  }

689-693: Use RAII for resource cleanup to prevent leaks.

Manual cleanup at lines 690-693 will leak resources if any Analyze* function throws an exception (evidenced by try-catch blocks at lines 490-501 and 568-579).

🔎 RAII-based fix
+  // RAII wrapper for cleanup; note the deleter must accept the pointer
+  // argument, or std::unique_ptr will not compile.
+  auto cleanup = [&handles, &db](void*) {
+    for (auto handle : handles) {
+      delete handle;
+    }
+    delete db;
+  };
+  std::unique_ptr<void, decltype(cleanup)> guard(reinterpret_cast<void*>(1), cleanup);
+  
   std::cout << "Analyzing database at " << db_path << std::endl;
   
   // Analyze each type
   ...
   
-  // Cleanup
-  for (auto handle : handles) {
-    delete handle;
-  }
-  delete db;

721-721: CURRENT is a file, not a directory.

Lines 721, 737, 756, and 770 use DirectoryExists to check for the CURRENT file, which is a RocksDB metadata file, not a directory. These checks will always fail even for valid databases.

🔎 Proposed fix

Add a FileExists helper:

bool FileExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

Then replace all four checks:

-  if (DirectoryExists(test_path + "/CURRENT")) {
+  if (FileExists(test_path + "/CURRENT")) {

-          if (DirectoryExists(partition_path) && DirectoryExists(partition_path + "/CURRENT")) {
+          if (DirectoryExists(partition_path) && FileExists(partition_path + "/CURRENT")) {

-        if (DirectoryExists(db_inst_path) && DirectoryExists(db_inst_path + "/CURRENT")) {
+        if (DirectoryExists(db_inst_path) && FileExists(db_inst_path + "/CURRENT")) {

-        if (DirectoryExists(db_inst_path) && DirectoryExists(db_inst_path + "/CURRENT")) {
+        if (DirectoryExists(db_inst_path) && FileExists(db_inst_path + "/CURRENT")) {

Also applies to: 737-737, 756-756, 770-770


819-821: Summary is misleading after top_n filtering.

After resize(config.top_n) at line 801, the summary reports "Total keys analyzed" based on the truncated vector. If 10,000 keys are scanned but --top=100 is used, the summary incorrectly says "Total keys analyzed: 100".

🔎 Proposed fix
+  size_t total_keys_found = key_infos.size();
+  
   // Limit to top N if requested
   if (config.top_n > 0 && config.top_n < static_cast<int>(key_infos.size())) {
     key_infos.resize(config.top_n);
   }
   
   // ...
   
   // Output summary
   *out << "\n===== Summary =====\n";
-  *out << "Total keys analyzed: " << key_infos.size() << "\n";
+  *out << "Total keys found: " << total_keys_found << "\n";
+  if (config.top_n > 0) {
+    *out << "Showing top " << key_infos.size() << " keys\n";
+  }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdc4e5d and 33ef7e2.

📒 Files selected for processing (2)
  • tools/bigkey_analyzer/README.md
  • tools/bigkey_analyzer/bigkey_analyzer.cc
🧰 Additional context used
🧬 Code graph analysis (1)
tools/bigkey_analyzer/bigkey_analyzer.cc (1)
src/storage/include/storage/storage_define.h (1)
  • SeekUserkeyDelim (122-132)
🪛 markdownlint-cli2 (0.18.1)
tools/bigkey_analyzer/README.md

33-33: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


97-97: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_on_centos
  • GitHub Check: build_on_macos
  • GitHub Check: build_on_ubuntu

Comment on lines +164 to +166
case 'y':
config.type_filter = optarg;
break;

⚠️ Potential issue | 🟠 Major

Validate type_filter to prevent silent analysis failures.

The type_filter parameter is not validated against allowed values. If a user provides an invalid type like --type=foo, the tool will silently skip all analysis without warning, producing empty results that appear successful.

🔎 Proposed fix with validation
       case 'y':
         config.type_filter = optarg;
+        if (config.type_filter != "all" && config.type_filter != "strings" &&
+            config.type_filter != "hashes" && config.type_filter != "lists" &&
+            config.type_filter != "sets" && config.type_filter != "zsets") {
+          std::cerr << "Error: Invalid type '" << optarg 
+                    << "'. Must be one of: strings, hashes, lists, sets, zsets, all" << std::endl;
+          return false;
+        }
         break;
🤖 Prompt for AI Agents
In tools/bigkey_analyzer/bigkey_analyzer.cc around lines 164-166, validate the
value passed to --type (config.type_filter) against the tool's allowed type
names before assigning it; if optarg is not one of the permitted types, print a
clear error to stderr (or via the existing logger) and exit non-zero (or return
an error) instead of silently accepting it. Implement this by comparing optarg
to the canonical list of supported types used elsewhere in the code (e.g., the
set of types the analyzer recognizes), only assign config.type_filter when
valid, and surface a user-facing message listing valid values when invalid input
is detected.

Labels

📒 Documentation Improvements or additions to documentation ✏️ Feature New feature or request
