Skip to content

feat: support reading external files as Code Review rules via use_file_path#87

Open
zephyrq-z wants to merge 4 commits into
alibaba:mainfrom
zephyrq-z:feature/rule
Open

feat: support reading external files as Code Review rules via use_file_path#87
zephyrq-z wants to merge 4 commits into
alibaba:mainfrom
zephyrq-z:feature/rule

Conversation

@zephyrq-z

Copy link
Copy Markdown

Description

Closes #67

📝 Background & Motivation

In practice, inlining complex Code Review rules directly into rule.json makes the configuration file bloated and hard to maintain. Issue #67 requested the ability to extract rules into separate .md or .txt documents and reference them within rule.json.

🚀 Core Changes

To keep the configuration concise, the initially discussed standalone rule_file field was discarded in favor of a more intuitive use_file_path boolean toggle:

  • Field Reuse: When use_file_path: true, the rule field is treated as a file path. The program reads the file's content and directly overwrites the rule field.
  • Backward Compatibility: When the field is omitted or set to false, the rule field continues to act as a standard inline string rule, fully compatible with legacy configurations.

🛡️ Security & Robustness

To prevent misconfigurations or malicious exploitation, strict defense mechanisms are implemented in the resolveRuleFiles function:

  1. Directory Traversal Protection: Enforces absolute path validation to prevent reading sensitive files outside the project's .opencodereview directory via ../ (e.g., ../../etc/passwd).
  2. Extension Restriction: Only allows reading .md and .txt rule files, preventing the accidental loading of executable scripts or other sensitive formats.
  3. File Size Limit: Caps individual files at 100KB to prevent bloating the LLM's Context Window.
  4. Graceful Degradation: If a file is missing, lacks permissions, or has an unsupported format, the program does not panic. Instead, it prints a clear [WARN] log and retains the original rule string (falling back to system default rules).

💻 Configuration Example

In .opencodereview/rule.json:

{
  "rules": [
    {
      "path": "**/*.py",
      "rule": "docs/python_security_rules.md",
      "use_file_path": true
    },
    {
      "path": "**/*.java",
      "rule": "【Inline Rule】Please check for NullPointerException risks."
    }
  ]
}

🧪 Testing & Verification

All changes include comprehensive test coverage:

  • End-to-End Automated Tests:
    • reproduce_test.sh: Verifies via a local Mock LLM Server that external rule file contents are successfully injected into the LLM Context.
    • observable_test.sh: Intuitively validates text changes before and after rule parsing using the native ocr rules check command.
    • security_test.sh: Conducts security boundary testing specifically targeting malicious paths like ../../etc/passwd.

zephyrq-z added 2 commits June 8, 2026 15:45
- Added 'rule_file' field to ProjectRuleEntry in rule.json
- Implemented resolveRuleFiles to read, validate, and merge external rule files
- Security: prevents path traversal (../) outside the base directory
- Limits supported extensions to .md and .txt, max size 100KB
- Merges contents if both 'rule' and 'rule_file' are provided
- Added unit tests for security, missing files, large files, etc.
- Updated README.md and README.zh-CN.md

Resolves alibaba#67
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread README.zh-CN.md Outdated
"path": "**/*mapper*.xml",
"rule": "检查 SQL 注入风险、参数错误和缺少闭合标签"
"rule": "docs/sql-rules.md",
"use_file_path": true

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉需要放在顶层,不然每个子项都需要配置一遍。

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 2 issue(s) in this PR.

  • ✅ 2 posted as inline comment(s)
  • 📝 0 posted as summary (missing line info)

Comment thread internal/config/rules/system_rules.go Outdated
Comment on lines +335 to +336
// Security check: prevent directory traversal
if !strings.HasPrefix(absFile+string(filepath.Separator), absBase) && absFile != strings.TrimSuffix(absBase, string(filepath.Separator)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security: Symlink-based directory traversal bypass. filepath.Abs only performs lexical path cleaning — it does NOT resolve symlinks. An attacker (or a malicious repo) could place a symlink inside .opencodereview/ that points to an arbitrary file outside the base directory. The prefix check on the string path would pass because the symlink itself lives within the base dir, but os.ReadFile follows the symlink and reads the target file.

To fix this, resolve symlinks before performing the prefix check:

resolvedFile, err := filepath.EvalSymlinks(absFile)
if err != nil {
    fmt.Fprintf(os.Stderr, "[WARN] Failed to resolve symlinks for rule file %s: %v\n", filePath, err)
    continue
}

Then use resolvedFile for both the security check and subsequent reads. You should also resolve absBase with EvalSymlinks to handle the case where the base directory itself contains symlinks.

Comment thread internal/config/rules/system_rules.go Outdated
Comment on lines +324 to +326
if !entry.UseFilePath || entry.Rule == "" {
continue
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent skip without warning when UseFilePath is true but Rule is empty. A user who sets "use_file_path": true but forgets to specify the "rule" field will get no feedback at all, making configuration debugging difficult. Consider adding a warning here similar to the other error paths.

Suggestion:

Suggested change
if !entry.UseFilePath || entry.Rule == "" {
continue
}
if !entry.UseFilePath {
continue
}
if entry.Rule == "" {
fmt.Fprintf(os.Stderr, "[WARN] Rule entry has use_file_path=true but empty rule path, skipping\n")
continue
}

…erability

- Move use_file_path from per-entry to top-level ProjectRule field
  (simplifies config: no need to repeat flag for every rule entry)
- Add filepath.EvalSymlinks to prevent symlink-based directory traversal
  (resolves symlinks on both file path and base directory for consistent
  prefix matching across platforms like macOS /var -> /private/var)
- Add warning when use_file_path=true but rule path is empty
- Update tests to use new top-level UseFilePath structure
- Update README.md and README.zh-CN.md with new config format
Comment thread README.zh-CN.md Outdated

```json
{
"use_file_path": true,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉这个说明会误导用户,同时存在 path 和字符串应该会报错吧,use_file_path开关应该是控制用户使用其中一种。

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另外其他语言的 readme 也需要补充下

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants