feat(tui): auto-discover .codewhale/rules/ and .claude/rules/ directories as project context #3892
Thanks @yekern for taking the time to contribute. This repository is observing a maintainer-managed PR intake gate in dry-run mode, so this pull request is staying open. This note helps maintainers prepare the allowlist before any enforcement is considered. Please read …
LeoLin990405 left a comment:
Thanks for the ping — really nice work, and cleanly scoped exactly as discussed: no #417 relaxation, merge_project_config untouched, and keeping rules_block separate from instructions so has_instructions() isn't poisoned is the right call for parent-directory AGENTS.md traversal in mono-repos. The cache-invalidation catch (adding the rules *.md to project_context_cache_candidate_paths) is a good find, the 50-file cap + deterministic filename order are sensible, and reusing load_context_file gets you the size check + per-file symlink safety for free.
One security gap worth closing before this lands, since it's exactly the "escape the workspace subtree" class Hunter flagged:
A symlinked rules directory escapes the workspace. rules_rejects_symlinked_files covers a symlinked .md file, but nothing checks whether .codewhale/rules / .claude/rules is itself a symlink. load_rules_from_dir (and the cache-path enumerator) call fs::read_dir(workspace.join(dir)) directly, which follows a directory symlink. The files behind it are real, so the per-file is_symlink() check in load_context_file passes them through:
```shell
$ ln -s /some/outside/dir .codewhale/rules   # real .md files live in /some/outside/dir
# symlink_metadata(".codewhale/rules/secret.md") → is_symlink=false, is_file=true
# → load_context_file reads /some/outside/dir/secret.md and injects it into project context
```
Confirmed locally: a repo shipping .codewhale/rules -> /some/outside/dir gets that directory's *.md read into the prompt at load_project_context time — before any command approval, including in read-only/plan mode. It's .md-only, so it's information disclosure rather than arbitrary read, but it still reads files the repo doesn't own, which is the #417 concern.
Suggested guard — refuse a symlinked rules dir (mirrors the existing file-level Refusing symlinked context file precedent):
```rust
// A repo could point .codewhale/rules at a path outside the workspace;
// refuse a symlinked rules directory so real .md files behind it aren't read.
if fs::symlink_metadata(&rules_dir)
    .map(|m| m.file_type().is_symlink())
    .unwrap_or(false)
{
    tracing::warn!(target: "project_context", dir = %rules_dir.display(), "Refusing symlinked rules directory");
    return entries;
}
```

Two follow-ups: the same guard needs to go in project_context_cache_candidate_paths (it re-scans the directory independently), and a rules_rejects_symlinked_directory test would lock it in. Since the directory scan is now duplicated in both places, it might be worth a small shared rules_md_files(workspace, dir) helper so the symlink guard can't drift between the load path and the cache path.
Everything else looks solid. 🐳
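@LeoLin990405 Good catch — just pushed a commit adding the symlink-directory guard. Two places patched: load_rules_from_dir() and project_context_cache_candidate_paths(), plus a new rules_rejects_symlinked_directory test. Appreciate the thorough review. 🐳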
Review — solid feature; needs a rebase onto the merged v0.8.67 context work

This is a nice addition and it directly addresses #3867 (project-scope instructions being overly denied). The design is thoughtful and the security model is handled well: …
Blocker: rebase required

This now conflicts with … The interaction is only textual, not behavioral: this feature reads rules as project context, which is orthogonal to the repo-law write enforcement that also lives in …

One minor suggestion (non-blocking)

Per-file content is capped (…)

Net: sound and safe design; rebase + full CI, optional total-size cap. Happy to help with the rebase against the new …

(Reviewed against the current diff; not a merge/approve — for @Hmbown's decision.)
…ries as project context

Add rules-directory auto-discovery as solution D from Hmbown#3867.

- Scans .codewhale/rules/ (native) and .claude/rules/ (Claude compat) for *.md files
- Loads in filename order, wraps each in <project_rule source="…"> elements
- Separates rules into rules_block field to avoid blocking parent AGENTS.md traversal
- Reuses load_context_file() for size checking + symlink safety (MAX_CONTEXT_SIZE 100KB)
- Caps at MAX_RULES_FILES=50 per directory to prevent abuse
- Adds rules files to project_context_cache_candidate_paths for proper cache invalidation
- Updates /context report to surface rules

Files: project_context.rs (+~190), context_report.rs (+18), project_context_cache.rs (+28)
Tests: 11 new (9 rules + 2 cache), 67 passed, clippy clean
A symlinked rules directory (e.g. .codewhale/rules -> /outside) would allow real .md files behind it to pass per-file symlink checks and be read from outside the workspace subtree — same escape class as Hmbown#417. Adds fs::symlink_metadata guard in load_rules_from_dir() and project_context_cache_candidate_paths() to skip symlinked directories. New test rules_rejects_symlinked_directory locks in the guard. Reported-by: @LeoLin990405
50 files × 100KB could reach ~5MB. Caps cumulative rules_block at MAX_RULES_BLOCK_BYTES=500KB with truncation marker to prevent a large rules directory from dominating the context window. Suggested-by: review on Hmbown#3892
fbd881a to 20926cf
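@Hmbown Rebased onto latest main (v0.8.67 constitution work from #3861). Two review-driven additions: the symlinked-rules-directory guard (with a new rules_rejects_symlinked_directory test) and a cumulative 500KB cap on rules_block (MAX_RULES_BLOCK_BYTES) with a truncation marker.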
All green: 62 tests pass (including 10 new rules tests + 2 cache tests), fmt clean, clippy clean. Conflict was textual only — the …
PR: feat(tui): auto-discover .codewhale/rules/ and .claude/rules/ directories as project context

Closes #3867
Summary
Add rules-directory auto-discovery to load_project_context(): on every session start, CodeWhale automatically scans .codewhale/rules/ (native) and .claude/rules/ (Claude compat) for .md files, loads them in filename order, and appends them to the project-context block injected into the system prompt. Each rule is wrapped in a <project_rule source="…"> element.

This completes solution D from the design anchor issue #3867: the same trust model as AGENTS.md (workspace-contained content only, no absolute-path escape), with no #417 project-config relaxation required.
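As a rough illustration of the wrapping step, the sketch below renders already-loaded rules into one block. render_rules_block, the workspace-relative source attribute, and the newline layout are assumptions for illustration; the PR's exact output format may differ.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: wrap each (path, content) pair in a <project_rule> element,
/// preserving the order given (e.g. filename order within each rules dir).
fn render_rules_block(workspace: &Path, rules: &[(PathBuf, String)]) -> Option<String> {
    if rules.is_empty() {
        return None; // no rules: leave rules_block unset
    }
    let rendered: Vec<String> = rules
        .iter()
        .map(|(path, content)| {
            // Show the path relative to the workspace when possible.
            let source = path.strip_prefix(workspace).unwrap_or(path.as_path());
            format!(
                "<project_rule source=\"{}\">\n{}\n</project_rule>",
                source.display(),
                content.trim_end()
            )
        })
        .collect();
    Some(rendered.join("\n"))
}
```

Returning None for an empty rules set lets the caller keep rules_block unset instead of storing an empty string.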
Motivation
Before this PR, CodeWhale's instruction system was nearly unusable in multi-project workflows:
- The instructions config key has been blocked at project scope since v0.8.8 (PRIOR: Ignore dangerous project-level config keys #417), so users could only list rule files in ~/.codewhale/config.toml, making it painful to maintain per-project rules across many repositories.
- .claude/rules/ auto-loads all .md files; CodeWhale had no equivalent and no mechanism to load multiple rule files without manual config.
- instructions_paths() does not support globs, so even instructions = [".claude/rules/*.md"] was impossible.
The recommended path from the #3867 design discussion was D first: rules-directory auto-discovery sits in the same trust class as AGENTS.md, needs no #417 relaxation, and delivers the majority of multi-project pain relief on its own. This PR implements that slice.
Design decisions
rules_block vs. mixing into instructions

Rules are stored in a separate rules_block: Option<String> field on ProjectContext, not mixed into instructions. This is essential for mono-repo support: has_instructions() controls whether the parent-directory traversal searches for a root AGENTS.md. If rules alone set instructions, they would block parent discovery. By storing rules in rules_block, has_instructions() stays unchanged (it only reflects main instructions), and parent traversal works correctly.

as_system_block() appends rules_block after instructions at render time, so both are present in the final system prompt.
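The separation can be sketched with a simplified stand-in for ProjectContext. This is hypothetical: the real struct has more fields and wraps the output in a project-context block, and should_search_parent_for_agents_md is an invented name for the traversal gate. It shows the two properties described above: rules never flip has_instructions(), and rendering orders constitution, then instructions, then rules.

```rust
/// Hypothetical, simplified stand-in for the PR's ProjectContext.
#[derive(Default)]
struct ProjectContext {
    constitution: Option<String>,
    instructions: Option<String>,
    rules_block: Option<String>,
}

impl ProjectContext {
    /// Only main instructions count; rules alone never report "has instructions".
    fn has_instructions(&self) -> bool {
        self.instructions.is_some()
    }

    /// Render present sections in order: constitution, instructions, rules.
    fn as_system_block(&self) -> Option<String> {
        let parts: Vec<&str> = [
            self.constitution.as_deref(),
            self.instructions.as_deref(),
            self.rules_block.as_deref(),
        ]
        .iter()
        .flatten()
        .copied()
        .collect();
        if parts.is_empty() {
            None
        } else {
            Some(parts.join("\n\n"))
        }
    }
}

/// Hypothetical traversal gate: search parent directories for a root
/// AGENTS.md only when the workspace produced no main instructions.
fn should_search_parent_for_agents_md(ctx: &ProjectContext) -> bool {
    !ctx.has_instructions()
}
```

Because rules_block is not consulted in has_instructions(), a subdirectory that only ships rules still lets the parent AGENTS.md be discovered.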
Security model
Same trust class as AGENTS.md:

- Only .codewhale/rules/ or .claude/rules/ within the project are scanned. No absolute-path escape.
- load_context_file() (shared with AGENTS.md) rejects symlinked files, matching the existing precedent in read_project_config_file.
- At most 50 files per directory (MAX_RULES_FILES) to prevent abuse.
- 100KB per-file size limit (MAX_CONTEXT_SIZE) inherited from the context loader.

No #417 relaxation

merge_project_config's rejection of project-scope instructions is left unchanged. Scheme D is orthogonal to #417: it doesn't touch the config key at all.
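The per-file checks attributed to load_context_file() (symlink safety, size check, empty-file rejection) can be sketched as below. read_rule_file is a hypothetical stand-in, not the PR's function, and treating whitespace-only files as empty is an assumption.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical stand-in for the per-file checks: refuse symlinks,
/// non-regular files, files over `max_bytes`, and files that are empty
/// after trimming whitespace (assumed behavior).
fn read_rule_file(path: &Path, max_bytes: u64) -> Option<String> {
    // symlink_metadata does not follow the link, so a symlink is seen as such.
    let meta = fs::symlink_metadata(path).ok()?;
    if meta.file_type().is_symlink() {
        eprintln!("Refusing symlinked context file: {}", path.display());
        return None;
    }
    if !meta.is_file() || meta.len() > max_bytes {
        return None;
    }
    let content = fs::read_to_string(path).ok()?;
    if content.trim().is_empty() {
        return None;
    }
    Some(content)
}
```

This check only inspects the final path component, which is why a symlinked rules directory needs its own guard.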
Changes
crates/tui/src/project_context.rs (+~190 lines)

New constants:
- RULES_DIRS = [".codewhale/rules", ".claude/rules"]: directories scanned in order
- MAX_RULES_FILES = 50: per-directory file cap

New field on ProjectContext:
- rules_block: Option<String>: holds the assembled rules XML, separate from instructions

New function load_rules_from_dir():
- Collects *.md files
- Reuses load_context_file() for size checking + symlink safety + empty-file rejection
- Returns Vec<(PathBuf, String)>; silently returns empty on missing/unreadable directories

Modified load_project_context():
- After PROJECT_CONTEXT_FILES (AGENTS.md etc.), iterates RULES_DIRS and calls load_rules_from_dir()
- Wraps each rule in <project_rule source="…">…</project_rule>
- Stores the result in ctx.rules_block (not ctx.instructions, preserving parent traversal)

Modified as_system_block():
- Appends rules_block inside the project-context block when instructions exist
- Emits rules_block standalone when no main instructions are present
- Places rules_block after the constitution when a constitution exists but instructions don't

Modified project_context_cache_candidate_paths():
- Scans RULES_DIRS for *.md files and adds them to the cache-key candidate list (so content changes and adding/removing rule files all produce a different cache key)
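The cache-key idea can be sketched as follows. This is a hypothetical illustration: the real project_context_cache may compute its signature differently, and cache_candidate_paths, signature, and the fixed AGENTS.md candidate are simplifications. The point it shows: listing each rules *.md file as a candidate and hashing its path plus (length, mtime) means that edits, additions, and removals all change the signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical: candidates = a fixed context file plus every *.md file
/// currently present in each rules directory (sorted for stability).
fn cache_candidate_paths(workspace: &Path, rules_dirs: &[&str]) -> Vec<PathBuf> {
    let mut paths = vec![workspace.join("AGENTS.md")];
    for dir in rules_dirs {
        let rules_dir = workspace.join(dir);
        // Same guard as the loader: never follow a symlinked rules directory.
        let is_symlink = fs::symlink_metadata(&rules_dir)
            .map(|m| m.file_type().is_symlink())
            .unwrap_or(false);
        if is_symlink {
            continue;
        }
        if let Ok(entries) = fs::read_dir(&rules_dir) {
            let mut md: Vec<PathBuf> = entries
                .filter_map(|e| e.ok().map(|e| e.path()))
                .filter(|p| p.extension().map_or(false, |ext| ext == "md"))
                .collect();
            md.sort();
            paths.extend(md);
        }
    }
    paths
}

/// Hash each candidate's path plus (length, mtime) when it exists, so
/// edits, additions and removals all yield a different signature.
fn signature(paths: &[PathBuf]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for path in paths {
        path.hash(&mut hasher);
        match fs::metadata(path) {
            Ok(meta) => {
                meta.len().hash(&mut hasher);
                meta.modified().ok().hash(&mut hasher);
            }
            Err(_) => 0u8.hash(&mut hasher), // candidate absent
        }
    }
    hasher.finish()
}
```

Re-enumerating the directory on every signature check is what makes newly added files visible; hashing metadata alone would miss them.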
9 new tests:
- rules_from_codewhale_dir_are_loaded_as_project_context: … <project_rule> wrapper
- rules_are_loaded_in_filename_order
- rules_from_claude_dir_are_compat_loaded: .claude/rules/ compatibility
- rules_directory_missing_does_not_crash
- rules_coexist_with_agents_md
- non_md_files_in_rules_dir_are_ignored: only *.md files are loaded
- rules_cap_truncates_excess_files
- rules_rejects_symlinked_files
- rules_from_both_dirs_are_loaded_together

crates/tui/src/context_report.rs (+18 lines)
- /context report now includes rules_block content when rules are present

crates/tui/src/project_context_cache.rs (+28 lines, 2 tests)
- signature_changes_when_rules_file_changes: verifies a content change triggers cache invalidation
- signature_changes_when_rules_file_is_added_or_removed: verifies file addition/removal triggers invalidation

Verification
- cargo fmt --all -- --check
- cargo clippy -p codewhale-tui (our files only)
- cargo test -p codewhale-tui --bin codewhale-tui -- project_context
- cargo test -p codewhale-tui --bin codewhale-tui -- project_context_cache
- cargo test -p codewhale-tui --bin codewhale-tui -- context_report

System prompt structure (with rules)
Audit summary
A comprehensive cross-system audit (2 rounds, 5 dimensions) was performed to ensure no regressions or unexpected interactions:
- … build_system_prompt all go through as_system_block().
- agent tool ✔️ inherits rules via fork_context. Background/agent path ❌ uses a static prompt, the same pre-existing limitation as AGENTS.md.
- rules_block is separated from instructions, so has_instructions() is unchanged.
- merge_project_config's instructions rejection is untouched.

What this PR does NOT do (deferred to future milestones)
- … instructions_paths() (scheme C)
- … instructions relaxation (scheme B)
- … paths matching (scheme E)
- … instructions (scheme A)

These are tracked in #3867 as separate workstreams.
Migration path
- Create .codewhale/rules/ (or .claude/rules/) and drop .md files into it. No config changes are needed: rules are auto-discovered on the next session start.
- .claude/rules/ users: rules are picked up automatically, with zero migration cost.
- instructions users: both channels are additive (project rules and global instructions coexist in the system prompt), so there is no conflict.