Release v1.0.154: markdown chunking, stale-path relocation, auto-prune, 15 languages #97
fix: TUI indexing status + SCIP LMDB MDB_MAP_FULL fix (v1.0.128)
* [worker] cleanup: AGENTS.md, 73% reduction; removed stale test report and duplicate bug details
* docs: update CHANGELOG with v1.0.132 consolidated release notes (v1.0.97...v1.0.132)

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
* fix(mcp): ignore project/group params in local/stdio mode instead of erroring

  When running MCP in local mode (no serve_state), project/group routing is meaningless because only one DB is available. Log a warning and fall back to the local DB instead of returning an error.

* fix(qc): define YELLOW color in bash QC script
…al duplicate (v1.0.137) (#76)

* fix: create DB directory before acquiring writer lock (serve auto-register)

  When `serve` is running and `codesearch index` is run for a repo not yet known to it, auto-register (POST /repos) failed with a misleading "Database is locked by another process" 500: SharedStores::new() acquired the writer lock before the .codesearch.db directory existed, so opening .writer.lock failed with "path not found". This rolled back the repos.json registration and made the CLI fall back to a local duplicate index instead of delegating to serve.

  - acquire_writer_lock / SharedStores::new now create the DB directory first; genuine I/O errors surface distinctly instead of as a lock conflict.
  - Serve config writes route through ServeState::persist_config() (honors the config path override). Production behavior is unchanged; the register/remove path is now hermetically testable.
  - Regression guards exercise the brand-new-repo create/register path with the DB directory genuinely absent (verified to fail against the pre-fix code).
  - CHANGELOG: 1.0.136.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: never silently create a local duplicate index when serve is busy

  The CLI probes serve's /health before delegating `index`/`index add`. Any health failure, including a *timeout* while serve is warming up its repos at startup, was classified as "serve not running", so the CLI silently created a local index. That local index is a duplicate serve does not manage and can cause LMDB file-lock conflicts (and the repo never gets registered with serve).

  New behavior via probe_serve_health():

  - Responsive -> delegate as before.
  - Connection refused / cannot connect -> serve not running; index locally. Detected immediately (no timeout elapses, no retries), so the common "no serve -> local" path is NOT slowed down.
  - Listening but unresponsive (timeout, retried briefly) -> serve is up but busy. The CLI now REFUSES to create a local duplicate and tells the user to retry shortly or stop serve first.

  The fallback is never silent anymore. Delegation errors are now typed (DelegateError: ServeDown / ServeUnresponsive / Failed) instead of string-matched. Applies to `index` and `index add` (the index-creating paths); `index rm` is unchanged.

  Tests: probe classification guards (responsive -> Up; listening-but-slow -> Unresponsive).

  Rolls into the 1.0.137 release together with the writer-lock fix.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
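The three-way probe classification described in this commit can be sketched roughly as follows. This is a minimal sketch and not the project's `probe_serve_health()`: the `ServeHealth` enum, the `classify_probe` and `may_index_locally` functions, and the mapping from `std::io::ErrorKind` values are all assumptions made for illustration.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical health state mirroring the commit's three outcomes.
#[derive(Debug, PartialEq, Eq)]
enum ServeHealth {
    /// /health answered in time: delegate to serve.
    Up,
    /// Nothing is listening: serve is not running.
    Down,
    /// Something is listening but did not answer in time.
    Unresponsive,
}

/// Classify the outcome of one health probe.
/// Assumption: the probe result has been reduced to an io::Result.
fn classify_probe(result: Result<(), io::Error>) -> ServeHealth {
    match result {
        Ok(()) => ServeHealth::Up,
        Err(e) => match e.kind() {
            // Refused connections fail immediately, so this path costs no timeout.
            ErrorKind::ConnectionRefused => ServeHealth::Down,
            // A timeout means the port is open but serve is busy (e.g. warming up).
            ErrorKind::TimedOut | ErrorKind::WouldBlock => ServeHealth::Unresponsive,
            // Assumption: unknown errors are treated as "busy" so that a
            // duplicate local index is never created silently.
            _ => ServeHealth::Unresponsive,
        },
    }
}

/// Local indexing is only allowed when serve is definitely not running.
fn may_index_locally(health: &ServeHealth) -> bool {
    matches!(health, ServeHealth::Down)
}
```

Keeping the classification in a pure function like this makes the "never silent" rule easy to guard with unit tests, independent of any real HTTP client.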
Brings the two files that drifted on master during the v1.0.137 release back to develop: the updated protect-master.yml (allows release/* branches) and the CHANGELOG [1.0.135] entry. After this, develop and master trees are identical. Co-authored-by: flupkede <flupkede@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…8) (#79)

Two Windows path-handling bugs that caused spurious "Database not found" errors and local duplicate indexes:

1. register()/register_with_alias() stored the raw canonicalize() result in repos.json. On Windows, canonicalize() returns \\?\C:\... (extended-length UNC prefix). Downstream .join(".codesearch.db") and Path::exists() calls then fail inconsistently (\\?\C:\foo\.codesearch.db not found even when C:\foo\.codesearch.db exists). 7 repos were affected.
   Fix: strip_unc() removes the prefix before storage. Existing repos.json patched in-place.
   Regression test: register_strips_unc_prefix_from_stored_path.

2. 500 "Database not found" from reindex (alias registered but DB gone) was treated as a generic failure -> local fallback -> duplicate index.
   Fix: triggers the same auto-register POST /repos path as 404 (DB recreated by serve, no local fallback).

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: flupkede <flupkede@users.noreply.github.com>
… across codebase (#82)

ROOT CAUSE OF RECURRING BUG CLASS

Path::canonicalize() on Windows returns \\?\C:\... (extended-length UNC prefix). Any downstream .join(), .exists(), or HashMap key built from that path behaves inconsistently: the sub-path \\?\C:\foo\.codesearch.db may return false from exists() even when C:\foo\.codesearch.db is present. This class of bug has silently broken registrations multiple times.

FIX

Introduce safe_canonicalize(path: &Path) -> io::Result<PathBuf> and strip_unc_prefix(path: PathBuf) -> PathBuf in src/cache/file_meta.rs. These are the ONLY approved way to canonicalize paths in this codebase. Exported via crate::cache.

CALL SITES UPDATED (all raw .canonicalize() removed)

- src/cache/file_meta.rs: central definition + 5 new regression tests
- src/db_discovery/repos.rs: register, register_with_alias, unregister_path, alias_for_path; local strip_unc() removed
- src/db_discovery/mod.rs: find_best_database, get_db_path_for_cwd
- src/index/mod.rs: find_git_root, get_global_db_path, add_to_index, remove_from_index, try_delegate_reindex_to_serve (x2), try_delegate_rm_to_serve
- src/lmdb_registry.rs: TrackedEnv registry key (eliminates double-open risk when same dir accessed with and without \\?\ prefix)
- src/serve/mod.rs: add_repo_handler, run_serve --register path

POLICY DOCUMENTED

AGENTS.md: "⚠️ Canonical Path Policy — MANDATORY" section with rule, code example, and pointer to regression tests.

REGRESSION TESTS (6 new in cache/file_meta.rs + 1 existing in repos.rs)

- strip_unc_prefix_removes_windows_unc
- strip_unc_prefix_is_idempotent_on_{plain_path,unix_path}
- safe_canonicalize_on_existing_dir_returns_plain_path
- safe_canonicalize_on_nonexistent_path_returns_error
- register_strips_unc_prefix_from_stored_path (repos.rs; verifies fallback path also strips UNC when canonicalize() fails)

407 lib tests pass. clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
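To make the canonical-path policy concrete, here is a minimal sketch of the two helpers using only the signatures quoted in the commit message. The bodies are assumptions, not the code in src/cache/file_meta.rs; in particular, leaving `\\?\UNC\...` network paths untouched is a choice made for this sketch.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Remove a leading `\\?\` from a drive-letter path such as `\\?\C:\foo`.
/// Paths without the prefix (including Unix paths) are returned unchanged,
/// so calling this twice gives the same result as calling it once.
fn strip_unc_prefix(path: PathBuf) -> PathBuf {
    const VERBATIM: &str = r"\\?\";
    let stripped = match path.to_str() {
        // Assumption: `\\?\UNC\server\share` is left alone, because dropping
        // the prefix there would not produce a valid path.
        Some(s) if s.starts_with(VERBATIM) && !s[VERBATIM.len()..].starts_with(r"UNC\") => {
            Some(PathBuf::from(&s[VERBATIM.len()..]))
        }
        // Non-UTF-8 paths and paths without the prefix pass through.
        _ => None,
    };
    stripped.unwrap_or(path)
}

/// Canonicalize, then normalize the result so that stored paths, joins,
/// and map keys all use the plain `C:\...` form.
fn safe_canonicalize(path: &Path) -> io::Result<PathBuf> {
    path.canonicalize().map(strip_unc_prefix)
}
```

Routing every caller through one wrapper is what lets a lint or grep for raw `.canonicalize()` enforce the policy.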
…b_path_smart (#84)

The old normalize_path(&p.canonicalize()...) pattern in get_db_path_smart was missed in the central safe_canonicalize refactor (v1.0.139). It worked correctly (normalize_path also strips UNC) but was inconsistent with the policy. Now all .canonicalize() calls outside safe_canonicalize's own definition are eliminated.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#86)

PROBLEM 1: ServeUnresponsive aborted with error instead of waiting

When serve is warming up (opening LMDB for 15+ repos blocks the tokio runtime, causing /health to time out), the CLI refused with an error. The user had to retry manually.

FIX: serve_delegate_with_warmup_wait() wraps both try_delegate_reindex_to_serve and try_delegate_add_to_serve. On ServeUnresponsive it prints "⏳ serve is starting up, waiting..." and retries every 8s up to 6 times (~2 min budget). On success it prints "✅ serve is ready, delegating...". Only exhausting the full budget returns an error.

PROBLEM 2: 409 Conflict from POST /repos on "Database not found" path

When a registered repo's DB was missing, the CLI tried POST /repos to recreate it. Serve correctly returned 409 (alias already registered). The CLI treated 409 as a failure and fell back to local indexing.

FIX: when auto-add returns 409, retry as POST /repos/{alias}/reindex?force=true. Force reindex uses allow_create=true and creates the DB via serve without local fallback.

AGENTS.md: document the root cause (tokio blocking during warmup) as a remaining work item with diagnosis and fix guidance.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
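The warmup-wait behavior from PROBLEM 1 amounts to a bounded retry loop that only retries on one error variant. The sketch below is illustrative, not the project's serve_delegate_with_warmup_wait(): the function name `delegate_with_warmup_wait`, its parameters, and the `String` payload on `Failed` are assumptions; only the three `DelegateError` variant names come from the commit messages.

```rust
use std::time::Duration;

/// Typed delegation errors (variant names from the commit message;
/// the String payload is an assumption).
#[derive(Debug, PartialEq)]
enum DelegateError {
    ServeDown,
    ServeUnresponsive,
    Failed(String),
}

/// Run `attempt`, retrying only while serve is listening but unresponsive.
/// Any other error, or an exhausted retry budget, is returned to the caller.
fn delegate_with_warmup_wait<T>(
    mut attempt: impl FnMut() -> Result<T, DelegateError>,
    max_retries: u32,
    interval: Duration,
) -> Result<T, DelegateError> {
    let mut retries = 0;
    loop {
        match attempt() {
            Err(DelegateError::ServeUnresponsive) if retries < max_retries => {
                if retries == 0 {
                    println!("serve is starting up, waiting...");
                }
                retries += 1;
                std::thread::sleep(interval);
            }
            Ok(value) => {
                if retries > 0 {
                    println!("serve is ready, delegating...");
                }
                return Ok(value);
            }
            // ServeDown, Failed, or ServeUnresponsive after the budget is spent.
            Err(e) => return Err(e),
        }
    }
}
```

With the values quoted in the commit, a caller would pass `6` and `Duration::from_secs(8)`; taking the interval as a parameter keeps the loop testable without real sleeps.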
…n_blocking (#88)

PROBLEM

codesearch serve became unresponsive during startup warmup: FileWalker::walk, VectorStore::build_index (HNSW), and fastembed/ONNX embedding (saturates all cores) ran synchronously on tokio worker threads. This starved the async runtime, /health timed out (>3s), and `codesearch index` reported "serve did not respond in time". The server already returns 202 + spawns background indexing (accept-and-defer); it just couldn't respond while warming.

FIX

Offload the heavy synchronous warmup work to tokio::task::spawn_blocking, so the async executor stays responsive (answers /health and accepts POST /repos immediately, runs the job in the background).

- serve/mod.rs warmup_repo: read stats under .read(); build_index via spawn_blocking + Arc clone + blocking_write. Build failure only warns.
- manager.rs perform_incremental_refresh_with_stores: walk, read+chunk+embed, and build_index all offloaded.
- manager.rs refresh_index_with_stores: walk + both build_index calls offloaded.

LOCK SAFETY (verified by review)

Every async RwLock guard scope CLOSES before the spawn_blocking that calls .blocking_write() on the same store, so there is no lock-over-await deadlock. blocking_write is only ever called inside spawn_blocking (never on an async worker).

Test: test_incremental_refresh_up_to_date_is_noop exercises the refactored walk path.

408 lib tests pass, clippy -D warnings clean.

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… commands (#91)

* chore: add /merge and /release Claude Code slash commands

  Codify the project release workflow as two committed slash commands under .claude/commands/ (force-added past .gitignore, like .claude/CLAUDE.md):

  - /merge: README/CHANGELOG freshness checks -> commit -> validate -> push -> PR to develop -> auto-merge after CI. No tag.
  - /release: /merge, then promote develop -> master via a "Release vX.Y.Z" PR (protect-master allows develop), then push the vX.Y.Z tag that triggers release.yml. Includes optional post-release develop sync.

  Commands document the repo's real conventions: feature->develop->master flow, master branch protection, and the pre-commit version-bump-on-feature-branches rule that fixes the release version at the feature commit.

  Tooling-only change on a chore/ branch: no version bump, no CHANGELOG entry (CHANGELOG tracks the shipped binary's behavior).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: address review remarks on /merge and /release commands

  - /merge: abort unless on feature/*|features/*|fix/* (the only branches the pre-commit hook version-bumps). This closes the gap where running from a non-bumping branch silently broke the version/CHANGELOG premise.
  - Clarify CHANGELOG heading version math for multi-commit landings (hook bumps +1 per commit; verify heading matches Cargo.toml after the final commit).
  - Capture PR numbers explicitly (gh pr view --json number) before merge/poll.
  - /release: fetch --tags and guard against a double release (stop if the tag already exists locally or on origin).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document /merge and /release workflow in AGENTS.md

  Add a Release workflow section describing the two slash commands, the branch-protection rule, the tag-triggers-release.yml pipeline, and the feature-branch-only version-bump rule that fixes the release version.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(chunker): semantic Markdown chunking via tree-sitter-md

  Markdown and .txt files were indexed as a single whole-file block (the fallback chunker has no char budget), so a search hit returned an entire page; real Aprimo docs reached 80 KB in one chunk.

  Add the tree-sitter-md *block* grammar and chunk Markdown by heading section instead: each chunk is one heading plus its own prose/code, excluding nested subsections (which become their own chunks). The heading path is carried in the breadcrumb context (File > Title > Subsection) so embeddings capture each section's place in the document.

  Also add split_oversized, a char- and line-aware splitter for the unstructured paths (Markdown + the generic fallback): a single physical line longer than the char budget is hard-split on UTF-8 boundaries, so scraped one-line HTML/markdown can no longer produce an enormous chunk. The structured code path keeps using split_if_needed unchanged, so code chunking is unaffected.

  - Cargo.toml: add tree-sitter-md 0.5.3
  - grammar.rs/language.rs: register Markdown as tree-sitter-supported
  - semantic.rs: chunk_markdown + emit_md_section + split_oversized
  - tests: section split, nested breadcrumbs, oversized + long-line splits

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] final review: fix chunk_markdown doc comment

  Reference the actual splitter used by the markdown path (split_oversized, char-aware) instead of split_if_needed (the code path's line-based splitter).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document semantic Markdown chunking + correct language table

  - CHANGELOG: add [1.0.145] entry for tree-sitter-md block-grammar Markdown chunking (sections/headings/code fences).
  - README: expand the Supported Languages table to all 15 tree-sitter languages and bump the "9 languages" count to 15, correcting pre-existing drift that omitted Shell, Ruby, PHP, YAML, JSON, and (new) Markdown.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): sanitize customer ref in markdown chunking fixtures

  The pre-push customer-ref guard flagged "aprimo" in two semantic.rs test fixtures (a frontmatter URL and a comment). Replaced with generic example.com / "real-world scraped docs"; the test assertions never reference either, so behavior is unchanged. Realign CHANGELOG heading to the post-bump version (1.0.146).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
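The heading-section chunking with breadcrumbs described in the feat(chunker) commit can be illustrated without tree-sitter. The sketch below swaps in a plain line scanner for ATX headings (`#` to `######`) in place of the tree-sitter-md block grammar, so it is not how chunk_markdown works internally; `MdChunk`, `parse_heading`, `chunk_by_heading`, and `flush` are hypothetical names, and setext headings and indented headings are ignored.

```rust
/// One chunk: the breadcrumb path plus the section's own text.
#[derive(Debug, PartialEq)]
struct MdChunk {
    breadcrumb: String,
    text: String,
}

/// Return the ATX heading level (1..=6) and title if `line` is a heading.
fn parse_heading(line: &str) -> Option<(usize, &str)> {
    let level = line.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &line[level..];
    // The `#` run must be followed by a space or the end of the line.
    if rest.is_empty() || rest.starts_with(' ') {
        Some((level, rest.trim()))
    } else {
        None
    }
}

/// Emit the accumulated section (if non-blank) and reset the buffer.
fn flush(file: &str, stack: &[(usize, String)], body: &mut String, out: &mut Vec<MdChunk>) {
    if !body.trim().is_empty() {
        let mut parts = vec![file.to_string()];
        parts.extend(stack.iter().map(|(_, title)| title.clone()));
        out.push(MdChunk {
            breadcrumb: parts.join(" > "),
            text: body.trim_end().to_string(),
        });
    }
    body.clear();
}

/// Split `source` so each chunk holds one heading plus its own prose/code;
/// nested subsections start their own chunks.
fn chunk_by_heading(file: &str, source: &str) -> Vec<MdChunk> {
    let mut chunks = Vec::new();
    let mut stack: Vec<(usize, String)> = Vec::new(); // currently open headings
    let mut body = String::new();
    let mut in_fence = false;

    for line in source.lines() {
        // Lines inside a fenced code block are never treated as headings.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let heading = if in_fence { None } else { parse_heading(line) };
        if let Some((level, title)) = heading {
            flush(file, &stack, &mut body, &mut chunks);
            // A new heading closes every open heading at the same or deeper level.
            while stack.last().map_or(false, |(open, _)| *open >= level) {
                stack.pop();
            }
            stack.push((level, title.to_string()));
        }
        body.push_str(line);
        body.push('\n');
    }
    flush(file, &stack, &mut body, &mut chunks);
    chunks
}
```

The heading stack is what produces the `File > Title > Subsection` breadcrumb: a subsection inherits its parent titles, and a later same-level heading pops them.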
* [worker] stage 1/5: capture git remote identity per repo

  Add RepoMeta.git_remote (serde default, backward compatible) and a best-effort git_remote_url() helper. Populate it in register() and register_with_alias() so every registered repo records its remote.origin.url for later relocation of moved/renamed folders.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 2/5: relocate moved repos + reconcile pass + index prune

  - Best-effort git relocation: try_relocate() walks to nearest existing ancestor and bounded-depth scans for a git root with matching remote.origin.url; unambiguous single match rewrites repos.json.
  - ServeState::reconcile_all_paths() runs at startup before phase 1/2/3; relocates or warns+skips missing paths (never crashes).
  - Existence guards added to phase-2 SCIP and phase-3 prewarm consumers.
  - New `codesearch index prune` command: relocate-first, else unregister stale aliases, with summary output.
  - CODESEARCH_RELOCATE_MAX_DEPTH env (default 3).
  - Unit tests for capture-on-register and try_relocate (renamed leaf, path-exists, no-remote, ambiguous).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 3/5: remove user-settable --alias, always derive

  - Drop `--alias`/`-a` from `index add` subcommand and the legacy `index --add` flag path. Alias is always derived from the directory name via ReposConfig::register().
  - add_to_index() loses its `alias` parameter; legacy current-dir local DBs are now auto-registered with a derived alias.
  - Serve delegation always sends None so serve derives the alias too.
  - Replace test_cli_index_add_accepts_alias_flag with test_cli_index_add_rejects_alias_flag + parses_without_alias.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 4/5: tolerate hand-edited repos.json via reconcile()

  - ReposConfig::reconcile() runs from load_from() on both new and legacy parse paths (in-memory only, no disk write):
    1. drop entries with empty/blank alias keys
    2. drop orphan repos_meta entries with no matching repo
    3. prune group members referencing unknown aliases; drop empty groups
  - Never renames existing alias keys (would break group refs); a non-standard hand-edited alias is tolerated as-is. Never crashes.
  - Unit tests for empty-key, group-pruning/empty-group, and orphan-meta.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] stage 5/5: docs + tighten reconcile() visibility

  - Document stale-path relocation, `index prune`, derived-alias policy, and repos.json reconcile() in AGENTS.md and .claude/CLAUDE.md.
  - reconcile() is now pub(crate) (only used internally + same-module tests).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] final review: use DB_DIR_NAME constant in relocation scan skip-list

  Replace hardcoded ".codesearch.db" literal with crate::constants::DB_DIR_NAME in is_skippable_scan_dir (no-hardcoded-config-strings rule).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] tests: extract testable prune_stale/relocate_missing + expand coverage

  Refactor for testability (no behavior change):

  - Add pure ReposConfig::relocate_missing() -> (relocated, unresolved) and prune_stale() -> (relocated, removed); no disk I/O, no logging.
  - prune_index() and ServeState::reconcile_all_paths() now delegate to these, removing duplicated relocate-loop logic.

  New unit tests (8):

  - register_derives_alias_from_directory_name
  - try_relocate_finds_renamed_parent (parent-level rename within depth)
  - try_relocate_none_beyond_max_depth (depth bound enforced)
  - relocate_missing_rewrites_only_moved_repos
  - prune_stale_removes_unrelocatable_entries (+ group cleanup)
  - prune_stale_relocates_then_keeps_relocatable_entries
  - load_from_applies_reconcile_to_hand_edited_file (load-path reconcile)

  24 repos lib tests pass; clippy clean.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: README + CHANGELOG for relocation, index prune, derived alias

  - README: document `codesearch index prune`, automatic relocation of moved/renamed repos (CODESEARCH_RELOCATE_MAX_DEPTH), the alias-always-derived policy (no --alias flag), and hand-edited repos.json tolerance.
  - CHANGELOG: consolidated 1.0.149 entry (Added/Changed/Fixed).
  - README language table + alias example updates (pre-existing).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [worker] address review remarks: align CHANGELOG version + restore log path

  - CHANGELOG entry retitled to 1.0.151 to match the shipped Cargo.toml version (pre-commit bumps patch by 1 on this commit).
  - reconcile warn for unresolved repos again includes the missing path for diagnostics (lost during the relocate_missing extraction).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
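The three reconcile() rules from stage 4/5 can be sketched on a simplified config. This is a minimal sketch under stated assumptions: the field names `repos`, `repos_meta`, and `groups` appear in the commit messages, but the value types here (plain strings and vectors) are simplifications, not the project's real `ReposConfig`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the on-disk repos.json structure.
#[derive(Debug, Default)]
struct ReposConfig {
    repos: BTreeMap<String, String>,       // alias -> registered path
    repos_meta: BTreeMap<String, String>,  // alias -> git remote (simplified)
    groups: BTreeMap<String, Vec<String>>, // group name -> member aliases
}

impl ReposConfig {
    /// In-memory cleanup applied after loading; nothing is written to disk.
    fn reconcile(&mut self) {
        // 1. Drop entries with empty or blank alias keys.
        self.repos.retain(|alias, _| !alias.trim().is_empty());

        // 2. Drop orphan metadata that has no matching repo.
        let known: BTreeSet<String> = self.repos.keys().cloned().collect();
        self.repos_meta.retain(|alias, _| known.contains(alias));

        // 3. Prune group members that reference unknown aliases,
        //    then drop groups that became empty.
        self.groups.retain(|_, members| {
            members.retain(|member| known.contains(member));
            !members.is_empty()
        });
    }
}
```

Note that existing alias keys are only ever removed, never renamed, which matches the commit's point that renaming would break group references.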
* feat: auto-prune stale repos during Phase 1 warmup

  When a repo's database or path no longer exists (e.g. folder moved), Phase 1 now automatically unregisters the alias from repos.json instead of logging a warning and leaving the stale entry forever.

  Prune conditions (safe: only missing-db / path-gone, not transient errors):

  - .codesearch.db directory does not exist at registered path
  - Registered path itself no longer exists
  - Alias resolves to nothing in config

  Side effects per pruned alias:

  - stop_fsw + evict from DashMap + remove last_access timer
  - unregister_alias (removes from repos, repos_meta, groups)
  - persist via config.save()

  Closes: stale repos.json entries after folder reorganization

* fix: add missing YELLOW color variable in qc.sh

* bump version to 1.0.153 (align with CHANGELOG)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
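The three prune conditions listed in the auto-prune commit can be expressed as one small decision function. This is a sketch only: `PruneReason` and `prune_reason` are hypothetical names, and the use of `Path::exists()` / `Path::is_dir()` is an assumption. Those calls report `false` on I/O errors too, so code that must distinguish "gone" from "temporarily unreadable" might prefer `Path::try_exists()`.

```rust
use std::path::Path;

/// Directory name of the per-repo database (value taken from the commit messages).
const DB_DIR_NAME: &str = ".codesearch.db";

/// Why an alias is considered stale (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum PruneReason {
    /// The alias resolves to nothing in the config.
    AliasUnresolved,
    /// The registered path itself no longer exists.
    PathMissing,
    /// The path exists but has no database directory.
    DbMissing,
}

/// Decide whether an alias should be pruned.
/// `registered_path` is `None` when the alias has no entry in the config.
fn prune_reason(registered_path: Option<&Path>) -> Option<PruneReason> {
    let path = match registered_path {
        Some(p) => p,
        None => return Some(PruneReason::AliasUnresolved),
    };
    if !path.exists() {
        return Some(PruneReason::PathMissing);
    }
    if !path.join(DB_DIR_NAME).is_dir() {
        return Some(PruneReason::DbMissing);
    }
    None // healthy entry: keep it
}
```

Returning a reason instead of a bare boolean lets the caller log why each alias was removed before it runs the unregister and persist steps.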
Follow-up on PR #42 + #43 audit. Two gaps identified:

- No automated tests for new Warm/Write state semantics, zombie-proof reaper, or /status endpoint
- No HTTP timeouts in standalone TUI reqwest calls

Co-authored-by: flupkede <flupkede@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: flupkede <flupkede@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ELOG (#98) Squash merge fix/windows-8dot3-path-relocation → develop
Superseded by PR from release/v1.0.154 (conflicts resolved in release branch)
What's in this release

Added (v1.0.146 → v1.0.153)

- Semantic Markdown chunking: `.md`/`.markdown`/`.txt` parsed with the tree-sitter-md block grammar; chunks align to sections, headings, and code fences
- Stale-path relocation: `remote.origin.url` captured at registration; on serve startup stale paths are relocated by matching git remote (bounded depth scan), else warned + skipped
- `codesearch index prune`: new command that relocates moved repos first, then unregisters remaining stale entries
- Auto-prune during Phase 1 warmup: stale repos are unregistered from `repos.json` instead of silently looping forever
- `/merge` and `/release` Claude Code slash commands: documented merge + release playbook for this repo

Changed

- `--alias`/`-a` removed from `index add`: alias always derived from directory name (prevents downstream mismatches)

Fixed

- Hand-edited `repos.json` no longer crashes: `reconcile()` on load drops empty-alias entries, orphan `repos_meta`, and group refs to unknown aliases
- Missing YELLOW color variable in `scripts/qc.sh`
- `/merge` & `/release` commands fixed to use `--squash` (repo disallows merge commits)

🤖 Generated with Claude Code