fix: handle dots and multi-hyphen dirnames in _decode_project_path#49
fix: handle dots and multi-hyphen dirnames in _decode_project_path#49withsivram wants to merge 2 commits intochopratejas:mainfrom
Conversation
…ssue chopratejas#47) _greedy_path_decode previously only tried joining two consecutive dash-split tokens as a single hyphenated directory component. This made it impossible to reconstruct directory names that contained three or more hyphens (e.g. my-cool-project) or that combined dots with hyphens (e.g. GitHub.nosync, my-project.nosync). As a result headroom learn silently skipped any project whose path passed through such a directory. Fix: replace the pair-only join with a loop that tries joining 1, 2, 3 … consecutive tokens as a single path component, so all possible hyphen groupings are explored. The algorithm remains the same depth-first greedy search validated by filesystem existence checks. Adds tests/test_learn/test_scanner.py with 16 test cases covering plain dirs, single-hyphen names, multi-hyphen names, dot-only names, and every combination of dots + hyphens across parent and child components. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves Claude Code project path decoding so projects aren’t skipped when directory names contain multiple literal hyphens (including cases where a dotted suffix like .nosync is present alongside hyphens), and adds targeted regression tests for the decoder logic (issue #47).
Changes:
- Update
_greedy_path_decodeto try merging 1..N consecutive dash-split tokens into a single path component. - Expand
_decode_project_pathdocstring to clarify support for dotted and multi-hyphen directory names. - Add a new test module with cases covering plain, single-hyphen, multi-hyphen, dot-only, and dot+hyphen combinations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
headroom/learn/scanner.py |
Expands greedy decoding to consider all token-merge lengths when reconstructing real filesystem paths. |
tests/test_learn/test_scanner.py |
Adds regression/unit tests for _greedy_path_decode and integration-style tests for _decode_project_path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Try using 1, 2, 3, … consecutive tokens as a single directory component | ||
| # (tokens joined with literal hyphens). The single-token case handles the | ||
| # common path-separator interpretation; multi-token cases handle directory | ||
| # names that themselves contain hyphens. | ||
| for n_tokens in range(1, len(parts) + 1): | ||
| component = "-".join(parts[:n_tokens]) | ||
| candidate = base / component | ||
| result = _greedy_path_decode(candidate, parts[n_tokens:]) |
There was a problem hiding this comment.
_greedy_path_decode explores all token groupings but currently recurses into candidate paths even when the candidate prefix directory doesn't exist. For longer escaped names this can blow up exponentially (and do lots of unnecessary recursion) when the final path doesn't exist. Add a fast pruning check (e.g., skip recursion unless candidate.exists()/candidate.is_dir()), and consider early-returning None when base itself doesn't exist.
|
|
||
| from __future__ import annotations | ||
|
|
||
| import tempfile |
There was a problem hiding this comment.
tempfile is imported but never used in this test module. Please remove the unused import to avoid lint failures and keep the test file minimal.
| import tempfile |
| import os | ||
|
|
||
| home = Path.home() | ||
| if str(home).startswith("/Users/"): | ||
| base = home / ".pytest_headroom_tmp" | ||
| base.mkdir(exist_ok=True) | ||
| # Use a sub-directory unique to this test invocation | ||
| import uuid | ||
|
|
||
| unique = base / uuid.uuid4().hex | ||
| unique.mkdir() |
There was a problem hiding this comment.
The users_tmp fixture writes into the real home directory and the comment says it will "skip rather than fail" if not writable, but there’s no error handling—mkdir() can raise (e.g., in sandboxed CI or read-only HOME) and fail the suite. Consider wrapping the home-directory creation in try/except OSError and either pytest.skip(...) or fall back to tmp_path; also remove the unused os import in this fixture.
| import os | |
| home = Path.home() | |
| if str(home).startswith("/Users/"): | |
| base = home / ".pytest_headroom_tmp" | |
| base.mkdir(exist_ok=True) | |
| # Use a sub-directory unique to this test invocation | |
| import uuid | |
| unique = base / uuid.uuid4().hex | |
| unique.mkdir() | |
| home = Path.home() | |
| if str(home).startswith("/Users/"): | |
| base = home / ".pytest_headroom_tmp" | |
| try: | |
| base.mkdir(exist_ok=True) | |
| except OSError: | |
| pytest.skip("Home directory not writable for users_tmp fixture") | |
| # Use a sub-directory unique to this test invocation | |
| import uuid | |
| unique = base / uuid.uuid4().hex | |
| try: | |
| unique.mkdir() | |
| except OSError: | |
| pytest.skip("Home directory not writable for users_tmp fixture") |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks @withsivram but this didn't work for me. I created my own PR #52 |
Summary
_greedy_path_decodepreviously only tried joining two consecutive dash-split tokens, making it impossible to reconstruct directory names with 3+ hyphens (e.g.my-cool-project) or dots with hyphens (e.g.GitHub.nosync,my-project.nosync)tests/test_learn/test_scanner.pywith 16 test cases covering plain dirs, single-hyphen, multi-hyphen, dot-only, and dot+hyphen combinationsTest plan
pytest tests/test_learn/test_scanner.py— all 16 tests passheadroom learnworks correctly for projects in directories with dots/hyphens in their namesFixes #47
🤖 Generated with Claude Code