Version: leann-core 0.3.7
Env: WSL2, Python 3.12
Problem A — leann watch hard-crashes on media/binary files
leann watch's change detection hashes file contents via a LlamaIndex SimpleDirectoryReader with the full default extractor set:
cli.py _detect_build_changes → sync.py detect_changes → sync.py generate_file_hashes
→ reader.iter_data() → SimpleDirectoryReader.load_file → readers/file/video_audio/base.py
→ ImportError: Please install OpenAI whisper ...
So a single audio/video file anywhere in the scanned roots takes the whole watcher down with an unhandled ImportError (image/binary files also emit load errors). Notably, leann build survives the same tree because it honors --file-types (required_exts), but the watch/sync hash scan does not apply that filter — an inconsistency. Change detection should restrict to the index's configured file types (or at least skip unreadable files gracefully) rather than instantiate media readers.
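A minimal sketch of the suggested behavior, not leann's actual code: hash raw bytes only for files whose extension is in the index's configured set, and skip anything unreadable instead of raising. The function name `hash_indexed_files` and its parameters are hypothetical; the real `generate_file_hashes` in sync.py may be structured differently.

```python
import hashlib
from pathlib import Path


def hash_indexed_files(roots, required_exts):
    """Return {path: sha256 hex digest} for files whose suffix is in required_exts.

    Hypothetical helper: no document reader is instantiated, so media
    extractors (and their optional dependencies) are never touched.
    """
    allowed = {ext.lower() for ext in required_exts}
    hashes = {}
    for root in roots:
        root = Path(root)
        # A root may be a single file or a directory to crawl.
        candidates = [root] if root.is_file() else sorted(root.rglob("*"))
        for path in candidates:
            if not path.is_file() or path.suffix.lower() not in allowed:
                continue  # not part of the index, ignore it
            try:
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
            except OSError:
                continue  # unreadable file: skip rather than crash the watcher
            hashes[str(path)] = digest
    return hashes
```

With `--file-types .py,.md`, a call such as `hash_indexed_files(["./src"], [".py", ".md"])` would never open a `.png` or `.mp4` in the first place.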
Problem B — indexing a root-level file expands the scan to the whole repo
If --docs includes any loose file at the repo root (e.g. README.md), the watcher's resolved sync roots collapse to the repo root, so it then crawls everything — node_modules/, build dirs, vendored mirrors — and trips Problem A on whatever binary/media it finds:
📂 Indexing 15 paths:
1. /home/.../my-repo ← entire repo
Failed to load .../assets/img/icon/wasm.png ...
ImportError: Please install OpenAI whisper ...
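One possible way to avoid the collapse, sketched under the assumption that the sync roots are currently derived by replacing each loose file with its parent directory (the resolution logic itself is not shown in this report). The helper `resolve_sync_targets` is hypothetical: it keeps file entries and directory entries apart, so a root-level README.md is tracked as a single file and never widens the crawl.

```python
from pathlib import Path


def resolve_sync_targets(doc_paths):
    """Split --docs entries into directories to crawl and single files to track.

    Hypothetical helper: a loose file stays a file entry instead of being
    replaced by its parent directory.
    """
    dirs, files = [], []
    for raw in doc_paths:
        path = Path(raw)
        if path.is_dir():
            dirs.append(path)   # crawl recursively
        else:
            files.append(path)  # watch this one file only
    return dirs, files
```

For `--docs ./src README.md` this would yield `./src` as the only crawled directory and `README.md` as a standalone tracked file.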
Repro
- Build an index whose --docs includes a repo-root file plus subdirs, in a repo that also contains any media/binary file (images count), e.g. leann build x --docs ./src README.md --file-types .py,.md.
- Start leann watch x → it scans the repo root and crashes on the first media file.
Expected
Watch honors the index's configured file types / scan scope, and never crashes on files it wouldn't index anyway.
Workaround
Pass only clean subdirectories to --docs (no root-level files), and keep media/binaries out of indexed dirs.