Enhance scanner and initialization features with migration support #87
Pull request overview
This PR migrates the dev-online-dl feature set onto upstream/dev, restoring lightweight runtime initialization/onboarding, patch/update channel support, config/data migration, and recursive scan safety behavior while reconciling packaging/startup differences.
Changes:
- Add lightweight runtime bootstrap + initialization progress model + source probing utilities.
- Reinstate recursive scan safety (dangerous-root blocking) and directory summary scanning, and wire it into CLI/UI batch flows.
- Rework release/build tooling (Python build scripts, CI helper, Inno Setup scripts) and update docs/website links + download metadata.
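The dangerous-root blocking listed above can be pictured with a small check like the one below. This is a minimal sketch, not the logic in core/recursive_scanner.py: the function name `is_dangerous_root` and the rule it applies (block filesystem roots and the home directory itself) are assumptions chosen for illustration.

```python
from pathlib import Path


def is_dangerous_root(path: str) -> bool:
    """Return True when `path` is too broad to scan recursively.

    Hypothetical rule for illustration: block filesystem roots
    (for example "/" or a drive root) and the home directory itself.
    """
    resolved = Path(path).expanduser().resolve()
    # A filesystem root is its own parent.
    if resolved.parent == resolved:
        return True
    # Scanning the whole home directory is also treated as unsafe here.
    return resolved == Path.home().resolve()


if __name__ == "__main__":
    for candidate in ("/", "~", "~/Pictures/birds"):
        print(candidate, is_dangerous_root(candidate))
```

A caller would run such a check before starting the directory walk and refuse the batch when it returns True.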
Reviewed changes
Copilot reviewed 67 out of 69 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| workflows/intel-build.md | Updates Intel mac build instructions to use build_release_mac.py. |
| workflows/dev_docs/project_structure.md | Updates project structure docs for new build scripts. |
| ui/custom_dialogs.py | Adjusts dialog button sizing/layout behavior. |
| topiq_model.py | Expands model weight search paths for patch/overlay scenarios. |
| tools/update_checker.py | Adds patch runtime blocking/skip reporting during update checks. |
| tools/patch_manager.py | Adds runtime/channel gating + safer patch cleanup + metadata validation. |
| superpicky_cli.py | Wires dangerous-root blocking + directory summaries into recursive batch CLI; aligns defaults. |
| scripts/verify_patch_cleanup_regression.py | Adds manual regression script for residual patch cleanup scenarios. |
| scripts/ci_release.py | Adds CI utilities for metadata resolution, asset collection, patch packaging, secret materialization. |
| requirements_runtime_mac.txt | Adds mac runtime requirements for lightweight initialization flow. |
| requirements_runtime_cuda.txt | Adds CUDA runtime requirements for lightweight initialization flow. |
| requirements_runtime_cpu.txt | Adds CPU runtime requirements for lightweight initialization flow. |
| requirements_cuda.txt | Adds mirror extra-index-url for CUDA wheels. |
| main.py | Refactors early startup: patch overlay injection, runtime bootstrap dispatch, legacy migration, logging notes. |
| inno/SuperPicky_CUDA_Patch.iss | Parameterizes CUDA patch installer metadata and filenames. |
| inno/SuperPicky.iss | Replaces full installer script with simplified/parameterized version and cleanup directives. |
| inno/SuperPicky-lite.iss | Adds Lite installer script. |
| docs/tutorial-4.2.1.html | Updates GitHub links to new org/repo. |
| docs/tutorial-3.9.4.html | Updates GitHub links to new org/repo. |
| docs/sponsors.html | Updates GitHub links to new org/repo. |
| docs/index.html | Updates release download links and GitHub base URL. |
| docs/faq.html | Updates GitHub links to new org/repo. |
| docs/downloads_github.json | Updates beta tag/asset names and timestamp formatting. |
| docs/downloads_gitcode.json | Updates beta tag and asset naming format. |
| docs/downloads.html | Updates GitHub release download links throughout. |
| core/source_probe.py | Adds HTTP-based mirror/source probing with per-run caching. |
| core/runtime_requirements.py | Adds Python representation of runtime dependency sets for lightweight init. |
| core/runtime_bootstrap.py | Adds frozen bootstrap entrypoint to install runtime deps into app-local site-packages. |
| core/recursive_scanner.py | Adds dangerous-root detection + directory summary scan results + DFS scanner refactor. |
| core/photo_processor.py | Precomputes GPU cache-clearing policy and avoids repeated imports in loop. |
| core/keypoint_detector.py | Updates model resource path resolution for frozen/overlay scenarios; strengthens typing. |
| core/flight_detector.py | Updates model resource path resolution for frozen/overlay scenarios; strengthens typing. |
| core/build_info.py | Updates commit hash and release channel injection notes/values. |
| core/batch_processor.py | Accepts scanned directory summaries and aligns depth default with scanner constant. |
| constants.py | Rewrites file with normalized newlines/formatting (no semantic change). |
| build_release_lite_win.bat | Adds wrapper to invoke build_release_win.py lite build. |
| build_release_lite_mac.sh | Adds wrapper to invoke build_release_mac.py lite build. |
| build_release_full_mac.sh | Adds wrapper to invoke build_release_mac.py full build. |
| build_release_cuda.bat | Updates CUDA build wrapper to call Python build script. |
| build_release_cpu.bat | Simplifies CPU build wrapper to call compatibility wrapper. |
| build_release_all.bat | Builds CPU + Lite artifacts in sequence. |
| build_release.sh | Converts mac build script into compatibility wrapper for Python builder. |
| build_release.bat | Converts Windows build script into compatibility wrapper for Python builder. |
| birdid/osea_classifier.py | Updates resource path resolution and formatting/typing cleanups. |
| ai_model.py | Hardens YOLO model load path resolution and missing-file error behavior. |
| advanced_config.py | Adds initialization/runtime bootstrap state fields and changes update-check default. |
| SuperPicky_lite_win.spec | Adds Windows Lite spec with runtime bootstrap hiddenimports and excludes Torch stack. |
| SuperPicky_lite.spec | Adds mac Lite spec with env-driven codesign/arch parameters. |
| SuperPicky_full.spec | Adds wrapper spec that execs legacy full spec. |
| SuperPicky.spec | Adds env-driven arch/codesign parameters and version override support. |
| CLAUDE.md | Expands agent guidance, verification expectations, and cross-platform/UTF-8 constraints. |
| AGENTS.md | Expands agent guidance mirroring CLAUDE rules and verification expectations. |
| .gitignore | Ignores .python-version. |
| .github/workflows/build-release-lite.yml | Adds GitHub Actions workflow for building and releasing Lite artifacts. |

    'pstats',
    'queue',
    'resource',
    'runpy',
    'shlex',
    'signal',
    'sqlite3',
    'statistics',
    'sysconfig',
    'tarfile',
    'timeit',

runtime_bootstrap_stdlib_hiddenimports includes resource, a module that does not exist on Windows. PyInstaller emits noisy "hidden import not found" log messages for it, and under some configurations the build may fail. Consider removing resource, or adding it conditionally per platform.
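The per-platform option from the comment above might be sketched as follows. The list names and the helper `build_hiddenimports` are hypothetical, and only a few module names from the hunk are repeated here; the real spec file may organize its list differently.

```python
import sys

# Modules that exist on every supported platform (subset, for illustration).
COMMON_HIDDENIMPORTS = ["pstats", "queue", "runpy", "shlex", "signal", "sqlite3"]

# `resource` is a Unix-only standard-library module.
POSIX_ONLY_HIDDENIMPORTS = ["resource"]


def build_hiddenimports(platform: str = sys.platform) -> list[str]:
    """Return the hidden-import list for the given sys.platform value."""
    imports = list(COMMON_HIDDENIMPORTS)
    if not platform.startswith("win"):
        imports.extend(POSIX_ONLY_HIDDENIMPORTS)
    return imports


if __name__ == "__main__":
    print(build_hiddenimports())
```

Passing the platform as a parameter keeps the selection testable on any machine, since the Windows branch can be exercised without a Windows host.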
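
     # Release channel: injected automatically during CI packaging ("nightly" = RC pre-release, "official" = stable release)
     # Local development defaults to "dev", which does not trigger update checks
    -RELEASE_CHANNEL = "dev"
    +RELEASE_CHANNEL = "official"

The comment on RELEASE_CHANNEL states that local development defaults to dev and does not trigger update checks, but the committed value is "official". Running from source or debugging locally would therefore follow the official-channel logic (for example update and patch decisions), which does not match the stated intent. Consider restoring the in-repo default to dev and injecting official/nightly only during build/CI.
Suggested change:

    -RELEASE_CHANNEL = "official"
    +RELEASE_CHANNEL = "dev"
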

    from typing import Dict, Iterable, List, Optional


    logging.basicConfig(level=logging.INFO)

Calling logging.basicConfig(...) at module import time rewrites the global logging configuration (especially when the main entry point has already installed log handlers), which can unexpectedly override the log format or output destination. Consider removing basicConfig, using logger = logging.getLogger(__name__) instead, and letting the top level configure the handlers.
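A minimal sketch of the pattern recommended in the comment above: the module only creates a named logger, and handler setup happens once at the application entry point. The helper names `probe_started` and `configure_logging` are hypothetical and exist only to make the example self-contained.

```python
import logging

# Library modules only create a named logger; they do not configure handlers.
logger = logging.getLogger(__name__)


def probe_started(url: str) -> None:
    """Log through the module logger; output depends on the app's config."""
    logger.info("probing %s", url)


def configure_logging() -> None:
    """Called once from the application entry point (hypothetical helper)."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(name)s %(levelname)s %(message)s",
    )


if __name__ == "__main__":
    configure_logging()
    probe_started("https://example.invalid/simple/")
```

With this split, importing the module has no side effects on logging, so whatever format and destination the entry point chooses stays in force.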

    start = time.perf_counter()
    request = urllib.request.Request(
        _normalize_probe_url(url),
        headers={"User-Agent": "SuperPicky-InitProbe/1.0"},
        method="GET",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            first_byte_start = time.perf_counter()
            response.read(256)
            first_byte_ms = (time.perf_counter() - first_byte_start) * 1000.0

The computation of first_byte_ms does not match the field's meaning ("time to first byte"): timing currently starts only after urlopen() has returned, so it measures something closer to "time to read the first 256 bytes" and does not reflect the latency from issuing the request to the first byte arriving. Consider starting the first-byte timer before the request is issued and recording the difference once the first byte (or the headers) has been read, or rename the field and log text to match what is actually measured.
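One way to read the first suggestion in the comment above is sketched below: the timer starts before the request is issued and stops once the first body byte has been read. The function name `measure_first_byte_ms` and the injectable `opener` parameter are assumptions added for illustration and testability; they are not part of core/source_probe.py.

```python
import time
import urllib.request
from typing import Callable


def measure_first_byte_ms(
    url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,
) -> float:
    """Return milliseconds from issuing the request to reading the first byte."""
    request = urllib.request.Request(url, method="GET")
    start = time.perf_counter()  # timer starts before the request is sent
    with opener(request, timeout=timeout) as response:
        response.read(1)  # wait until the first body byte is available
        return (time.perf_counter() - start) * 1000.0
```

Because connection setup and header exchange happen inside the `opener` call, they are included in the measured interval, which is what a time-to-first-byte figure is normally expected to cover.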

    def to_requirements_string(
        self,
        *,
        include_indexes: bool = True,
        package_urls: dict[str, str] | None = None,
    ) -> str:
        """
        Convert configuration to pip requirements file format.

        将配置转换为 pip requirements 文件格式。
        """
        lines = []
        package_urls = package_urls or {}
        if include_indexes and self.index_url:
            lines.append(f"--index-url {self.index_url}")
        if include_indexes:
            for url in self.extra_index_urls:
                lines.append(f"--extra-index-url {url}")
        lines.append(
            package_urls.get(
                "torch",
                self._format_pinned_requirement("torch", self.torch_version),
            )
        )
        lines.append(
            package_urls.get(
                "torchvision",
                self._format_pinned_requirement("torchvision", self.torchvision_version),
            )
        )
        lines.append(
            package_urls.get(
                "torchaudio",
                self._format_pinned_requirement("torchaudio", self.torchaudio_version),
            )
        )
        lines.append(f"timm{self.timm_version}")
        return "\n".join(lines)

to_requirements_string() currently emits only torch/torchvision/torchaudio/timm, but the initialization flow downloads resources from Hugging Face (core/initialization_manager.py configures HF_ENDPOINT and depends on scripts/download_models.py), and requirements_runtime_*.txt also explicitly lists huggingface_hub and lap. If the lightweight runtime installs only what is generated here, the first-launch download/identification path may fail because of missing dependencies. Consider adding huggingface_hub and lap (and any other runtime packages this flow actually needs) to RuntimeRequirements and its output, or explicitly referencing requirements_base.txt instead.
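The first option in the comment above could take roughly the following shape. Everything here is a hypothetical stand-in: `RuntimeRequirementsSketch`, its `pinned` and `extra_packages` fields, and the placeholder version in the usage lines are invented for illustration, and the extra packages are left unpinned because the source does not state their versions.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeRequirementsSketch:
    """Hypothetical stand-in for RuntimeRequirements, for illustration only."""

    # Package name -> exact version to pin.
    pinned: dict[str, str] = field(default_factory=dict)
    # Packages the first-launch flow needs beyond the Torch stack.
    extra_packages: list[str] = field(
        default_factory=lambda: ["huggingface_hub", "lap"]
    )

    def to_requirements_string(self) -> str:
        lines = [f"{name}=={version}" for name, version in self.pinned.items()]
        # Unpinned here because the source does not state versions.
        lines.extend(self.extra_packages)
        return "\n".join(lines)


if __name__ == "__main__":
    # "0.0.0" is a placeholder, not a real version recommendation.
    print(RuntimeRequirementsSketch(pinned={"torch": "0.0.0"}).to_requirements_string())
```

Keeping the extra packages in the same object that renders the requirements text means the installer and the documentation of what is installed cannot drift apart.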

    """
    Runtime requirements manager for lightweight builds.

    This module provides a unified interface for managing platform-specific runtime
    dependencies across CPU, CUDA, and macOS builds. It consolidates the previously
    separate requirements_runtime_*.txt files into a single Python module with
    type-safe configuration access.

    轻量化构建的运行时依赖管理模块。

    该模块为 CPU、CUDA 和 macOS 构建统一描述运行时依赖,
    避免把平台差异散落在多个 requirements 文本文件与调用点之间。
    """

The file header claims that the requirements_runtime_*.txt files have been consolidated into a single Python module, but this PR also adds requirements_runtime_cpu.txt/cuda.txt/mac.txt, which may leave maintainers unsure which place to edit when dependencies change later. Consider updating the docstring to state explicitly that the text requirements remain for compatibility/release while the Python module generates requirements for the initialization flow (or the reverse), and to say which of the two is the authoritative source.

    def _make_path_writable(path: str) -> None:
        try:
            os.chmod(path, stat.S_IWRITE | stat.S_IREAD)
        except Exception:
            pass

_make_path_writable() sets the permissions directly to S_IWRITE | S_IREAD, which overwrites the existing permission bits. When the target is a directory this can clear +x, so rmtree/unlink may still fail. Consider OR-ing with the existing mode (for example os.stat(path).st_mode | stat.S_IWRITE) and additionally keeping or adding the execute bit for directories, so that the cleanup logic does not break on POSIX.
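A sketch of the OR-based approach described in the comment above, assuming POSIX permission semantics. The function name mirrors the one under review without the leading underscore; the choice to catch only OSError is an assumption of this example, not something the review asked for.

```python
import os
import stat


def make_path_writable(path: str) -> None:
    """Add owner read/write permission without discarding existing mode bits."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        mode |= stat.S_IWUSR | stat.S_IRUSR
        if os.path.isdir(path):
            # Directories need the execute bit to be traversed and emptied.
            mode |= stat.S_IXUSR
        os.chmod(path, mode)
    except OSError:
        # Best effort: the caller's delete attempt reports the real failure.
        pass
```

Such a helper is typically passed to a retrying delete routine, which calls it on the failing path and then retries the removal.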
@yblpoi Merged to Next: integration testing on
When you have time, please close the original superpickyapp/Superpicky#1; that repo is archived. Thanks again!
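The build CI invokes this script with `python scripts/download_models.py`, so sys.path[0] is the scripts/ directory and the project root is not on the search path. As a result, `from core.initialization_progress import ...` on line 59 raises ModuleNotFoundError, and both build_release_mac.py and the Windows build fail entirely at "步骤 3: 检查并下载模型文件" ("Step 3: check and download model files").
Fix: insert the project root at sys.path[0] before any `from core.*` import.
The bug was exposed by the v4.2.6-RC1 (commit 0cbfb9d) CI run. The root cause is that the script newly introduced by the PR #87 upstream migration did not keep this path fix-up.
Verification:
- .venv/bin/python -m py_compile passes
- Running `python scripts/download_models.py` directly on a local machine now reaches the download flow
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>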
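The path fix described in the commit message above can be sketched as below. The helper name `ensure_project_root_on_path` is hypothetical, and the sketch assumes the script lives exactly one directory below the project root, as scripts/download_models.py does.

```python
import sys
from pathlib import Path


def ensure_project_root_on_path(script_file: str) -> Path:
    """Put the parent of the script's directory at the front of sys.path."""
    project_root = Path(script_file).resolve().parent.parent
    root_str = str(project_root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return project_root


# In a script under scripts/, this would run before any `from core...` import:
# ensure_project_root_on_path(__file__)
```

The call has to come before the first project-level import, because Python resolves `core` at import time using whatever sys.path contains at that moment.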
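download_models.py from PR #87 pointed the yolo_segmentation resource at jamesphotography/SuperPicky-models, but that HF repository only publishes .onnx weights and has no .pt file, so CI gets a 404 when downloading.
The public Ultralytics/YOLO11 repository maintained by Ultralytics contains the full yolo11l-seg.pt, so switching repo_id resolves the problem. Local verification: 53.5 MB downloaded successfully in 6.47 seconds, exit 0.
The bug was exposed 4.25 seconds into the v4.2.6-RC2 (commit 39fe53b) CI run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>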
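The prepare_signing flow introduced by PR #87 creates a temporary keychain and imports the .p12, but does not add the temporary keychain to the user-domain search list. macOS codesign does not search arbitrary keychains on its own, and the --keychain flag only restricts the lookup scope without adding the keychain to the search list. As a result, CI fails while signing resources embedded in the .app (for example QtVirtualKeyboardQml) with:
error: The specified item could not be found in the keychain.
This happens even though `security find-identity -p codesigning` can find the Developer ID certificate in that same keychain.
Fix: in prepare_signing, after the keychain is created and configured and before it is unlocked, call `security list-keychains -d user -s <new_kc> <existing_kc...>` to insert the new keychain into the search list while keeping the existing entries.
The bug was triggered at the 5-minute mark of the v4.2.6-RC3 (commit e299e60) run.
Verification:
- py_compile passes
- The Windows build of RC3 already succeeded, confirming that the problem is limited to the Mac signing path
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>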
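The search-list update described in the commit message above might be wrapped as follows. The three function names are hypothetical and do not come from scripts/ci_release.py. The sketch assumes that `security list-keychains -d user` prints one quoted path per line, and the last function only works on macOS.

```python
import subprocess


def parse_keychain_list(output: str) -> list[str]:
    """Parse `security list-keychains` output: one quoted path per line."""
    return [line.strip().strip('"') for line in output.splitlines() if line.strip()]


def build_search_list_command(new_keychain: str, existing: list[str]) -> list[str]:
    """Build the command that puts `new_keychain` first and keeps the rest."""
    kept = [kc for kc in existing if kc != new_keychain]
    return ["security", "list-keychains", "-d", "user", "-s", new_keychain, *kept]


def add_keychain_to_search_list(new_keychain: str) -> None:
    """macOS only: read the current user search list, then rewrite it."""
    current = subprocess.run(
        ["security", "list-keychains", "-d", "user"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    subprocess.run(
        build_search_list_command(new_keychain, parse_keychain_list(current)),
        check=True,
    )
```

Re-passing the existing entries matters because `-s` replaces the whole search list, so omitting them would drop the login keychain from the list.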
Merge origin/dev into nightly, integrating PR jamesphotography#87 (upstream migration) with the key fixes on nightly.
Conflict resolution strategy:
- docs/downloads_github.json: keep nightly v4.2.5 (hash e8fca74)
- locales/{zh_CN,en_US}.json: take the dev superset (+78 onboarding/init/health keys)
- core/keypoint_detector.py: keep nightly (including the MPS memory release comments)
- birdid/bird_identifier.py: keep nightly's i18n logging and overlay comments, with the structure aligned to dev's PEP8 formatting
- core/flight_detector.py: take dev's cast/None checks and keep nightly's MPS memory release comments
- .github/workflows/build-release.yml:
  - keep nightly's commented-out GitCode/mirror sync block
  - take dev's refactored scripts/ci_release.py build-patch and the output/mac/*.dmg upload
Nightly fixes confirmed as retained:
- bc21cdd MPS memory leak fix (del image_tensor / del tensor comments)
- 5e68551 BirdID code_updates overlay path lookup
- da71e14 i18n log fix (logs.birdid_fallback_model)
- 889eb41 GitCode/mirror sync temporarily commented out (to be re-enabled later)
- 0e3c602 v4.2.5 official release download metadata (e8fca74 hash)
Dev improvements confirmed as included:
- PR jamesphotography#87 upstream migration (init bootstrap, onboarding, scanner)
- typing.cast type hardening
- ci_release.py workflow refactor
Verification:
- .venv/bin/python -m py_compile passes (3 conflicted files + all changed files)
- JSON / YAML parsing passes
- Conflict markers cleared in all 7 conflicted files
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR: Migrate dev-online-dl feature set into upstream/dev
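Summary
This PR migrates the dev-online-dl feature set onto upstream/dev and reconciles the major behavioral differences encountered during the migration. The result is a single branch that preserves the upstream baseline while restoring the full runtime initialization, onboarding, update, config migration, and recursive scan safety behaviors that were previously only available on dev-online-dl.
Primary goals:
- Complete the feature migration without blindly replaying incompatible history
- Preserve upstream-compatible startup and packaging behavior
- Restore runtime bootstrap, first-run configuration, source probing, and download fallback
- Preserve recursive scan safety and the related directory summary behavior
- Preserve patch/update channel support and the unified app data migration logic
Main Changes
1. Runtime initialization and startup path
- Added a runtime bootstrap entry point and integrated the startup initialization flow
- Added a first-run configuration manager and the related initialization orchestration
- Restored source probing and download fallback handling for runtime dependencies
Representative commits:
- 7e65259 feat(init): add runtime bootstrap entry point
- a699109 feat(init): add first-run configuration manager
- 745b5a4 feat(init): add source probing and download fallback
2. Recursive scanning and safety controls
- Added recursive scan safety checks that block dangerous root directories
- Added directory summary behavior to the recursive scan flow
- Restored zero-byte photo filtering after it was overwritten by the migration
Representative commits:
- 417d03d feat(scanner): add recursive scan safety and directory summaries
- 296d891 fix(scanner): restore zero-byte photo filtering
3. Patch/update and configuration migration
- Added patch update channel support
- Unified app data directory handling
- Added legacy data migration support to reduce upgrade friction
Representative commits:
- 4c10dad feat: add patch update channel support
- cb55eea refactor(config): unify app data directory and migrate legacy data
4. Repo maintenance and guidance sync
Representative commit:
5. Final migration aggregation
Representative commit:
Why this migration is structured this way
This branch was migrated by replaying behavior intentionally at the owning control points instead of blindly transplanting every source commit. That was necessary because several dev-online-dl behaviors no longer matched the control flow or packaging assumptions on upstream/dev. The migration therefore favors semantic restoration over commit-for-commit replay.
Validation
Validated with focused checks during migration and post-migration repair:
- Python compile checks using the project virtual environment
- Unit tests for recursive scanner safety behavior
- Regression reproduction and verification for zero-byte photo filtering
- JSON load checks for edited data files (where applicable)
- Shell syntax checks for modified scripts (where applicable)
Post-migration zero-byte regression verification:
- Reproduced the bug with a minimal directory containing empty.jpg and ok.jpg
- Confirmed the incorrect result before the fix
- Restored filtering in the recursive scan and processing paths
- Re-ran the minimal reproduction and confirmed that only non-empty photos are counted
Risks and Review Focus
Please review these areas carefully:
- Startup behavior across fresh install, upgrade, and packaged runtime scenarios
- Onboarding flow integration with initialization state transitions
- Dependency source probing and fallback logic under network failure
- Recursive scan behavior on deep trees and protected system paths
- App data migration behavior on machines with legacy state
- Patch/update channel behavior across existing installs
Suggested Test Plan
- Launch the app in a fresh environment and verify that the onboarding flow appears only when expected.
- Launch the app in an upgraded environment with existing app data and verify the migration behavior.
- Exercise runtime dependency probing on both the online path and the fallback path.
- Run a recursive scan on an ordinary user photo directory and confirm that the summaries are correct.
- Attempt to scan dangerous root targets and confirm that they are blocked.
- Verify that zero-byte RAW/JPG/HEIF files are skipped and logged instead of entering the processing flow.
- Verify patch/update channel behavior on existing installs.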
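The zero-byte check in the test plan above can be reproduced with a small filter like the one below. This is a sketch under stated assumptions rather than the scanner's code: the function name `collect_photos` and the extension set are invented for illustration, and the project's real list of supported formats may differ.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# Assumed extension set for illustration; the project's real list may differ.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".heif", ".heic", ".nef", ".cr3", ".arw"}


def collect_photos(directory: str) -> list[Path]:
    """Return non-empty photo files in `directory`, logging skipped ones."""
    photos = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file() or entry.suffix.lower() not in PHOTO_EXTENSIONS:
            continue
        if entry.stat().st_size == 0:
            logger.warning("skipping zero-byte file: %s", entry.name)
            continue
        photos.append(entry)
    return photos
```

Running it against a directory that holds an empty empty.jpg and a non-empty ok.jpg mirrors the minimal reproduction described in the validation notes: only ok.jpg should be returned.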