Add size-aware reporting and safe exclusions for rebuildable runtime/cache paths#11
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6dd97b6f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if skipped is not None: | ||
| entry = { | ||
| "relative_path": normalize_relative(relative), | ||
| "reason": "excluded", | ||
| } |
There was a problem hiding this comment.
Count pruned excluded directories in size report
excluded_entries/excluded_bytes are derived from skipped, but this block only appends entries for excluded files that are still reached in file_names. In the same function, excluded directories are removed from dir_names before traversal, so files under newly excluded paths like cache/, packages/, standalone/, and node_modules/ are never visited and therefore never counted. This makes the new size report systematically understate excluded data (often reporting 0 bytes) when exclusions happen at directory level, which defeats the purpose of surfacing size drivers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@codex address that feedback. Please count files and bytes under pruned excluded directories in the size report without including file contents or changing restore safety behavior.
There was a problem hiding this comment.
Summary
- Implemented excluded-directory accounting inside
iter_source_filesby detecting directories pruned fromdir_names, walking those excluded subtrees, and recording aggregatefile_count+bytesinskippedmetadata (without reading file contents). This preserves restore behavior while fixing size-report undercounting for excluded directory trees. src/agent_environment_backup/core.pyL624-L708 - Updated backup size-report aggregation to compute
excluded_entriesfrom excluded file counts (usingfile_countwhen present, defaulting to 1 for single-file exclusions) and keptexcluded_bytessummed from skipped excluded items. src/agent_environment_backup/core.pyL1489-L1523 - Updated summary output line to report the corrected excluded entry count. src/agent_environment_backup/core.pyL1531-L1540
- Committed changes on the current branch:
7db7023with messageCount pruned excluded directories in size report. src/agent_environment_backup/core.pyL624-L708
Testing
- ✅
python -m pytest -q
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f2a02a0db
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| config_inspector=inspect_codex_config, | ||
| commands=(("codex", "--version"), ("codex", "mcp", "list")), | ||
| integration_module="codex_fast_proxy", | ||
| extra_excluded_dirs=("cache", "packages", "standalone", "node_modules"), |
There was a problem hiding this comment.
Sync standalone restore exclusions with Codex profile
Update the standalone restore script exclusions when adding extra_excluded_dirs for the Codex profile, otherwise restore behavior diverges between restore_environment(...) and restore-standalone.py. With this change, Codex backups now treat cache/packages/standalone/node_modules as excluded, but the embedded PROFILE_EXTRA_EXCLUDED used by the standalone script still omits Codex entries, so applying an older backup that contains those paths will restore runtime/cache content that the main restore path now skips.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@codex address that feedback. Please sync the standalone restore script profile exclusions with the Codex profile so restore-standalone.py skips cache, packages, standalone, and node_modules consistently with the main restore path. Add or update tests for standalone restore behavior.
There was a problem hiding this comment.
Summary
- Synced the embedded standalone restore script’s profile exclusions by adding a
codexentry toPROFILE_EXTRA_EXCLUDED, sorestore-standalone.pynow skipscache,packages,standalone, andnode_modulesconsistently with the main restore path. src/agent_environment_backup/core.pyL871-L874 - Added a new regression test that creates a Codex backup, injects files under each excluded directory in the backup payload, runs
restore-standalone.py, and asserts those files are not restored. tests/test_core.pyL1296-L1337 - Committed the change on the current branch as
25c9d47and created a follow-up PR message via themake_prtool.
Testing
- ✅
python -m pytest -q tests/test_core.py -k "standalone_restore_excludes_profile_dirs or standalone_restore_excludes_codex_profile_dirs"
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af395eec6b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def excluded_subtree_stats(path: Path) -> tuple[int, int]: | ||
| file_count = 0 | ||
| total_bytes = 0 | ||
| for walk_root, _, walk_files in os.walk(path): |
There was a problem hiding this comment.
Avoid traversing symlinked excluded directories
excluded_subtree_stats walks every excluded directory with os.walk(path), but when the excluded entry itself is a symlink (for example ~/.codex/cache -> /mnt/cache), os.walk traverses the symlink target as the top path. That makes excluded_entries/excluded_bytes include files outside the backup home and can massively inflate runtime on large external trees, even though those files are never part of the backup. This is introduced by the new size-report scan path and should short-circuit or handle symlinked excluded dirs without recursive walking.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
@codex address that feedback. Please make excluded-directory size accounting avoid traversing symlinked excluded directories. Count the symlink itself as skipped/excluded if useful, but do not recurse into the symlink target or include bytes outside the backup home. Add a regression test with an excluded directory symlink pointing outside the home.
There was a problem hiding this comment.
Summary
- Updated excluded-directory size accounting to short-circuit when an excluded directory entry is a symlink, so it no longer traverses symlink targets; instead it counts the symlink itself (
file_count=1) and useslstat()size when available. This prevents counting bytes outside the backup home while preserving exclusion behavior. src/agent_environment_backup/core.pyL624-L679 - Added a regression test that creates an excluded
cachedirectory as a symlink to an external directory and verifies excluded accounting does not include external subtree size. tests/test_core.pyL265-L290 - Committed on the current branch:
969e091with messageAvoid traversing symlinked excluded directories.
Testing
- ✅
python -m pytest -q tests/test_core.py -k "excluded_symlinked_directory_does_not_traverse_outside_home or excludes_runtime_cache_paths"
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5110da7ab7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| walk_path = walk_root_path / walk_file | ||
| file_count += 1 | ||
| try: | ||
| total_bytes += walk_path.stat().st_size |
There was a problem hiding this comment.
Use lstat for excluded subtree byte accounting
excluded_subtree_stats uses walk_path.stat().st_size, which follows symlinks for files inside excluded directories. If an excluded directory contains a symlinked file (e.g., cache/link -> /mnt/large.bin), excluded_bytes will include the external target size and can trigger expensive metadata access outside the backup home, so the new size report becomes inaccurate and can slow backups unexpectedly. Use lstat() (or stat(..., follow_symlinks=False)) for this accounting, or skip symlink files entirely.
Useful? React with 👍 / 👎.
Motivation
Description
Fixes #10
Codex Task