[codex] Add archive policy and concurrency boundaries#9
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces version 0.3.0 of the WATCHLIST skill, adding support for configurable archive policies (manual and suggest) and a new concurrent edit/ID collision policy. Key changes include updates to the documentation, schema, and validation logic to handle archive_policy and archive_after_days fields, along with new evaluation cases to ensure correct behavior during reviews and item additions. Feedback was provided regarding the validation of top-level fields, specifically recommending explicit detection of duplicate or unknown fields and ensuring that archive_after_days is present when the suggest policy is active.
| def top_level_fields(text: str) -> dict[str, str]: | ||
| text = strip_html_comments(text) | ||
| preamble = re.split(r"^##\s+", text, maxsplit=1, flags=re.M)[0] | ||
| return { | ||
| match.group(1): match.group(2).strip() | ||
| for match in TOP_LEVEL_FIELD_RE.finditer(preamble) | ||
| } | ||
|
|
||
|
|
||
| def add_format_finding( | ||
| result: ValidationResult, | ||
| options: ValidationOptions, | ||
| code: str, | ||
| message: str, | ||
| ) -> None: | ||
| if options.strict_format: | ||
| add_error(result, code, message) | ||
| else: | ||
| add_warning(result, code, message) | ||
|
|
||
|
|
||
| def validate_top_level_fields(text: str, result: ValidationResult, options: ValidationOptions) -> None: | ||
| fields = top_level_fields(text) | ||
| archive_policy = fields.get("archive_policy") | ||
| archive_after_days = fields.get("archive_after_days") | ||
|
|
||
| if archive_policy and archive_policy not in VALID_ARCHIVE_POLICIES: | ||
| add_error( | ||
| result, | ||
| "INVALID_ARCHIVE_POLICY", | ||
| f"Invalid archive_policy: {archive_policy}. Use manual or suggest.", | ||
| ) | ||
|
|
||
| if archive_after_days is None: | ||
| return | ||
|
|
||
| try: | ||
| days = int(archive_after_days) | ||
| except ValueError: | ||
| add_error( | ||
| result, | ||
| "INVALID_ARCHIVE_AFTER_DAYS", | ||
| f"archive_after_days must be a positive integer: {archive_after_days}", | ||
| ) | ||
| return | ||
|
|
||
| if days <= 0: | ||
| add_error( | ||
| result, | ||
| "INVALID_ARCHIVE_AFTER_DAYS", | ||
| f"archive_after_days must be a positive integer: {archive_after_days}", | ||
| ) | ||
| return | ||
|
|
||
| if archive_policy is None: | ||
| add_format_finding( | ||
| result, | ||
| options, | ||
| "ARCHIVE_AFTER_DAYS_WITHOUT_POLICY", | ||
| "archive_after_days is only meaningful with archive_policy: suggest.", | ||
| ) | ||
| elif archive_policy == "manual": | ||
| add_format_finding( | ||
| result, | ||
| options, | ||
| "ARCHIVE_AFTER_DAYS_WITH_MANUAL_POLICY", | ||
| "archive_after_days is only meaningful with archive_policy: suggest.", | ||
| ) | ||
|
|
There was a problem hiding this comment.
The current implementation of top-level field extraction and validation has several opportunities for improvement to ensure consistency with item-level field validation:
- Duplicate Detection:
top_level_fieldsuses a dictionary comprehension that silently overwrites duplicate keys. It should explicitly detect and report duplicates, similar tofields_for_block. - Unknown Field Detection: It should warn about unknown top-level fields (e.g., typos or unsupported keys) to maintain file integrity.
- Policy Logic: If
archive_policyis set tosuggest,archive_after_daysshould ideally be present to define the threshold. The current logic returns early ifarchive_after_daysis missing, skipping this check. - Robustness: The extraction and validation can be unified to avoid redundant preamble splitting and comment stripping.
def top_level_fields(text: str, result: ValidationResult, options: ValidationOptions) -> dict[str, str]:
text = strip_html_comments(text)
preamble = re.split(r"^##\s+", text, maxsplit=1, flags=re.M)[0]
fields: dict[str, str] = {}
known_fields = set(SKELETON_FIELDS) | {"archive_policy", "archive_after_days"}
for match in TOP_LEVEL_FIELD_RE.finditer(preamble):
key, value = match.group(1), match.group(2).strip()
if key in fields:
add_error(result, "DUPLICATE_TOP_LEVEL_FIELD", f"Duplicate top-level field: {key}")
if key not in known_fields:
add_format_finding(result, options, "UNKNOWN_TOP_LEVEL_FIELD", f"Unknown top-level field: {key}")
fields[key] = value
return fields
def add_format_finding(
result: ValidationResult,
options: ValidationOptions,
code: str,
message: str,
) -> None:
if options.strict_format:
add_error(result, code, message)
else:
add_warning(result, code, message)
def validate_top_level_fields(text: str, result: ValidationResult, options: ValidationOptions) -> None:
fields = top_level_fields(text, result, options)
archive_policy = fields.get("archive_policy")
archive_after_days = fields.get("archive_after_days")
if archive_policy and archive_policy not in VALID_ARCHIVE_POLICIES:
add_error(
result,
"INVALID_ARCHIVE_POLICY",
f"Invalid archive_policy: {archive_policy}. Use manual or suggest.",
)
if archive_policy == "suggest" and archive_after_days is None:
add_format_finding(
result,
options,
"MISSING_ARCHIVE_AFTER_DAYS",
"archive_policy: suggest requires archive_after_days.",
)
if archive_after_days is not None:
try:
days = int(archive_after_days)
if days <= 0:
raise ValueError()
except ValueError:
add_error(
result,
"INVALID_ARCHIVE_AFTER_DAYS",
f"archive_after_days must be a positive integer: {archive_after_days}",
)
else:
if archive_policy != "suggest":
code = "ARCHIVE_AFTER_DAYS_WITHOUT_POLICY" if archive_policy is None else "ARCHIVE_AFTER_DAYS_WITH_MANUAL_POLICY"
add_format_finding(result, options, code, "archive_after_days is only meaningful with archive_policy: suggest.")There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 85be3ead48
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| f"{', '.join(missing_forbidden)}" | ||
| ) | ||
|
|
||
| if "on_duplicate_id" in expected: |
There was a problem hiding this comment.
Require duplicate-ID contracts unconditionally for collision cases
For the new duplicate-ID semantic case, this gate only runs the collision checks when on_duplicate_id is present. If a future edit accidentally removes on_duplicate_id along with must_reread_before_write or must_avoid_existing_ids from duplicate-id-stop-and-report-kr, check_semantic_cases.py will treat it as an ordinary add-item case and still pass, so the eval no longer enforces the concurrency boundary this commit adds. Key the requirement off the collision case/fixture instead of the optional field itself.
Useful? React with 👍 / 👎.
Summary
archive_policy/archive_after_daysvalidation and document manual vs review-time suggested archive behaviorarchive_policy: manual0.3.0Validation
python3 -m unittest discover -s evals -p 'test_*.py'python3 evals/check_semantic_cases.pypython3 evals/check_watchlist.py .watchlist/WATCHLIST.md --strict-format --strict-safety --require-archive-sectionpython3 evals/check_watchlist.py .agents/skills/watchlist-md/assets/WATCHLIST.template.md --strict-format --strict-safety --require-archive-sectionpython3 evals/check_watchlist.py evals/fixtures/*.watchlist.md --strict-format --strict-safety --require-archive-sectionpython3 evals/check_release_metadata.pypython3 evals/check_policy_markers.pypython3 -m py_compile evals/check_watchlist.py evals/check_policy_markers.py evals/check_release_metadata.py evals/check_semantic_cases.py evals/test_check_watchlist.pygit diff --check