Skip to content

Sub-project B PR 1: to_dict() sync + cohort tuple lock-in#4

Closed
sangicook wants to merge 2 commits intofeature/sub-a-determinism-strict-cqsfrom
subproject-b/pr1-to-dict-and-cohort-tuple
Closed

Sub-project B PR 1: to_dict() sync + cohort tuple lock-in#4
sangicook wants to merge 2 commits intofeature/sub-a-determinism-strict-cqsfrom
subproject-b/pr1-to-dict-and-cohort-tuple

Conversation

@sangicook
Copy link
Copy Markdown
Owner

Summary

Sub-project B / PR 1 — two preventive consistency fixes, each with a permanent unit-test lock to catch future drift. No active bug is being patched; both are latent drift fixes called out by the Consensus 024 deep-investigation session.

Stacked PR: Built on top of feature/sub-a-determinism-strict-cqs (which has its own open upstream PR at dgunning#764). When that branch merges, this PR's base can be re-targeted to main.

  • fix(ledger): ExtractionRun.to_dict() had drifted out of sync with its dataclass — 10 provenance fields (concept, accession_number, statement_role, period_type, period_start, period_end, unit, decimals, reference_source, publish_confidence) existed on the dataclass but were silently dropped by the serializer. The method has zero callers in the standardization subsystem today (the production write path record_run() at schema.py:586-608 uses direct attribute access), so the bug is latent. New test uses dataclasses.fields() reflection + a TRANSIENT_FIELDS allowlist to lock the invariant against future drift.
  • refactor(auto-eval): EXPANSION_COHORT_50/100/500 were mutable lists; tests or overnight loops calling .append() could silently corrupt every downstream run depending on stable cohort membership. Convert all three to tuples, matching the DETERMINISM_TEST_COHORT precedent already established by Sub-project A. New test locks the type AND the chain ordering invariant via slice equality (EXPANSION_COHORT_100[:50] == EXPANSION_COHORT_50, etc.).

Both changes are scoped, independent, and bisectable. PR 1 has no external blockers (other than the upstream sub-a merge) and unblocks the planned PR 3 (provenance hashes) by establishing a consistent serializer path.

Plan & Context

  • Plan: docs/superpowers/plans/2026-04-07-subproject-b-pr1-to-dict-and-cohort-tuple.md (lives on local main)
  • Parent decision: docs/autonomous-system/consensus/024-2026-04-07-sub-project-b-scope.md — note the correction footer reframing this PR as preventive consistency work, not an active bug fix
  • Sequence: PR 1 of 3 in the Sub-project B chain. PR 2 is blocked on a prerequisite write-path investigation; PR 3 is blocked on PR 1 landing (it depends on this PR's to_dict() fix to have a consistent serializer path for new hash fields).
  • Depends on: Sub-project A determinism work in feature/sub-a-determinism-strict-cqs (upstream PR Sub-project A: strict EF-CQS + determinism CI gate dgunning/edgartools#764)

Diff Stats

File Change
edgar/xbrl/standardization/ledger/schema.py +32 / −9 (to_dict() body + drift-prevention docstring)
edgar/xbrl/standardization/tools/auto_eval.py +20 / −8 (three coordinated [( conversions + comment blocks)
tests/xbrl/standardization/test_extraction_run_serialization.py +115 (new — 3 tests)
tests/xbrl/standardization/test_cohort_immutability.py +104 (new — 5 tests)

Total: +258 / −17, 4 files, 2 commits.

Test Plan

  • pytest -m fast tests/xbrl/standardization/test_extraction_run_serialization.py -v — 3 tests pass (drift lock, no-unexpected-keys, provenance round-trip)
  • pytest -m fast tests/xbrl/standardization/test_cohort_immutability.py -v — 5 tests pass (precondition + 3 type asserts + content regression guard with len(_50)==50, len(_100)==100, len(_500)==500, slice-equality chain checks, and ticker spot-checks)
  • pytest -m fast tests/xbrl/standardization/ — full standardization fast suite, expect 184 passed / 1 pre-existing failure (the failure is TestO15SemanticInstruction::test_candidates_instruction_is_semantic_not_numerical in test_closed_loop.py:1462, a string-match assertion on prompt wording that pre-dates this PR and is unrelated)
  • python -c "from edgar.xbrl.standardization.tools.auto_eval import DETERMINISM_TEST_COHORT, compute_cqs, EXPANSION_COHORT_50, EXPANSION_COHORT_100, EXPANSION_COHORT_500" — determinism gate imports cleanly with new tuple cohorts
  • No test in the repo mutates the expansion cohorts (verified via grep for .append/.extend/.clear/.remove against the cohort names)

Quality Process

Built via superpowers:subagent-driven-development. Each commit went through two-stage review (spec compliance → code quality), with one revision round each:

Task First-pass quality findings Fixes amended
1 (to_dict() sync) non-deterministic test helper; missing run_id-ordering docstring helper now uses explicit run_id/run_timestamp; docstring explains run_id-first ordering
2 (cohort tuples) missing len(_500) == 500 assertion; unused imports 500-length assertion added; QUICK_EVAL_COHORT/VALIDATION_COHORT imports removed

A final whole-PR review across both commits caught one cross-commit consistency issue (pytestmark placement in test_cohort_immutability.py violated PEP 8 E402 and was inconsistent with the Task 1 test file and 8 sibling files). Fixed and re-verified.

🤖 Generated with Claude Code

ychan added 2 commits April 7, 2026 14:20
Add the 10 provenance fields that had drifted out of to_dict():
concept, accession_number, statement_role, period_type, period_start,
period_end, unit, decimals, reference_source, publish_confidence.

This is a latent bug fix: to_dict() has zero callers in the standardization
subsystem today (record_run() at schema.py:556-595 writes via direct field
access), but any future caller would silently lose the fields. Add a
test that locks to_dict() to the dataclass field set so future drift is
caught immediately.

See Consensus 024 for context. Part of Sub-project B PR 1.
Convert the expansion cohort chain from lists to tuples, matching the
DETERMINISM_TEST_COHORT precedent set by Sub-project A. Prevents accidental
mutation by tests or overnight loops that could silently corrupt every
downstream run that depends on a stable cohort membership.

Lock type discipline via new unit tests so future drift is caught at CI.

See Consensus 024 Phase 1 Finding 4. Part of Sub-project B PR 1.
@sangicook
Copy link
Copy Markdown
Owner Author

Closing this PR — work has been merged into local `main` via `git merge --no-ff` (merge commit `0f4906b8`, PR commits `5ba0aba1` + `207dc163`).

This project is pivoting to a standalone product architecture where `edgartools` becomes a pinned dependency rather than a fork target. Public PR ceremony is no longer needed for autonomous-system work. The code is preserved in local main's history with the full 2-commit structure and merge commit intact.

The remote branch is being deleted as part of the close.

@sangicook sangicook closed this Apr 7, 2026
@sangicook sangicook deleted the subproject-b/pr1-to-dict-and-cohort-tuple branch April 7, 2026 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant