Project-review hardening: port seams, dispatcher report, serialization round-trip, perf, CI gates#191
Merged
Merged
Conversation
Review findings (DRY + design): - DAG::AttemptOrder: single executable definition of the canonical committed-attempt ordering, previously triplicated across Runner, Memory::StorageState, and NodeDiagnostic - DAG::EventPublishing.publish_quietly: shared best-effort event-bus publication (Runner + MutationService) - StorageState: assert_state! CAS guard (6 sites), optional-event validate/append helpers (3 sites), leased_effect! preamble (3 sites) - Validation.optional_node_id! and use optional_string! at hand-rolled 'X! unless nil?' sites - Remove Effects.frozen_copy_or_nil (weaker near-duplicate of DAG.frozen_copy that trusted shallow-frozen containers) - One exception-error-hash vocabulary: error_class: everywhere (idempotency conflicts now built via Result.exception_failure; dispatcher handler_raised uses error_class:, handler_bad_return uses returned_class:) - New DuplicateWorkflowError and UnknownAttemptError under DAG::Error (previously bare ArgumentError, invisible to rescue DAG::Error) - MutationService guards documented as pre-checks that must stay in sync with the authoritative storage-layer guards Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…efaults Design findings from the project review: - New Ports::EffectLedger module owns the effect-ledger surface (listing, claim, mark, renew, release, complete). Ports::Storage includes it, so existing adapters are unchanged; the Dispatcher now depends only on the ledger subset plus append_event (interface segregation - the Runner and the Dispatcher had disjoint method sets inside one 27-method port). - complete_effect_succeeded/complete_effect_failed are now the canonical completion path with default implementations composing mark_effect_* + release_nodes_satisfied_by_effect. The dispatcher no longer chooses between two completion APIs, and the crash-window caveat of the composed path is documented on the port instead of living silently inside the dispatcher. - list_committed_results_for_predecessors gets a default composed from load_node_states + list_attempts that RAISES StaleStateError when a predecessor is :committed with no committed attempt, instead of the old Runner fallback that silently dropped the carried-forward context_patch and let downstream nodes run with wrong context. - Ports::Storage.method_overridden? is gone: reflection on Method#owner broke under decorators/proxies (instrumentation wrappers define every method themselves). With port-level defaults there is nothing left to detect; thread_safe_for_dispatch? is now a documented port method defaulting to false. - Document the single-runner invariant on the crash-resume path (acquire of an already-:running row performs no storage write). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Dispatcher#tick(limit:, only_workflow_id: nil) forwards the V1.4 per-workflow claim filter, so consumers no longer have to bypass the Dispatcher to use claim_ready_effects(only_workflow_id:) - When a worker raises, tick no longer discards the outcomes sibling workers already durably applied: it raises DAG::Effects::DispatchAbortedError carrying the partial DispatchReport, with the original exception as #cause. Non-StandardError exceptions still propagate unwrapped after all workers have joined - Runner#retry_workflow appends a durable :workflow_retrying event in the same atomic step as the retry transition (the event: kwarg prepare_workflow_retry already supported), so the event log explains the workflow_failed -> node_started sequence a retry produces. New Event::TYPES entry and TraceRecord status :retrying - Document the deliberate no-backoff semantics of retriable Failure (immediate re-execution; delayed retries belong to Waiting + effects) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Performance findings from the project review:
- ExecutionContext#merge no longer re-validates and re-deep-copies the
entire context per predecessor per attempt: only the incoming patch
is validated/copied, the already-deep-frozen data is reused through a
private trusted constructor. Canonical Symbol/String key collisions
with existing keys are still rejected (per-instance canonical key
set). Measured: 5000 small merges onto a 200-key context drop from
~5.5s to ~17ms
- Memory::StorageState gains a per-node attempt index
({[workflow_id, revision, node_id] => [attempt_id,...]}) used by
count_attempts, committed-result lookup, and prepare_workflow_retry,
removing the O(all-attempts-in-workflow) scan per node execution
- claim_ready_effects iterates an active (non-terminal) effect list
instead of the full ledger, so a long-lived dispatcher poller no
longer pays O(total effects ever created) per tick; terminal marks
retire ids from the active list
- Runner threads the stamped terminal event seq into RunResult instead
of re-reading the whole event log on the terminal paths
- json_safe_walk! inlines the JSON key class check on the hot path
Hardening for the (now CI-enforced) coverage gate: new
spec/r3/review_hardening_test.rb covers AttemptOrder, EventPublishing,
optional Validation helpers, the trusted merge invariants, legacy
snapshot index rebuilds, effect-coordinate guards, the two-hop
committed-result projection carry-forward, and the new cop branches;
storage-contract consumer-boundary check rewritten without the
unreachable offender branch.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review finding: Success#to_h silently dropped context_patch (which the Runner requires for replay) and Failure#to_h dropped retriable (load- bearing for retry semantics); Waiting had no to_h at all and no from_h existed anywhere - three sibling types, three serialization stories, and a trap for the upcoming durable SQLite adapter. - Success/Failure/Waiting#to_h now project every field, JSON-safe - DAG::Result.from_h is the single deserialization entry point, dispatching on :status to Success/Failure/Waiting.from_h - Intent, ProposedMutation, ReplacementGraph gain to_h/from_h; Graph.from_h rebuilds a frozen graph from the canonical Graph#to_h - All from_h constructors accept Symbol or String keys via the new DAG::Snapshot indifferent-fetch helper, so a round-trip through the JSON serializer reconstructs the original value (verified in spec/r3/result_serialization_test.rb) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- CONTRACT.md: EffectLedger split and port defaults, canonical complete_effect_* path, DispatchAbortedError semantics, tick(only_workflow_id:), :workflow_retrying event, full-fidelity to_h/from_h contract, error_class vocabulary, new storage errors, single-runner invariant on the crash-resume path, dispatcher construction snippet shows parallelism: - CLAUDE.md: port-extensions section rewritten for the module split and the defaults (it had fallen behind CONTRACT.md), key-files list completed (mutation/diagnostics layer, toolkit, testing contract, new files), spec layout includes spec/r3/, error list and :step_raised payload shape updated, COVERAGE gate documented - CHANGELOG: Unreleased section for the whole hardening pass - New executable Roadmap §9.1 gate (spec/r0/kernel_ai_terms_test.rb); NodeDiagnostic doc comment reworded to stay clear of the banned terms Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- spec/r3/review_hardening_test.rb: don't mix hash styles - Memory::Storage effect methods pointed their (see ...) doc references at Ports::Storage methods that moved to Ports::EffectLedger, dropping YARD below its 99% gate; refs updated and the immutability trio (deep_freeze / deep_dup / json_safe!) documented Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hardening pass implementing the findings of the full project review (design, pillar enforcement, DRY, performance). No pillar violations were found in shipped code; the fixes target the optional-extension seams of the storage port, enforcement gaps in the cops/CI, duplication, and two measured performance hotspots.
Design
Ports::EffectLedgersplit out ofPorts::Storage(which includes it). The Dispatcher now depends only on the ledger surface plusappend_event.complete_effect_succeeded/complete_effect_failedare the canonical completion path, with composed (non crash-atomic, documented) port defaults;thread_safe_for_dispatch?is a documented port method defaulting tofalse.Ports::Storage.method_overridden?removed.Method#ownerreflection broke under decorators/proxies; extension behavior now lives in overridable port defaults.list_committed_results_for_predecessorsraisesStaleStateErrorwhen a predecessor is:committedwith no committed attempt, instead of dropping itscontext_patchand letting downstream nodes run with wrong context.Dispatcher#tickno longer discards completed work on abort: raisesDAG::Effects::DispatchAbortedErrorcarrying the partialDispatchReport(#report) and the original exception (#cause), only after all workers join. Non-StandardErrorexceptions propagate unwrapped.tick(only_workflow_id:)forwards the V1.4 per-workflow claim filter.:workflow_retryingevent appended atomically insideprepare_workflow_retry, so the event log explains theworkflow_failed → node_startedsequence a retry produces.to_h/from_hround-trip forSuccess/Failure/Waiting/Intent/ProposedMutation/ReplacementGraph/Graph, withDAG::Result.from_has the single deserialization entry point (Symbol or String keys) — closes the serialization trap ahead of the SQLite adapter.DuplicateWorkflowError/UnknownAttemptErrorunderDAG::Error; single-runner invariant on the crash-resume path documented on the port and in CONTRACT.md; retriable-Failureimmediate-retry semantics documented as deliberate.DRY
DAG::AttemptOrder— single executable definition of the canonical committed-attempt ordering (was triplicated across Runner, Memory adapter, NodeDiagnostic).DAG::EventPublishing.publish_quietly,StorageStatehelpers (assert_state!×6, optional-event validate/append ×3,leased_effect!×3),Validation.optional_node_id!, removal ofEffects.frozen_copy_or_nil(weaker near-duplicate ofDAG.frozen_copy).error_class:everywhere;:handler_bad_returnusesreturned_class:.Performance (measured)
ExecutionContext#mergevalidates/copies only the patch (canonical key collisions still rejected): 5,000 small merges onto a 200-key context drop from ~5.5 s to ~17 ms.Memory::StorageState: per-node attempt index and active-effect list remove the O(all-attempts) scan per node execution and the O(total-effects-ever) scan per dispatcher tick.RunResultinstead of re-reading the event log.Enforcement / CI / hygiene
Fiberbanned everywhere including the dispatcher carve-out,Kernel.system/spawn/forkflagged, stdlib require allowlist trimmed to the frozen §3 list,mutation_service.rbadded to the in-place-mutation cop scope (exclusions documented in the cop).COVERAGE=1, so the SimpleCov gate (100% line / 90% branch) is actually enforced; the suite was brought to 100% line / 91.97% branch to pass it.spec/r0/kernel_ai_terms_test.rb); root-level review files moved todocs/reviews/;.sentrux/gitignored.Deliberately deferred (need their own design pass)
Handler lease-renewal context (
#call(record, ctx)shape), workflow-level owner/lease claim for multi-host resume, extraction of kernel invariants fromStorageStateinto shared validators, extending Mutant subjects to the imperative core.Test plan
bundle exec rake(test + standardrb + rubocop + YARD ≥99%) — green, exit 0COVERAGE=1 bundle exec rake test— 762 runs, 0 failures; 100% line / 91.97% branch🤖 Generated with Claude Code