Skip to content

Project-review hardening: port seams, dispatcher report, serialization round-trip, perf, CI gates#191

Merged
duncanita merged 7 commits into
mainfrom
fix/full-review-findings
Jun 10, 2026
Merged

Project-review hardening: port seams, dispatcher report, serialization round-trip, perf, CI gates#191
duncanita merged 7 commits into
mainfrom
fix/full-review-findings

Conversation

@duncanita

Copy link
Copy Markdown
Owner

Summary

Hardening pass implementing the findings of the full project review (design, pillar enforcement, DRY, performance). No pillar violations were found in shipped code; the fixes target the optional-extension seams of the storage port, enforcement gaps in the cops/CI, duplication, and two measured performance hotspots.

Design

  • Ports::EffectLedger split out of Ports::Storage (which includes it). The Dispatcher now depends only on the ledger surface plus append_event. complete_effect_succeeded / complete_effect_failed are the canonical completion path, with composed (non crash-atomic, documented) port defaults; thread_safe_for_dispatch? is a documented port method defaulting to false.
  • Ports::Storage.method_overridden? removed. Method#owner reflection broke under decorators/proxies; extension behavior now lives in overridable port defaults.
  • Silently-wrong committed-results fallback fixed. The port default for list_committed_results_for_predecessors raises StaleStateError when a predecessor is :committed with no committed attempt, instead of dropping its context_patch and letting downstream nodes run with wrong context.
  • Dispatcher#tick no longer discards completed work on abort: raises DAG::Effects::DispatchAbortedError carrying the partial DispatchReport (#report) and the original exception (#cause), only after all workers join. Non-StandardError exceptions propagate unwrapped.
  • tick(only_workflow_id:) forwards the V1.4 per-workflow claim filter.
  • Durable :workflow_retrying event appended atomically inside prepare_workflow_retry, so the event log explains the workflow_failed → node_started sequence a retry produces.
  • Full-fidelity to_h/from_h round-trip for Success/Failure/Waiting/Intent/ProposedMutation/ReplacementGraph/Graph, with DAG::Result.from_h as the single deserialization entry point (Symbol or String keys) — closes the serialization trap ahead of the SQLite adapter.
  • New DuplicateWorkflowError / UnknownAttemptError under DAG::Error; single-runner invariant on the crash-resume path documented on the port and in CONTRACT.md; retriable-Failure immediate-retry semantics documented as deliberate.

DRY

  • DAG::AttemptOrder — single executable definition of the canonical committed-attempt ordering (was triplicated across Runner, Memory adapter, NodeDiagnostic).
  • DAG::EventPublishing.publish_quietly, StorageState helpers (assert_state! ×6, optional-event validate/append ×3, leased_effect! ×3), Validation.optional_node_id!, removal of Effects.frozen_copy_or_nil (weaker near-duplicate of DAG.frozen_copy).
  • One rescued-exception vocabulary: error_class: everywhere; :handler_bad_return uses returned_class:.

Performance (measured)

  • ExecutionContext#merge validates/copies only the patch (canonical key collisions still rejected): 5,000 small merges onto a 200-key context drop from ~5.5 s to ~17 ms.
  • Memory::StorageState: per-node attempt index and active-effect list remove the O(all-attempts) scan per node execution and the O(total-effects-ever) scan per dispatcher tick.
  • Runner threads the stamped terminal event seq into RunResult instead of re-reading the event log.

Enforcement / CI / hygiene

  • Cops: Fiber banned everywhere including the dispatcher carve-out, Kernel.system/spawn/fork flagged, stdlib require allowlist trimmed to the frozen §3 list, mutation_service.rb added to the in-place-mutation cop scope (exclusions documented in the cop).
  • CI consolidated to a single workflow (Ruby 3.4 / 4.0 / head) running with COVERAGE=1, so the SimpleCov gate (100% line / 90% branch) is actually enforced; the suite was brought to 100% line / 91.97% branch to pass it.
  • Executable Roadmap §9.1 grep gate (spec/r0/kernel_ai_terms_test.rb); root-level review files moved to docs/reviews/; .sentrux/ gitignored.
  • CONTRACT.md / CLAUDE.md / CHANGELOG aligned with all of the above (CLAUDE.md port-extensions list had fallen behind CONTRACT.md).

Deliberately deferred (need their own design pass)

Handler lease-renewal context (#call(record, ctx) shape), workflow-level owner/lease claim for multi-host resume, extraction of kernel invariants from StorageState into shared validators, extending Mutant subjects to the imperative core.

Test plan

  • bundle exec rake (test + standardrb + rubocop + YARD ≥99%) — green, exit 0
  • COVERAGE=1 bundle exec rake test — 762 runs, 0 failures; 100% line / 91.97% branch
  • A/B benchmarks for the context-merge and dispatcher-claim hotspots

🤖 Generated with Claude Code

duncanita and others added 7 commits June 10, 2026 12:44
Review findings (DRY + design):
- DAG::AttemptOrder: single executable definition of the canonical
  committed-attempt ordering, previously triplicated across Runner,
  Memory::StorageState, and NodeDiagnostic
- DAG::EventPublishing.publish_quietly: shared best-effort event-bus
  publication (Runner + MutationService)
- StorageState: assert_state! CAS guard (6 sites), optional-event
  validate/append helpers (3 sites), leased_effect! preamble (3 sites)
- Validation.optional_node_id! and use optional_string! at hand-rolled
  'X! unless nil?' sites
- Remove Effects.frozen_copy_or_nil (weaker near-duplicate of
  DAG.frozen_copy that trusted shallow-frozen containers)
- One exception-error-hash vocabulary: error_class: everywhere
  (idempotency conflicts now built via Result.exception_failure;
  dispatcher handler_raised uses error_class:, handler_bad_return uses
  returned_class:)
- New DuplicateWorkflowError and UnknownAttemptError under DAG::Error
  (previously bare ArgumentError, invisible to rescue DAG::Error)
- MutationService guards documented as pre-checks that must stay in
  sync with the authoritative storage-layer guards

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…efaults

Design findings from the project review:

- New Ports::EffectLedger module owns the effect-ledger surface
  (listing, claim, mark, renew, release, complete). Ports::Storage
  includes it, so existing adapters are unchanged; the Dispatcher now
  depends only on the ledger subset plus append_event (interface
  segregation - the Runner and the Dispatcher had disjoint method sets
  inside one 27-method port).

- complete_effect_succeeded/complete_effect_failed are now the
  canonical completion path with default implementations composing
  mark_effect_* + release_nodes_satisfied_by_effect. The dispatcher no
  longer chooses between two completion APIs, and the crash-window
  caveat of the composed path is documented on the port instead of
  living silently inside the dispatcher.

- list_committed_results_for_predecessors gets a default composed from
  load_node_states + list_attempts that RAISES StaleStateError when a
  predecessor is :committed with no committed attempt, instead of the
  old Runner fallback that silently dropped the carried-forward
  context_patch and let downstream nodes run with wrong context.

- Ports::Storage.method_overridden? is gone: reflection on
  Method#owner broke under decorators/proxies (instrumentation
  wrappers define every method themselves). With port-level defaults
  there is nothing left to detect; thread_safe_for_dispatch? is now a
  documented port method defaulting to false.

- Document the single-runner invariant on the crash-resume path
  (acquire of an already-:running row performs no storage write).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Dispatcher#tick(limit:, only_workflow_id: nil) forwards the V1.4
  per-workflow claim filter, so consumers no longer have to bypass the
  Dispatcher to use claim_ready_effects(only_workflow_id:)
- When a worker raises, tick no longer discards the outcomes sibling
  workers already durably applied: it raises
  DAG::Effects::DispatchAbortedError carrying the partial
  DispatchReport, with the original exception as #cause.
  Non-StandardError exceptions still propagate unwrapped after all
  workers have joined
- Runner#retry_workflow appends a durable :workflow_retrying event in
  the same atomic step as the retry transition (the event: kwarg
  prepare_workflow_retry already supported), so the event log explains
  the workflow_failed -> node_started sequence a retry produces.
  New Event::TYPES entry and TraceRecord status :retrying
- Document the deliberate no-backoff semantics of retriable Failure
  (immediate re-execution; delayed retries belong to Waiting + effects)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Performance findings from the project review:

- ExecutionContext#merge no longer re-validates and re-deep-copies the
  entire context per predecessor per attempt: only the incoming patch
  is validated/copied, the already-deep-frozen data is reused through a
  private trusted constructor. Canonical Symbol/String key collisions
  with existing keys are still rejected (per-instance canonical key
  set). Measured: 5000 small merges onto a 200-key context drop from
  ~5.5s to ~17ms
- Memory::StorageState gains a per-node attempt index
  ({[workflow_id, revision, node_id] => [attempt_id,...]}) used by
  count_attempts, committed-result lookup, and prepare_workflow_retry,
  removing the O(all-attempts-in-workflow) scan per node execution
- claim_ready_effects iterates an active (non-terminal) effect list
  instead of the full ledger, so a long-lived dispatcher poller no
  longer pays O(total effects ever created) per tick; terminal marks
  retire ids from the active list
- Runner threads the stamped terminal event seq into RunResult instead
  of re-reading the whole event log on the terminal paths
- json_safe_walk! inlines the JSON key class check on the hot path

Hardening for the (now CI-enforced) coverage gate: new
spec/r3/review_hardening_test.rb covers AttemptOrder, EventPublishing,
optional Validation helpers, the trusted merge invariants, legacy
snapshot index rebuilds, effect-coordinate guards, the two-hop
committed-result projection carry-forward, and the new cop branches;
storage-contract consumer-boundary check rewritten without the
unreachable offender branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Review finding: Success#to_h silently dropped context_patch (which the
Runner requires for replay) and Failure#to_h dropped retriable (load-
bearing for retry semantics); Waiting had no to_h at all and no from_h
existed anywhere - three sibling types, three serialization stories,
and a trap for the upcoming durable SQLite adapter.

- Success/Failure/Waiting#to_h now project every field, JSON-safe
- DAG::Result.from_h is the single deserialization entry point,
  dispatching on :status to Success/Failure/Waiting.from_h
- Intent, ProposedMutation, ReplacementGraph gain to_h/from_h;
  Graph.from_h rebuilds a frozen graph from the canonical Graph#to_h
- All from_h constructors accept Symbol or String keys via the new
  DAG::Snapshot indifferent-fetch helper, so a round-trip through the
  JSON serializer reconstructs the original value (verified in
  spec/r3/result_serialization_test.rb)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- CONTRACT.md: EffectLedger split and port defaults, canonical
  complete_effect_* path, DispatchAbortedError semantics,
  tick(only_workflow_id:), :workflow_retrying event, full-fidelity
  to_h/from_h contract, error_class vocabulary, new storage errors,
  single-runner invariant on the crash-resume path, dispatcher
  construction snippet shows parallelism:
- CLAUDE.md: port-extensions section rewritten for the module split and
  the defaults (it had fallen behind CONTRACT.md), key-files list
  completed (mutation/diagnostics layer, toolkit, testing contract, new
  files), spec layout includes spec/r3/, error list and :step_raised
  payload shape updated, COVERAGE gate documented
- CHANGELOG: Unreleased section for the whole hardening pass
- New executable Roadmap §9.1 gate (spec/r0/kernel_ai_terms_test.rb);
  NodeDiagnostic doc comment reworded to stay clear of the banned terms

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- spec/r3/review_hardening_test.rb: don't mix hash styles
- Memory::Storage effect methods pointed their (see ...) doc references
  at Ports::Storage methods that moved to Ports::EffectLedger, dropping
  YARD below its 99% gate; refs updated and the immutability trio
  (deep_freeze / deep_dup / json_safe!) documented

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@duncanita duncanita mentioned this pull request Jun 10, 2026
10 tasks
@duncanita duncanita merged commit 071708a into main Jun 10, 2026
4 of 6 checks passed
@duncanita duncanita deleted the fix/full-review-findings branch June 10, 2026 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant