Skip to content

docs: TDD rationalizing multimodal RAG + cluster-deploy work (refs #182)#183

Open
stevei101 wants to merge 5 commits into
developfrom
docs/tdd-multimodal-and-service-2026-06-02
Open

docs: TDD rationalizing multimodal RAG + cluster-deploy work (refs #182)#183
stevei101 wants to merge 5 commits into
developfrom
docs/tdd-multimodal-and-service-2026-06-02

Conversation

@stevei101
Copy link
Copy Markdown
Contributor

Summary

Why

The open queue grew into a flat list of ~20 issues with overlapping acceptance criteria and several broken dependency chains (impls filed before their traits; epic #145 children block each other in undocumented orders). This TDD reframes them so each phase is one PR landable against develop, and calls out where issues are already partially resolved (e.g. #178 → PR #179).

Audited 2026-06-02:

Companion issue

Meta-epic #182 lists the full sequenced user stories tracked by this TDD.

Test plan

  • markdownlint clean locally
  • Reviewers confirm phase order makes sense given current sprint plans
  • Reviewers confirm the three open questions in section 8 don't need pre-merge resolution

🤖 Generated with Claude Code

principle-lgtm and others added 5 commits June 2, 2026 18:42
Synthesizes the 20 open issues (epics #145, #171 + bugfixes #167, #178)
into three epics and 12 sequenced phases against the current repo state.

Audited as of 2026-06-02:
- #178 already mostly fixed by PR #179; remainder is consolidating two
  duplicate RagRunRecorder types in graphrag-aivcs
- #167 still present at huggingface.rs:198-204 (silent zero-vector return)
- None of the #145 child types exist yet — green-field
- No deploy/ or infra/ dir yet; flake.nix has no dockerImage output

Replaces the per-issue "100% test coverage" claim with a happy-path +
error-path bar that matches the team's actual cadence. Defers cross-repo
work (crossplane-heaven#10, dockworker.ai#6, lornu.ai#2347) explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UpdateMonitor::complete_operation` held the `operations_log`
`parking_lot::Mutex` guard while calling `update_performance_stats`,
which re-acquires the same mutex. parking_lot mutexes are not
reentrant, so the second lock blocked the thread forever.

Because the lock is a synchronous (blocking) lock, on the
`#[tokio::test]` current-thread runtime it wedged the single worker
thread, hanging the whole test rather than just the one task. This is
what caused `nix flake check`'s nextest run to spin for 6h and hit the
GitHub Actions max-execution timeout. Affected tests:

- graph::incremental::tests::test_basic_entity_upsert
- graph::incremental::tests::test_production_graph_store_entity_upsert
- graph::incremental::tests::test_production_graph_store_relationship_upsert
- graph::incremental::tests::test_production_graph_store_event_publishing

Scope the lock guard so it is dropped before `update_performance_stats`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Make workspace flake check compile-only (doCheck=false); nextest owns tests
- Set nix-check job timeout to 120 minutes
- Drop duplicate cargo test --workspace after flake check (already in checks.tests)
- Merge develop (deadlock fix, benches doCheck, embedding fixes)

Co-authored-by: Cursor <cursoragent@cursor.com>
@stevei101
Copy link
Copy Markdown
Contributor Author

CI fix pushed (d88b704)

Root cause: nix-check was failing in two ways:

  1. Timeout (~55–60m)nix flake check builds workspace + clippy + fmt + nextest + benches + doc in parallel; job was killed during checks.tests (nextest).
  2. Earlier run: checks.workspace ran `cargo test` via default buildPackage doCheck (17 lib test failures + duplicate work with nextest).

Changes:

  • Merged develop (deadlock fix e590769, benches doCheck, embedding fixes)
  • workspace check: doCheck = false (compile gate; tests only in checks.tests / nextest)
  • Job timeout-minutes: 120
  • Removed duplicate cargo test --workspace step (already covered by flake checks.tests)

Re-running CI now.

Copy link
Copy Markdown
Contributor

@principle-lgtm principle-lgtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs work before approval.

The main blocker is a CI coverage regression:

  • P2: The PR removes the explicit cargo test --workspace step, disables the workspace buildPackage test phase with doCheck = false, and relies on checks.tests / cargoNextest during nix flake check. cargo-nextest does not run doctests; nextest documents doctests as needing a separate cargo test --doc path: https://nexte.st/docs/running/. That means doctest coverage can silently drop out of CI, which conflicts with this TDD's own future test bar for trait/type PRs. Please add an explicit doctest check back into CI/Nix, or otherwise show where doctests are still run.

Smaller cleanup items:

  • P3: The TDD says "eight phases" in the intro and section 5, but the table defines phases 0 through 12.
  • P3: The doc still says Tracking issue: TBD, while the PR is tied to meta-epic #182.

Current checks: latest CI run for d88b704 is still in progress at nix-check / Flake check, with later incremental and SurrealDB steps pending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants