Status
Active planning epic for the post-NVIDIA provider foundation.
Context
Wiii needs to move from demo-capable to production-reliable quickly. PR #169 establishes the NVIDIA provider foundation and local smoke gate, but reliability still needs explicit runtime health, fallback, streaming finalization, pipeline simplification, and memory/identity contracts.
The goal is not just to make requests succeed. The goal is that Wiii feels continuous, responsive, and alive: short chat should stay fast, long/tool/RAG work should stream honestly, memory should be visible in behavior, and provider failures should degrade gracefully instead of freezing the frontend.
Principles
- Keep PRs narrow and mergeable. Do not combine provider runtime, architecture docs, and memory refactors in one PR.
- Do not delete LangGraph/history/compat code in a sweep. Mark, isolate, and remove by phase with rollback notes.
- Prefer health-based runtime decisions over static provider assumptions.
- Preserve Vietnamese-first user-facing behavior.
- Every high-risk change needs tests, rollback notes, and PR reviewer focus.
PR 1: Provider Runtime Reliability
Scope:
- Model-level health probe: if NVIDIA Flash times out, mark it degraded and stop selecting it until recovery conditions pass.
- Per-provider/per-model timeout profiles for chat, streaming, structured/router, and probe paths.
- Same-provider fallback: Flash -> Pro or Pro -> Flash according to current health and configured policy.
- SSE timeout/finalization guard so the frontend never remains stuck in “đang suy nghĩ” ("thinking") after backend/provider failure.
- Ensure structured/router path cannot slow ordinary short chat.
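A minimal sketch of how the degraded state, timeout profiles, and same-provider fallback could fit together. Everything here is an assumption for illustration: the `ModelHealth` class, the `TIMEOUT_PROFILES` table and its values, the model keys `flash` and `pro`, and the cooldown-based recovery condition are hypothetical, not taken from the codebase. A real recovery check might require a successful probe rather than elapsed time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelHealth:
    """Degraded-state tracker for one model (hypothetical structure)."""
    cooldown_s: float = 30.0
    degraded_at: Optional[float] = None

    def mark_timeout(self, now: float) -> None:
        # A timeout marks the model degraded from this moment.
        self.degraded_at = now

    def mark_success(self) -> None:
        # A successful call or probe clears the degraded state.
        self.degraded_at = None

    def is_selectable(self, now: float) -> bool:
        # Assumed recovery condition: the cooldown has elapsed.
        return self.degraded_at is None or now - self.degraded_at >= self.cooldown_s


# Hypothetical per-model, per-path timeouts in seconds.
TIMEOUT_PROFILES = {
    ("flash", "chat"): 8.0,
    ("flash", "stream"): 20.0,
    ("flash", "structured"): 5.0,
    ("flash", "probe"): 3.0,
    ("pro", "chat"): 15.0,
    ("pro", "stream"): 40.0,
}


def timeout_for(model: str, path: str, default: float = 10.0) -> float:
    """Look up the timeout for a model/path pair, falling back to a default."""
    return TIMEOUT_PROFILES.get((model, path), default)


def select_model(order: List[str], health: Dict[str, ModelHealth], now: float) -> Optional[str]:
    """Return the first healthy model in the configured fallback order, or None."""
    for model in order:
        if health[model].is_selectable(now):
            return model
    return None
```

Passing `now` explicitly instead of calling the clock inside the functions keeps the degraded and recovery transitions easy to unit test.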
Acceptance:
- Unit tests cover model degraded state, timeout profile selection, same-provider fallback order, and recovery behavior.
- Streaming tests cover timeout/error finalization and done/terminal event behavior.
- Local NVIDIA smoke documents which model was selected and why.
- No secrets or .env* changes are committed.
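The SSE finalization guard in PR 1 could be sketched as an async wrapper that always emits a terminal event, whether the provider stream ends normally, stalls, or raises. This is an illustration under stated assumptions: the function name `guarded_sse`, the event names `delta`, `error`, and `done`, and the payload shapes are hypothetical and may not match the actual streaming contract.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Dict


def sse(event: str, payload: Dict[str, Any]) -> str:
    """Format one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(payload, ensure_ascii=False)}\n\n"


async def guarded_sse(chunks: AsyncIterator[str], idle_timeout_s: float) -> AsyncIterator[str]:
    """Relay provider chunks as SSE frames and always finish with a 'done' frame."""
    iterator = chunks.__aiter__()
    try:
        while True:
            try:
                # Bound the wait for each chunk so a stalled provider cannot hang the stream.
                chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout_s)
            except StopAsyncIteration:
                break
            yield sse("delta", {"text": chunk})
    except asyncio.TimeoutError:
        yield sse("error", {"reason": "timeout"})
    except Exception:
        # Avoid leaking provider error details to the client.
        yield sse("error", {"reason": "provider_error"})
    # Terminal event: lets the frontend leave its "thinking" state in every case.
    yield sse("done", {})
```

A streaming test can then feed the wrapper a generator that stalls or raises and assert that the last frame is the terminal event.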
PR 2: Pipeline Simplification Plan
Scope:
- Document current request lifecycle: request -> auth/org -> memory -> router -> agent -> tool/RAG -> stream.
- Mark active runtime paths vs historical compatibility paths.
- Identify remaining LangGraph/history/compat references and classify them as active, compatibility, test-only, doc-only, or deletion candidate.
- Propose phased LangGraph removal without breaking runtime or rollback.
Acceptance:
- Architecture doc includes lifecycle diagram, ownership, risk, and rollback plan.
- Cleanup list links each LangGraph/history reference to a proposed phase.
- No large runtime deletion in this planning PR unless separately proven safe.
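One possible shape for the cleanup list, sketched so the acceptance check "each reference links to a proposed phase" can be automated. The `RefStatus` and `CleanupEntry` names and the example paths in the usage note are hypothetical; only the five classification labels come from the scope above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RefStatus(Enum):
    ACTIVE = "active"
    COMPATIBILITY = "compatibility"
    TEST_ONLY = "test-only"
    DOC_ONLY = "doc-only"
    DELETION_CANDIDATE = "deletion candidate"


@dataclass(frozen=True)
class CleanupEntry:
    path: str                 # file or module holding the reference
    status: RefStatus         # classification from the audit
    phase: Optional[int]      # proposed removal phase, None if not yet assigned
    rollback_note: str = ""   # how to restore the path if removal breaks runtime


def missing_phase(entries: List[CleanupEntry]) -> List[CleanupEntry]:
    """Return entries that still lack a proposed phase."""
    return [entry for entry in entries if entry.phase is None]
```

For example, a list containing `CleanupEntry("app/legacy_graph.py", RefStatus.COMPATIBILITY, None)` (a made-up path) would be flagged by `missing_phase` until a phase is assigned.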
PR 3: Memory/Wiii Identity Reliability
Scope:
- Define the memory contract that prevents failures like “Wiii không nhớ mình” ("Wiii doesn't remember me").
- Separate memory namespaces: persona, human, relationship, goals, craft, and world.
- Make living memory behavior visible in chat responses without dumping raw memory.
- Add tests for memory retrieval/injection, relationship continuity, and fallback behavior when memory is unavailable.
Acceptance:
- Memory contract doc and tests prove Wiii can recall stable identity/relationship facts safely.
- Wiii remains clearly AI, not human-impersonating, while feeling continuous and companion-like.
- Memory writes are selective and interpretable, not raw-turn dumps.
- UI-visible behavior explains memory uncertainty gracefully instead of claiming total amnesia.
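A rough sketch of namespace-separated retrieval with a fallback path for when part of memory is unavailable. The six namespace names come from the scope above; the `fetch` callable, `build_memory_context`, `memory_hint`, the `max_facts` cap, and the hint wording are assumptions for illustration, not the existing memory API.

```python
from enum import Enum
from typing import Any, Callable, Dict, List


class Namespace(Enum):
    PERSONA = "persona"
    HUMAN = "human"
    RELATIONSHIP = "relationship"
    GOALS = "goals"
    CRAFT = "craft"
    WORLD = "world"


def build_memory_context(
    fetch: Callable[[Namespace], List[str]], max_facts: int = 3
) -> Dict[str, Any]:
    """Collect a few facts per namespace and record which namespaces failed."""
    facts: Dict[str, List[str]] = {}
    unavailable: List[str] = []
    for ns in Namespace:
        try:
            # Cap facts per namespace so responses never dump raw memory.
            facts[ns.value] = fetch(ns)[:max_facts]
        except Exception:
            # A failing store degrades one namespace, not the whole turn.
            unavailable.append(ns.value)
    return {"facts": facts, "unavailable": unavailable}


def memory_hint(context: Dict[str, Any]) -> str:
    """Return a prompt hint asking for graceful uncertainty, or '' if memory is complete."""
    if not context["unavailable"]:
        return ""
    names = ", ".join(context["unavailable"])
    return f"Memory for [{names}] is temporarily unavailable; acknowledge uncertainty, do not claim to have no memory."
```

Tests for fallback behavior can inject a `fetch` that raises for one namespace and assert that the remaining namespaces are still populated.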
Risks
- Provider reliability touches high-risk runtime routing and streaming contracts.
- Memory reliability touches privacy, persistence, and user trust.
- LangGraph cleanup can remove hidden compatibility paths if done too aggressively.
Initial Verification Targets
Backend:
cd maritime-ai-service
set PYTHONIOENCODING=utf-8 && pytest tests/unit/ -p no:capture --tb=short -q
ruff check app/ --select=E9,F63,F7
Desktop/streaming when frontend paths change:
cd wiii-desktop
npx vitest run
npx tsc --noEmit
npm run build:embed
Repository hygiene:
git diff --check
git status --short
Related Work