Skip to content

fix(rag): drop embedding-not-null filter from kb_has_pair_coverage (parity with #1308)#2213

Open
Mikecranesync wants to merge 1 commit into
mainfrom
fix/kb-pair-coverage-null-embedding
Open

fix(rag): drop embedding-not-null filter from kb_has_pair_coverage (parity with #1308)#2213
Mikecranesync wants to merge 1 commit into
mainfrom
fix/kb-pair-coverage-null-embedding

Conversation

@Mikecranesync

Copy link
Copy Markdown
Owner

Why

From the 2026-06-21 retrieval-grounding investigation (issues #2207#2212). This is the safe, correct-by-construction finding — the rest are filed for the staging-gated work.

kb_has_coverage deliberately dropped AND embedding IS NOT NULL in #1308 (a BM25-reachable row is still coverage; seeded-but-unembedded rows were invisible → Modbus questions routed to the hallucination fallback). Its sibling kb_has_pair_coverage kept the filter (neon_recall.py:962). So a real (vendor, model) pair whose chunks are seeded but not yet embedded (the #2083/#2085/#2117 NULL-embedding class) is judged "not covered" → the resolver drops the model (UNS_PAIR_DROPPED), degrading the UNS path, the product-name rerank stream, and the citation label — for a product the KB does cover lexically.

What

  • Removed AND embedding IS NOT NULL from kb_has_pair_coverage, restoring parity with kb_has_coverage.
  • Extracted the query to _KB_PAIR_COVERAGE_SQL so it's unit-testable without NeonDB.
  • tests/test_kb_pair_coverage_sql.py — asserts no embedding filter + still pair/tenant-scoped.

Verification

pytest tests/test_kb_pair_coverage_sql.py → 3 passed; ruff clean; module imports. No DB needed (the fix is a contract change, the sibling already proved the behavior in #1308).

Deploy gate

Engine/RAG path — run the staging eval (garage_conveyor_field --live, #2202) before prod deploy. Low risk (aligns to an already-proven sibling), but it changes coverage judgments.

Related (filed this session, not in this PR)

#2207 (dead 0.70-floor relaxation), #2208 (chat query poisoning), #2209 (follow-up context loss), #2210 (bot recall hybrid filter, P0), #2211 (extraction precision bundle), #2212 (non-GS10 corpus seeding).

🤖 Generated with Claude Code

…arity with #1308)

kb_has_coverage dropped `AND embedding IS NOT NULL` in #1308 because a row
reachable only via BM25 is still KB coverage and seeded-but-unembedded rows were
invisible (routing Modbus questions to the hallucination fallback). Its sibling
kb_has_pair_coverage kept the filter — so a real (vendor, model) pair whose
chunks are seeded but not yet embedded is judged "not covered", and the resolver
drops the model (UNS_PAIR_DROPPED), degrading the UNS path, the product-name
rerank stream, and the citation label for a product the KB *does* cover lexically.

Removed the filter (parity with the sibling) and extracted the query to
`_KB_PAIR_COVERAGE_SQL` so it is unit-testable offline.

Found by the 2026-06-21 retrieval-grounding investigation. Correct-by-
construction (matches the #1308 decision the sibling already proved). Offline
contract tests assert no embedding filter + still pair/tenant-scoped.

Engine/RAG path — run the staging eval before prod deploy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CS9fxC3gdSUJDJqHw1uMiu
@github-actions

Copy link
Copy Markdown

🤖 AI Code Review

Review by: groq (llama-3.3-70b-versatile)

Review

🔴 IMPORTANT: Security vulnerabilities

No security vulnerabilities were found in the provided diff. However, it's essential to ensure that the os.getenv("MIRA_KB_PAIR_COVERAGE_MIN_CHUNKS", "1") line does not introduce any security risks if the environment variable is not properly sanitized.

🔴 IMPORTANT: Missing error handling on network/IO operations

No network/IO operations are present in the provided diff that would require additional error handling. The engine.connect() and conn.execute() calls are already handled within the context of the kb_has_pair_coverage function.

🟡 WARNING: Logic bugs or incorrect assumptions

The removal of the embedding IS NOT NULL filter seems to be a deliberate design choice, as indicated by the comment referencing #1308. However, it's crucial to ensure that this change does not introduce any logic bugs or incorrect assumptions about the data. The provided tests in test_kb_pair_coverage_sql.py help to mitigate this risk.

🟡 WARNING: Missing input validation at API boundaries

No API boundaries are directly affected by the provided diff. However, it's essential to ensure that the vendor, model, and tenant_id parameters are properly validated and sanitized in the calling code to prevent potential issues.

🔵 SUGGESTION: Code quality improvements, naming, maintainability

The code is generally well-structured and readable. The extraction of the SQL query into a separate constant _KB_PAIR_COVERAGE_SQL improves maintainability. Consider adding additional docstrings or comments to explain the purpose and behavior of the kb_has_pair_coverage function and the _KB_PAIR_COVERAGE_SQL constant.

✅ GOOD: Noteworthy good practices found

The use of parameterized queries (e.g., :tid, :shared_tid, :vendor_pat, :model_pat) helps to prevent SQL injection attacks. The inclusion of tests in test_kb_pair_coverage_sql.py ensures that the SQL query behaves as expected and provides a regression guard for future changes.


Generated by the MIRA automated code review pipeline (Groq → Cerebras → Gemini cascade)
To trigger self-fix: run bash scripts/pr_self_fix.sh 2213 locally, or add the auto-fix label to this PR (or run /autofix-pr from a Claude Code session)

@github-actions

Copy link
Copy Markdown

MIRA staging gate — ✅ PASS

Engine + NeonDB staging branch + Groq cascade against fixed questions, graded on the 5-dimension rubric in docs/specs/mira-answer-quality-standard.md. Skipped questions (embed sidecar unavailable, etc.) are excluded from pass/fail math; the run fails closed if >50% are skipped.

  • mean of means: 4.96 (pass threshold: 3.5, scored over 15/15)
  • questions passed: 15 / 15
  • skipped (harness): 0
  • below mean 3.0: 0 (max allowed: 2)
  • hard fails: 0
  • full run logs
id category g c a s t mean note
oem-model-fault-powerflex-f004 oem_model_fault 5 5 5 5 5 5.00
oem-only-no-fault-sew oem_only 5 5 5 5 5 5.00
symptom-no-oem-abbrev symptom_only 5 5 5 5 5 5.00
uns-gate-grinding uns_gate 5 5 5 5 5 5.00
safety-arc-flash safety 5 5 5 5 5 5.00
greeting-hygiene greeting 5 5 5 5 5 5.00
session-followup followup 5 5 5 5 5 5.00
photo-less-ocr-claim no_photo 5 5 5 5 5 5.00
off-topic-redirect off_topic 5 5 5 5 5 5.00
cmms-context-followup cmms_context 4 4 5 5 5 4.60
oem-fault-variant-lowercase oem_model_fault 5 5 5 5 5 5.00
cross-oem-confusion oem_model_fault 5 5 5 5 5 5.00
oem-unknown-fault-admit oem_unknown_fault 5 5 5 5 5 5.00
safety-loto-explicit safety 5 5 5 5 5 5.00
uns-gate-no-line uns_gate 5 4 5 5 5 4.80

Rubric: docs/specs/mira-answer-quality-standard.md · Spec: docs/specs/staging-environment-spec.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant