From ffe3e9a375f0e731b5d3accc368a303fdab564c5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?PSBigBig=20=C3=97=20MiniPS?=
Date: Tue, 3 Mar 2026 17:38:54 +0800
Subject: [PATCH] docs: add RAG failure checklist for LLM-based monitoring

Add a focused docs-only checklist for diagnosing retrieval, deduplication, and summary failures in monitoring workflows.

Refs #652
---
 docs/rag-failure-checklist.md | 150 ++++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)
 create mode 100644 docs/rag-failure-checklist.md

diff --git a/docs/rag-failure-checklist.md b/docs/rag-failure-checklist.md
new file mode 100644
index 000000000..10b41ad5a
--- /dev/null
+++ b/docs/rag-failure-checklist.md
@@ -0,0 +1,150 @@
+# RAG Failure Checklist for LLM-based Monitoring
+
+This page provides a lightweight checklist for debugging suspicious LLM-generated summaries in monitoring workflows.
+
+When a summary looks wrong, the failure usually comes from one of three stages:
+
+1. retrieval
+2. deduplication
+3. summary generation
+
+The goal is to narrow the failure mode quickly, reduce guesswork, and make follow-up fixes more repeatable.
+
+## How to use this checklist
+
+1. Start from the visible symptom, not from assumptions.
+2. Decide which stage is most likely responsible: retrieval, deduplication, or summary.
+3. Match the symptom to one or more failure patterns below.
+4. Run the first check before changing prompts, feeds, or logic.
+5. Only then decide whether the fix belongs in source selection, merge logic, or summarization rules.
+
+## Pipeline view
+
+### Retrieval
+
+This stage decides which source items enter the working set.
+
+Typical failures here:
+- the right source never enters the context
+- the wrong source is selected
+- time ordering is distorted
+- related facts are split across boundaries
+
+### Deduplication
+
+This stage decides which items are treated as the same event.
+
+Typical failures here:
+- one event is split into several duplicates
+- several different events are merged too early
+- aliases cause cross-region or cross-organization confusion
+- repeated aggregator copies drown out primary reporting
+
+### Summary generation
+
+This stage turns the working set into a readable narrative.
+
+Typical failures here:
+- the summary drifts away from the evidence
+- unsupported conclusions appear
+- ambiguity is hidden instead of surfaced
+- analysts cannot easily trace why the conclusion was produced
+
+## 16 common failure patterns
+
+| No. | Failure pattern | What it looks like in monitoring | Primary stage | First check |
+| --- | --- | --- | --- | --- |
+| 1 | Stale source selected | An older article is presented as if it were current breaking news | Retrieval | Verify publication time, recency sorting, and update timestamps |
+| 2 | Relevant source missed | A key actor, location, or development is missing from the summary | Retrieval | Check source coverage, filters, and fetch scope |
+| 3 | Chunk boundary split | Cause and consequence appear disconnected or incomplete | Retrieval | Review chunking boundaries and context window size |
+| 4 | Temporal context lost | Event order is reversed or escalation timing is misread | Retrieval | Compare timestamps across all contributing items |
+| 5 | Entity lookup mismatch | Similar places, agencies, or organizations are confused at intake | Retrieval | Check alias normalization and entity matching rules |
+| 6 | Source weighting imbalance | One noisy or repetitive source dominates the result | Retrieval | Review enabled feeds and per-source frequency |
+| 7 | Same event split into duplicates | One incident appears multiple times as separate events | Deduplication | Compare titles, URLs, timestamps, and near-duplicate clusters |
+| 8 | Different events merged together | Separate incidents are collapsed into one story | Deduplication | Inspect merge threshold and clustering keys |
+| 9 | Regional alias collision | Similar country, city, or region names are merged incorrectly | Deduplication | Compare geographic tags and alias handling |
+| 10 | Organization name collision | Similar institutions or companies are treated as the same entity | Deduplication | Check canonical naming and entity resolution |
+| 11 | Conflicting reports collapsed too early | Uncertainty disappears before the evidence is settled | Deduplication | Preserve conflicting sources until confidence improves |
+| 12 | Repeated aggregator dominance | Syndicated copies overpower the original reporting | Deduplication | Separate wire copies and aggregator mirrors from primary sources |
+| 13 | Long-summary drift | The narrative gradually moves beyond what the sources support | Summary | Compare each sentence against the source set |
+| 14 | Unsupported conclusion | The summary adds claims not directly supported by evidence | Summary | Mark claims that cannot be traced to source text |
+| 15 | Ambiguity hidden | Unclear actors, timing, or locations are presented too confidently | Summary | Surface uncertainty explicitly instead of forcing resolution |
+| 16 | Missing audit trail | It is hard to explain why the final conclusion was produced | Summary | Keep source attribution and an inspectable reasoning path |
+
+## Quick examples
+
+### Example 1: an old article looks like breaking news
+
+If a summary treats an older article as a new escalation, the first suspects are:
+
+- No. 1 Stale source selected
+- No. 4 Temporal context lost
+
+Start by checking publication time, sort order, and whether newer articles were available but excluded.
+
+### Example 2: similar events from different countries are mixed together
+
+If reports from different regions are collapsed into one event, the first suspects are:
+
+- No. 8 Different events merged together
+- No. 9 Regional alias collision
+- No. 10 Organization name collision
+
+Start by reviewing merge logic, geo labels, and canonical entity handling.
+
+### Example 3: a long brief becomes more confident than the evidence
+
+If a multi-source summary slowly becomes more speculative than the underlying inputs, the first suspects are:
+
+- No. 13 Long-summary drift
+- No. 14 Unsupported conclusion
+- No. 15 Ambiguity hidden
+- No. 16 Missing audit trail
+
+Start by tracing each conclusion back to the source set and marking where uncertainty was dropped.
+
+## Minimal triage flow
+
+When a summary looks suspicious:
+
+1. Ask whether the problem started before summarization.
+2. If the wrong inputs entered the context, focus on retrieval.
+3. If the right inputs entered but were merged incorrectly, focus on deduplication.
+4. If the inputs look correct but the narrative is wrong, focus on summary generation.
+5. Record the failure pattern before making changes, so future incidents can be compared consistently.
+
+## What to adjust after diagnosis
+
+### If retrieval is the main problem
+
+Review:
+- source selection
+- recency rules
+- chunking boundaries
+- alias normalization
+- feed weighting
+
+### If deduplication is the main problem
+
+Review:
+- duplicate clustering thresholds
+- canonical entity mapping
+- geo matching rules
+- handling of conflicting reports
+- treatment of aggregator copies
+
+### If summary generation is the main problem
+
+Review:
+- summary length
+- claim-to-source traceability
+- uncertainty handling
+- attribution visibility
+- constraints that prevent unsupported synthesis
+
+## Scope of this page
+
+This is a diagnostic aid, not a replacement for product logic.
+
+It does not change retrieval, deduplication, or summarization behavior by itself.
+It provides a shared vocabulary for investigating suspicious outputs in a more repeatable way.
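
Illustrative note, not part of the patch: the "Minimal triage flow" and the stage column of the 16-pattern table in the checklist could be sketched in Python roughly as follows. All names here (`Stage`, `triage`, `stage_for_pattern`, `record_incident`) are hypothetical and do not exist in the project. The number ranges simply mirror the table (No. 1 to 6 retrieval, No. 7 to 12 deduplication, No. 13 to 16 summary). This is a sketch under those assumptions, not the project's implementation.

```python
from enum import Enum


class Stage(Enum):
    """The three pipeline stages named in the checklist."""
    RETRIEVAL = "retrieval"
    DEDUPLICATION = "deduplication"
    SUMMARY = "summary generation"


def triage(inputs_correct: bool, merges_correct: bool) -> Stage:
    """Steps 2 to 4 of the minimal triage flow (hypothetical helper)."""
    if not inputs_correct:
        # The wrong inputs entered the context.
        return Stage.RETRIEVAL
    if not merges_correct:
        # The right inputs entered but were merged incorrectly.
        return Stage.DEDUPLICATION
    # Inputs and merges look correct, so the narrative itself is suspect.
    return Stage.SUMMARY


def stage_for_pattern(number: int) -> Stage:
    """Map a failure pattern number to its primary stage, per the table."""
    if 1 <= number <= 6:
        return Stage.RETRIEVAL
    if 7 <= number <= 12:
        return Stage.DEDUPLICATION
    if 13 <= number <= 16:
        return Stage.SUMMARY
    raise ValueError(f"unknown failure pattern: {number}")


def record_incident(symptom: str, pattern_numbers: list[int]) -> dict:
    """Step 5: record the suspected patterns before changing anything."""
    stages = sorted({stage_for_pattern(n).value for n in pattern_numbers})
    return {
        "symptom": symptom,
        "patterns": sorted(pattern_numbers),
        "stages": stages,
    }


# Example 2 from the checklist: suspects are No. 8, 9, and 10.
incident = record_incident(
    "similar events from different countries are mixed together",
    [8, 9, 10],
)
print(incident["stages"])
```

Recording the pattern numbers rather than free-text guesses is one way to make later incidents comparable, which is the stated purpose of step 5; the dictionary layout itself is only an assumption.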