336 changes: 336 additions & 0 deletions cookbook/rag_failure_mode_checklist.ipynb
@@ -0,0 +1,336 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c9e5d07c",
"metadata": {},
"source": [
"---\n",
"description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1–No.16)\n",
"---\n",
"\n",
"# Trace-guided RAG failure mode checklist\n",
"\n",
"This notebook shows one way to use your Langfuse traces together with an external failure-mode map.\n",
"\n",
"Workflow:\n",
"\n",
"1. Use Langfuse to find a failing RAG or LLM run.\n",
"2. Copy the prompt, model answer, retrieval context, and any relevant logs into this notebook.\n",
"3. Run the helper cell below to map the bug to a Problem Map number (No.1 to No.16).\n",
"4. Use that number as a tag or metadata field in Langfuse so you can filter and aggregate similar failures over time.\n",
"\n",
"The helper script below uses the open-source **WFGY 16 Problem Map** (MIT licensed) as the vocabulary for these failure modes. You can swap it for your own map if you prefer.\n"
]
},
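{
"cell_type": "markdown",
"id": "a1f4c6b2",
"metadata": {},
"source": [
"Step 2 of the workflow can be made repeatable with a small formatter. The sketch below is one possible layout, not part of Langfuse or WFGY: `format_bug_report` and its parameter names are hypothetical, and the section labels only mirror the built-in examples in the helper cell. It joins sections with single newlines because the interactive prompt in the helper cell treats an empty line as the end of input.\n",
"\n",
"```python\n",
"from typing import Sequence\n",
"\n",
"\n",
"def format_bug_report(\n",
"    prompt: str,\n",
"    answer: str,\n",
"    contexts: Sequence[str] = (),\n",
"    logs: str = \"\",\n",
") -> str:\n",
"    \"\"\"Assemble trace fields into one plain-text bug description (hypothetical helper).\"\"\"\n",
"    parts = [f\"Prompt: {prompt.strip()}\"]\n",
"    if contexts:\n",
"        # One bullet per retrieved chunk, as in the built-in examples.\n",
"        chunks = \"\\n\".join(f\"- {c.strip()}\" for c in contexts)\n",
"        parts.append(\"Retrieved context:\\n\" + chunks)\n",
"    parts.append(\"Model answer:\\n\" + answer.strip())\n",
"    if logs.strip():\n",
"        parts.append(\"Logs:\\n\" + logs.strip())\n",
"    # Single newlines only: an empty line would end the interactive input loop.\n",
"    return \"\\n\".join(parts)\n",
"\n",
"\n",
"report = format_bug_report(\n",
"    prompt=\"Can I pay my subscription with Bitcoin?\",\n",
"    answer=\"Yes, you can pay with Bitcoin.\",\n",
"    contexts=[\"We only accept major credit cards and PayPal.\"],\n",
"    logs=\"No errors.\",\n",
")\n",
"print(report)\n",
"```\n",
"\n",
"Paste the printed text into the debugger prompt line by line.\n"
]
},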
{
"cell_type": "markdown",
"id": "e5c8e012",
"metadata": {},
"source": [
"## Requirements\n",
"\n",
"You need:\n",
"\n",
"- an OpenAI-compatible chat completion endpoint (OpenAI, Nebius, or any other provider)\n",
"- an API key\n",
"- internet access so the notebook can download the WFGY Problem Map and TXTOS system prompt from GitHub\n",
"\n",
"This notebook does **not** send anything to Langfuse automatically. It assumes you already use Langfuse for tracing and that you will copy-paste a failing trace into the debugger. The goal is to give you a reproducible checklist and a consistent label such as `No.1`, `No.14`, or `No.16` that you can reuse inside Langfuse.\n"
]
},
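{
"cell_type": "markdown",
"id": "d2a6f4e9",
"metadata": {},
"source": [
"To reuse the label consistently, it helps to normalize it before tagging. The sketch below is an assumption about how you might do that, not a Langfuse or WFGY API: `extract_problem_number`, `to_tag`, and the `wfgy-no-` prefix are hypothetical names. How you attach the tag to a trace depends on your Langfuse SDK version, so that call is deliberately left out.\n",
"\n",
"```python\n",
"import re\n",
"from typing import Optional\n",
"\n",
"# Matches labels such as \"No.1\" or \"No. 14\" in the debugger's reply.\n",
"LABEL_RE = re.compile(r\"\\bNo\\.\\s?(\\d{1,2})\\b\")\n",
"\n",
"\n",
"def extract_problem_number(reply: str) -> Optional[int]:\n",
"    \"\"\"Return the first Problem Map number (1 to 16) found in the reply, else None.\"\"\"\n",
"    for match in LABEL_RE.finditer(reply):\n",
"        number = int(match.group(1))\n",
"        if 1 <= number <= 16:\n",
"            return number\n",
"    return None\n",
"\n",
"\n",
"def to_tag(number: int) -> str:\n",
"    \"\"\"Build a filter-friendly tag string; the prefix is an arbitrary choice.\"\"\"\n",
"    return f\"wfgy-no-{number:02d}\"\n",
"\n",
"\n",
"number = extract_problem_number(\"Primary: No.14 (bootstrap ordering). Secondary: No.16.\")\n",
"print(number, to_tag(number))  # prints: 14 wfgy-no-14\n",
"```\n",
"\n",
"Zero-padding the number keeps the tags in numeric order when they are sorted as strings.\n"
]
},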
{
"cell_type": "code",
"execution_count": null,
"id": "4a089488",
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"import os\n",
"import textwrap\n",
"import requests\n",
"from openai import OpenAI\n",
"\n",
"PROBLEM_MAP_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/ProblemMap/README.md\"\n",
"TXTOS_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/OS/TXTOS.txt\"\n",
"WFGY_PROBLEM_MAP_HOME = \"https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme\"\n",
"WFGY_REPO = \"https://github.com/onestardao/WFGY\"\n",
"\n",
"EXAMPLE_1 = \"\"\"=== Example 1 - retrieval hallucination (No.1 style) ===\n",
"\n",
"Context: I have a simple RAG chatbot that answers questions from a product FAQ.\n",
"The FAQ only covers billing rules for my SaaS product and does NOT mention anything about cryptocurrency or stock trading.\n",
"\n",
"Prompt: \"Can I pay my subscription with Bitcoin?\"\n",
"\n",
"Retrieved context (from vector store):\n",
"- \"We only accept major credit cards and PayPal.\"\n",
"- \"All payments are processed in USD.\"\n",
"\n",
"Model answer:\n",
"\"Yes, you can pay with Bitcoin. We support several cryptocurrencies through a third-party payment gateway.\"\n",
"\n",
"Logs:\n",
"No errors. Retrieval shows the FAQ chunks above, but the model still confidently invents \"Bitcoin\" support.\n",
"\"\"\"\n",
"\n",
"EXAMPLE_2 = \"\"\"=== Example 2 - bootstrap ordering / infra race (No.14 style) ===\n",
"\n",
"Context: We have a simple RAG API with three services: api-gateway, rag-worker, and vector-db (running Qdrant).\n",
"In local docker compose everything works without problems.\n",
"\n",
"Deployment: In production, we deploy these services on Kubernetes.\n",
"\n",
"Symptom:\n",
"Sometimes, right after a fresh deploy, the api-gateway returns 500 errors for the first few minutes.\n",
"Logs show connection timeouts from api-gateway to vector-db.\n",
"\n",
"After a while, maybe 5 to 10 minutes, the errors disappear and the system works normally.\n",
"\n",
"We suspect some kind of startup race between api-gateway and vector-db, but we are not sure how to fix it properly.\n",
"\"\"\"\n",
"\n",
"EXAMPLE_3 = \"\"\"=== Example 3 - secrets / config drift around first deploy (No.16 style) ===\n",
"\n",
"Context: We added a new environment variable for our RAG pipeline: SECRET_RAG_KEY.\n",
"This is required by a middleware that signs all outgoing requests to our internal search API.\n",
"\n",
"Local: On local machines, developers set SECRET_RAG_KEY in their .env file and everything works.\n",
"\n",
"Production:\n",
"We deployed a new version of the app, but forgot to add SECRET_RAG_KEY to the production environment.\n",
"The first requests after deploy start failing with 500 errors and \"missing secret\" messages in the logs.\n",
"\n",
"After we hot-patched the secret into the production config, the errors stopped.\n",
"However, this kind of \"first deploy breaks because of missing secrets or config drift\" keeps happening in different forms.\n",
"We want to classify this failure mode and stop repeating the same mistake.\n",
"\"\"\"\n",
"\n",
"\n",
"def fetch_text(url: str) -> str:\n",
"    \"\"\"Download a small text file with basic error handling.\"\"\"\n",
"    resp = requests.get(url, timeout=30)\n",
"    resp.raise_for_status()\n",
"    return resp.text\n",
"\n",
"\n",
"def build_system_prompt(problem_map: str, txtos: str) -> str:\n",
"    \"\"\"Build the system prompt that powers the debugger.\"\"\"\n",
"    header = \"\"\"\n",
"You are an LLM debugger that follows the WFGY 16 Problem Map.\n",
"\n",
"Goal:\n",
"Given a description of a bug or failure in an LLM or RAG pipeline, you map it to the closest Problem Map number (No.1 to No.16), explain why, and propose a minimal fix.\n",
"\n",
"Rules:\n",
"- Always return exactly one primary Problem Map number (No.1 to No.16).\n",
"- Optionally return one secondary candidate if it is very close.\n",
"- Explain your reasoning in plain language.\n",
"- Point the user toward the right place inside the WFGY Problem Map when possible.\n",
"- Prefer minimal structural patches over generic high-level advice.\n",
"\n",
"About the three built-in examples:\n",
"- Example 1 is a clean retrieval hallucination pattern. It should map primarily to No.1.\n",
"- Example 2 is a bootstrap ordering or infra race pattern. It should map primarily to No.14.\n",
"- Example 3 is a first-deploy secrets or config drift pattern. It should map primarily to No.16.\n",
"\"\"\"\n",
"    # Only the first 6000 characters of each downloaded file are included.\n",
"    return (\n",
"        textwrap.dedent(header).strip()\n",
"        + \"\\n\\n=== TXTOS excerpt ===\\n\"\n",
"        + txtos[:6000]\n",
"        + \"\\n\\n=== Problem Map excerpt ===\\n\"\n",
"        + problem_map[:6000]\n",
"    )\n",
"\n",
"\n",
"def setup_client():\n",
"    \"\"\"\n",
"    Collect API configuration, preload WFGY assets, and return\n",
"    an OpenAI client together with the system prompt and model name.\n",
"    \"\"\"\n",
"    print(\"Step 1: configure your OpenAI-compatible endpoint.\")\n",
"    print()\n",
"\n",
"    # Prefer environment variables if they exist, but allow override.\n",
"    env_api_key = os.getenv(\"OPENAI_API_KEY\") or os.getenv(\"NEBIUS_API_KEY\")\n",
"    api_key: str\n",
"\n",
"    if env_api_key:\n",
"        print(\"Detected an API key in environment variables.\")\n",
"        use_env = input(\"Use this key? (y/n, default y): \").strip().lower()\n",
"        if use_env in (\"\", \"y\"):\n",
"            api_key = env_api_key.strip()\n",
"        else:\n",
"            api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n",
"    else:\n",
"        api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n",
"\n",
"    if not api_key:\n",
"        raise ValueError(\"API key cannot be empty.\")\n",
"\n",
"    # Default to the official OpenAI endpoint if none is provided.\n",
"    default_base_url = os.getenv(\"OPENAI_BASE_URL\", \"https://api.openai.com/v1\")\n",
"    base_url = input(\n",
"        f\"Custom OpenAI-compatible base URL (press Enter for {default_base_url}): \"\n",
"    ).strip()\n",
"    if not base_url:\n",
"        base_url = default_base_url\n",
"\n",
"    # Let the user choose any model id. Default to gpt-4o for convenience.\n",
"    default_model = os.getenv(\"OPENAI_MODEL\") or \"gpt-4o\"\n",
"    model_name = input(\n",
"        f\"Model name (press Enter for {default_model}): \"\n",
"    ).strip()\n",
"    if not model_name:\n",
"        model_name = default_model\n",
"\n",
"    print()\n",
"    print(\"Step 2: downloading WFGY Problem Map and TXTOS prompt...\")\n",
"    problem_map_text = fetch_text(PROBLEM_MAP_URL)\n",
"    txtos_text = fetch_text(TXTOS_URL)\n",
"    system_prompt = build_system_prompt(problem_map_text, txtos_text)\n",
"    print(\"Setup complete. WFGY debugger is ready.\")\n",
"    print()\n",
"    print(f\"Using base URL: {base_url}\")\n",
"    print(f\"Using model: {model_name}\")\n",
"    print()\n",
"\n",
"    client = OpenAI(api_key=api_key, base_url=base_url)\n",
"\n",
"    return client, system_prompt, model_name\n",
"\n",
"\n",
"def print_examples():\n",
"    \"\"\"Print the three ready-to-copy examples.\"\"\"\n",
"    print(\"If you are not sure what to write, you can start from one of these examples:\")\n",
"    print(\" - Example 1: retrieval hallucination (No.1 style)\")\n",
"    print(\" - Example 2: bootstrap ordering or infra race (No.14 style)\")\n",
"    print(\" - Example 3: secrets or config drift around first deploy (No.16 style)\")\n",
"    print()\n",
"    print(\"Full text of the examples (ready to copy and paste):\")\n",
"    print(\"------------------------------------------------------------\")\n",
"    print(EXAMPLE_1)\n",
"    print(\"------------------------------------------------------------\")\n",
"    print(EXAMPLE_2)\n",
"    print(\"------------------------------------------------------------\")\n",
"    print(EXAMPLE_3)\n",
"    print(\"------------------------------------------------------------\")\n",
"    print()\n",
"\n",
"\n",
"def run_debug_session(client: OpenAI, system_prompt: str, model_name: str) -> None:\n",
"    \"\"\"Run one interactive debug round in the Colab cell.\"\"\"\n",
"    print(\"============================================================\")\n",
"    print(\"WFGY 16 Problem Map LLM Debugger\")\n",
"    print()\n",
"    print(\"How to use this cell:\")\n",
"    print(\" 1) Scroll up and read the three examples.\")\n",
"    print(\" 2) Paste one example or your own LLM / RAG bug description.\")\n",
"    print(\"    Include prompt, answer, and any relevant logs.\")\n",
"    print(\" 3) When you are done, press Enter on an empty line to submit.\")\n",
"    print(\" 4) After you see the diagnosis, open the WFGY Problem Map for the full fix.\")\n",
"    print()\n",
"\n",
"    print_examples()\n",
"\n",
"    print(\"Now it is your turn.\")\n",
"    print(\"Type your bug description line by line.\")\n",
"    print(\"Colab will open a small input box for each line.\")\n",
"    print(\"When you are finished, press Enter on an empty line to submit.\")\n",
"    print()\n",
"\n",
"    lines = []\n",
"    first = True\n",
"    while True:\n",
"        try:\n",
"            if first:\n",
"                prompt = (\n",
"                    \"Line 1 - paste your bug here \"\n",
"                    \"(press Enter for next line, empty line to finish): \"\n",
"                )\n",
"                first = False\n",
"            else:\n",
"                prompt = (\n",
"                    \"Next line - continue typing, or press Enter on an empty line to submit: \"\n",
"                )\n",
"            line = input(prompt)\n",
"        except EOFError:\n",
"            break\n",
"\n",
"        if not line.strip():\n",
"            # Empty line = end of input block.\n",
"            break\n",
"\n",
"        lines.append(line)\n",
"\n",
"    user_bug = \"\\n\".join(lines).strip()\n",
"    if not user_bug:\n",
"        print(\"No bug description detected. Nothing to debug in this round.\")\n",
"        print()\n",
"        return\n",
"\n",
"    print()\n",
"    print(\"Asking the WFGY debugger...\")\n",
"    print()\n",
"\n",
"    try:\n",
"        completion = client.chat.completions.create(\n",
"            model=model_name,\n",
"            temperature=0.2,\n",
"            messages=[\n",
"                {\"role\": \"system\", \"content\": system_prompt},\n",
"                {\n",
"                    \"role\": \"user\",\n",
"                    \"content\": (\n",
"                        \"Here is the bug description. Please follow the WFGY rules.\\n\\n\"\n",
"                        + user_bug\n",
"                    ),\n",
"                },\n",
"            ],\n",
"        )\n",
"    except Exception as exc:\n",
"        print(\"Call to the LLM endpoint failed.\")\n",
"        print(\"Please check your API key, base URL, and model name.\")\n",
"        print(f\"Error detail: {exc}\")\n",
"        print()\n",
"        return\n",
"\n",
"    reply = completion.choices[0].message.content or \"\"\n",
"    print(reply)\n",
"    print()\n",
"    print(\"For full documentation and concrete fixes, open the WFGY Problem Map:\")\n",
"    print(WFGY_PROBLEM_MAP_HOME)\n",
"    print()\n",
"    print(\"This debugger is only the front door. The real fixes live in the repo:\")\n",
"    print(WFGY_REPO)\n",
"    print(\"============================================================\")\n",
"    print()\n",
"\n",
"\n",
"# Boot the debugger session.\n",
"client, system_prompt, model_name = setup_client()\n",
"\n",
"while True:\n",
"    run_debug_session(client, system_prompt, model_name)\n",
"    again = input(\"Debug another bug? (y/n): \").strip().lower()\n",
"    if again != \"y\":\n",
"        print(\"Session finished. Goodbye.\")\n",
"        break\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}