From 8d8bf866a17ec2babf119d045cb9fa2c734532f5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?PSBigBig=20=C3=97=20MiniPS?= Date: Wed, 18 Feb 2026 23:22:30 +0800 Subject: [PATCH 1/3] docs: add trace-guided RAG failure mode checklist --- cookbook/rag_failure_mode_checklist.ipynb | 336 ++++++++++++++++++++++ 1 file changed, 336 insertions(+) create mode 100644 cookbook/rag_failure_mode_checklist.ipynb diff --git a/cookbook/rag_failure_mode_checklist.ipynb b/cookbook/rag_failure_mode_checklist.ipynb new file mode 100644 index 000000000..22b0f9739 --- /dev/null +++ b/cookbook/rag_failure_mode_checklist.ipynb @@ -0,0 +1,336 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c9e5d07c", + "metadata": {}, + "source": [ + "---\n", + "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1\u2013No.16).\n", + "---\n", + "\n", + "# Trace-guided RAG failure mode checklist\n", + "\n", + "This notebook shows one way to use your Langfuse traces together with an external failure-mode map.\n", + "\n", + "Workflow:\n", + "\n", + "1. Use Langfuse to find a failing RAG or LLM run.\n", + "2. Copy the prompt, model answer, retrieval context, and any relevant logs into this notebook.\n", + "3. Run the helper cell below to map the bug to a Problem Map number (No.1 to No.16).\n", + "4. Use that number as a tag or metadata field in Langfuse so you can filter and aggregate similar failures over time.\n", + "\n", + "The helper script below uses the open-source **WFGY 16 Problem Map** (MIT licensed) as the vocabulary for these failure modes. 
You can swap it for your own map if you prefer.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e5c8e012", + "metadata": {}, + "source": [ + "## Requirements\n", + "\n", + "You need:\n", + "\n", + "- an OpenAI-compatible chat completion endpoint (OpenAI, Nebius, or any other)\n", + "- an API key\n", + "- internet access so the notebook can download the WFGY Problem Map and TXTOS system prompt from GitHub\n", + "\n", + "This notebook does **not** send anything to Langfuse automatically. It assumes you already use Langfuse for tracing and that you will copy-paste a failing trace into the debugger. The goal is to give you a reproducible checklist and a consistent label such as `No.1`, `No.14`, or `No.16` that you can reuse inside Langfuse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4a089488", + "metadata": {}, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "import os\n", + "import textwrap\n", + "import requests\n", + "from openai import OpenAI\n", + "\n", + "PROBLEM_MAP_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/ProblemMap/README.md\"\n", + "TXTOS_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/OS/TXTOS.txt\"\n", + "WFGY_PROBLEM_MAP_HOME = \"https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme\"\n", + "WFGY_REPO = \"https://github.com/onestardao/WFGY\"\n", + "\n", + "EXAMPLE_1 = \"\"\"=== Example 1 - retrieval hallucination (No.1 style) ===\n", + "\n", + "Context: I have a simple RAG chatbot that answers questions from a product FAQ.\n", + "The FAQ only covers billing rules for my SaaS product and does NOT mention anything about cryptocurrency or stock trading.\n", + "\n", + "Prompt: \"Can I pay my subscription with Bitcoin?\"\n", + "\n", + "Retrieved context (from vector store):\n", + "- \"We only accept major credit cards and PayPal.\"\n", + "- \"All payments are processed in USD.\"\n", + "\n", + "Model answer:\n", + "\"Yes, you can pay with Bitcoin. 
We support several cryptocurrencies through a third-party payment gateway.\"\n", + "\n", + "Logs:\n", + "No errors. Retrieval shows the FAQ chunks above, but the model still confidently invents \"Bitcoin\" support.\n", + "\"\"\"\n", + "\n", + "EXAMPLE_2 = \"\"\"=== Example 2 - bootstrap ordering / infra race (No.14 style) ===\n", + "\n", + "Context: We have a simple RAG API with three services: api-gateway, rag-worker, and vector-db (running Qdrant).\n", + "In local docker compose everything works without problems.\n", + "\n", + "Deployment: In production, we deploy these services on Kubernetes.\n", + "\n", + "Symptom:\n", + "Sometimes, right after a fresh deploy, the api-gateway returns 500 errors for the first few minutes.\n", + "Logs show connection timeouts from api-gateway to vector-db.\n", + "\n", + "After a while, maybe 5 to 10 minutes, the errors disappear and the system works normally.\n", + "\n", + "We suspect some kind of startup race between api-gateway and vector-db, but we are not sure how to fix it properly.\n", + "\"\"\"\n", + "\n", + "EXAMPLE_3 = \"\"\"=== Example 3 - secrets / config drift around first deploy (No.16 style) ===\n", + "\n", + "Context: We added a new environment variable for our RAG pipeline: SECRET_RAG_KEY.\n", + "This is required by a middleware that signs all outgoing requests to our internal search API.\n", + "\n", + "Local: On local machines, developers set SECRET_RAG_KEY in their .env file and everything works.\n", + "\n", + "Production:\n", + "We deployed a new version of the app, but forgot to add SECRET_RAG_KEY to the production environment.\n", + "The first requests after deploy start failing with 500 errors and \"missing secret\" messages in the logs.\n", + "\n", + "After we hot-patched the secret into the production config, the errors stopped.\n", + "However, this kind of \"first deploy breaks because of missing secrets or config drift\" keeps happening in different forms.\n", + "We want to classify this failure mode and 
stop repeating the same mistake.\n", + "\"\"\"\n", + "\n", + "\n", + "def fetch_text(url: str) -> str:\n", + " \"\"\"Download a small text file with basic error handling.\"\"\"\n", + " resp = requests.get(url, timeout=30)\n", + " resp.raise_for_status()\n", + " return resp.text\n", + "\n", + "\n", + "def build_system_prompt(problem_map: str, txtos: str) -> str:\n", + " \"\"\"Build the system prompt that powers the debugger.\"\"\"\n", + " header = \"\"\"\n", + "You are an LLM debugger that follows the WFGY 16 Problem Map.\n", + "\n", + "Goal:\n", + "Given a description of a bug or failure in an LLM or RAG pipeline, you map it to the closest Problem Map number (No.1 to No.16), explain why, and propose a minimal fix.\n", + "\n", + "Rules:\n", + "- Always return exactly one primary Problem Map number (No.1 to No.16).\n", + "- Optionally return one secondary candidate if it is very close.\n", + "- Explain your reasoning in plain language.\n", + "- Point the user toward the right place inside the WFGY Problem Map when possible.\n", + "- Prefer minimal structural patches over generic high level advice.\n", + "\n", + "About the three built in examples:\n", + "- Example 1 is a clean retrieval hallucination pattern. It should map primarily to No.1.\n", + "- Example 2 is a bootstrap ordering or infra race pattern. It should map primarily to No.14.\n", + "- Example 3 is a first deploy secrets or config drift pattern. 
It should map primarily to No.16.\n", + "\"\"\"\n", + " return (\n", + " textwrap.dedent(header).strip()\n", + " + \"\\n\\n=== TXTOS excerpt ===\\n\"\n", + " + txtos[:6000]\n", + " + \"\\n\\n=== Problem Map excerpt ===\\n\"\n", + " + problem_map[:6000]\n", + " )\n", + "\n", + "\n", + "def setup_client():\n", + " \"\"\"\n", + " Collect API configuration, preload WFGY assets, and return\n", + " an OpenAI client together with the system prompt and model name.\n", + " \"\"\"\n", + " print(\"Step 1: configure your OpenAI-compatible endpoint.\")\n", + " print()\n", + "\n", + " # Prefer environment variables if they exist, but allow override.\n", + " env_api_key = os.getenv(\"OPENAI_API_KEY\") or os.getenv(\"NEBIUS_API_KEY\")\n", + " api_key: str\n", + "\n", + " if env_api_key:\n", + " print(\"Detected an API key in environment variables.\")\n", + " use_env = input(\"Use this key? (y/n, default y): \").strip().lower()\n", + " if use_env in (\"\", \"y\"):\n", + " api_key = env_api_key.strip()\n", + " else:\n", + " api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n", + " else:\n", + " api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n", + "\n", + " if not api_key:\n", + " raise ValueError(\"API key cannot be empty.\")\n", + "\n", + " # Default to Nebius Token Factory endpoint if none is provided.\n", + " default_base_url = os.getenv(\"OPENAI_BASE_URL\", \"https://api.tokenfactory.nebius.com/v1/\")\n", + " base_url = input(\n", + " f\"Custom OpenAI-compatible base URL (press Enter for {default_base_url}): \"\n", + " ).strip()\n", + " if not base_url:\n", + " base_url = default_base_url\n", + "\n", + " # Let the user choose any model id. 
Provide a Nebius friendly default.\n", + " default_model = os.getenv(\"OPENAI_MODEL\") or \"meta-llama/Meta-Llama-3.1-70B-Instruct\"\n", + " model_name = input(\n", + " f\"Model name (press Enter for {default_model}): \"\n", + " ).strip()\n", + " if not model_name:\n", + " model_name = default_model\n", + "\n", + " print()\n", + " print(\"Step 2: downloading WFGY Problem Map and TXTOS prompt...\")\n", + " problem_map_text = fetch_text(PROBLEM_MAP_URL)\n", + " txtos_text = fetch_text(TXTOS_URL)\n", + " system_prompt = build_system_prompt(problem_map_text, txtos_text)\n", + " print(\"Setup complete. WFGY debugger is ready.\")\n", + " print()\n", + " print(f\"Using base URL: {base_url}\")\n", + " print(f\"Using model: {model_name}\")\n", + " print()\n", + "\n", + " client = OpenAI(api_key=api_key, base_url=base_url)\n", + "\n", + " return client, system_prompt, model_name\n", + "\n", + "\n", + "def print_examples():\n", + " \"\"\"Print the three ready-to-copy examples.\"\"\"\n", + " print(\"If you are not sure what to write, you can start from one of these examples:\")\n", + " print(\" - Example 1: retrieval hallucination (No.1 style)\")\n", + " print(\" - Example 2: bootstrap ordering or infra race (No.14 style)\")\n", + " print(\" - Example 3: secrets or config drift around first deploy (No.16 style)\")\n", + " print()\n", + " print(\"Full text of the examples (ready to copy and paste):\")\n", + " print(\"------------------------------------------------------------\")\n", + " print(EXAMPLE_1)\n", + " print(\"------------------------------------------------------------\")\n", + " print(EXAMPLE_2)\n", + " print(\"------------------------------------------------------------\")\n", + " print(EXAMPLE_3)\n", + " print(\"------------------------------------------------------------\")\n", + " print()\n", + "\n", + "\n", + "def run_debug_session(client: OpenAI, system_prompt: str, model_name: str) -> None:\n", + " \"\"\"Run one interactive debug round in the Colab 
cell.\"\"\"\n", + " print(\"============================================================\")\n", + " print(\"WFGY 16 Problem Map LLM Debugger\")\n", + " print()\n", + " print(\"How to use this cell:\")\n", + " print(\" 1) Scroll up and read the three examples.\")\n", + " print(\" 2) Paste one example or your own LLM / RAG bug description.\")\n", + " print(\" Include prompt, answer, and any relevant logs.\")\n", + " print(\" 3) When you are done, press Enter on an empty line to submit.\")\n", + " print(\" 4) After you see the diagnosis, open the WFGY Problem Map for the full fix.\")\n", + " print()\n", + "\n", + " print_examples()\n", + "\n", + " print(\"Now it is your turn.\")\n", + " print(\"Type your bug description line by line.\")\n", + " print(\"Colab will open a small input box for each line.\")\n", + " print(\"When you are finished, press Enter on an empty line to submit.\")\n", + " print()\n", + "\n", + " lines = []\n", + " first = True\n", + " while True:\n", + " try:\n", + " if first:\n", + " prompt = (\n", + " \"Line 1 - paste your bug here \"\n", + " \"(press Enter for next line, empty line to finish): \"\n", + " )\n", + " first = False\n", + " else:\n", + " prompt = (\n", + " \"Next line - continue typing, or press Enter on an empty line to submit: \"\n", + " )\n", + " line = input(prompt)\n", + " except EOFError:\n", + " break\n", + "\n", + " if not line.strip():\n", + " # Empty line = end of input block.\n", + " break\n", + "\n", + " lines.append(line)\n", + "\n", + " user_bug = \"\\n\".join(lines).strip()\n", + " if not user_bug:\n", + " print(\"No bug description detected. 
Nothing to debug in this round.\")\n", + " print()\n", + " return\n", + "\n", + " print()\n", + " print(\"Asking the WFGY debugger...\")\n", + " print()\n", + "\n", + " try:\n", + " completion = client.chat.completions.create(\n", + " model=model_name,\n", + " temperature=0.2,\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": system_prompt},\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": (\n", + " \"Here is the bug description. Please follow the WFGY rules.\\n\\n\"\n", + " + user_bug\n", + " ),\n", + " },\n", + " ],\n", + " )\n", + " except Exception as exc:\n", + " print(\"Call to the LLM endpoint failed.\")\n", + " print(\"Please check your API key, base URL, and model name.\")\n", + " print(f\"Error detail: {exc}\")\n", + " print()\n", + " return\n", + "\n", + " reply = completion.choices[0].message.content or \"\"\n", + " print(reply)\n", + " print()\n", + " print(\"For full documentation and concrete fixes, open the WFGY Problem Map:\")\n", + " print(WFGY_PROBLEM_MAP_HOME)\n", + " print()\n", + " print(\"This debugger is only the front door. The real fixes live in the repo:\")\n", + " print(WFGY_REPO)\n", + " print(\"============================================================\")\n", + " print()\n", + "\n", + "\n", + "# Boot the debugger session.\n", + "client, system_prompt, model_name = setup_client()\n", + "\n", + "while True:\n", + " run_debug_session(client, system_prompt, model_name)\n", + " again = input(\"Debug another bug? (y/n): \").strip().lower()\n", + " if again != \"y\":\n", + " print(\"Session finished. 
Goodbye.\")\n", + " break\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 459de67d82c37c645bf1ba3c29536697d6af550e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?PSBigBig=20=C3=97=20MiniPS?= Date: Wed, 18 Feb 2026 23:45:43 +0800 Subject: [PATCH 2/3] Update rag_failure_mode_checklist.ipynb --- cookbook/rag_failure_mode_checklist.ipynb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/cookbook/rag_failure_mode_checklist.ipynb b/cookbook/rag_failure_mode_checklist.ipynb index 22b0f9739..8c62fdb1d 100644 --- a/cookbook/rag_failure_mode_checklist.ipynb +++ b/cookbook/rag_failure_mode_checklist.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "---\n", - "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1\u2013No.16).\n", + "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1–No.16).\n", "---\n", "\n", "# Trace-guided RAG failure mode checklist\n", @@ -169,16 +169,16 @@ " if not api_key:\n", " raise ValueError(\"API key cannot be empty.\")\n", "\n", - " # Default to Nebius Token Factory endpoint if none is provided.\n", - " default_base_url = os.getenv(\"OPENAI_BASE_URL\", \"https://api.tokenfactory.nebius.com/v1/\")\n", + " # Default to the official OpenAI endpoint if none is provided.\n", + " default_base_url = os.getenv(\"OPENAI_BASE_URL\", \"https://api.openai.com/v1\")\n", " base_url = input(\n", " f\"Custom OpenAI-compatible base URL (press Enter for {default_base_url}): \"\n", " ).strip()\n", " if not base_url:\n", " base_url = default_base_url\n", "\n", - " # Let the user choose any model id. 
Provide a Nebius friendly default.\n", - " default_model = os.getenv(\"OPENAI_MODEL\") or \"meta-llama/Meta-Llama-3.1-70B-Instruct\"\n", + " # Let the user choose any model id. Default to gpt-4o for convenience.\n", + " default_model = os.getenv(\"OPENAI_MODEL\") or \"gpt-4o\"\n", " model_name = input(\n", " f\"Model name (press Enter for {default_model}): \"\n", " ).strip()\n", From 9ccba9ec2d47422e504fd1b997e060c41754472d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?PSBigBig=20=C3=97=20MiniPS?= Date: Wed, 18 Feb 2026 23:52:45 +0800 Subject: [PATCH 3/3] docs(cookbook): add trace-guided RAG failure mode checklist notebook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a new cookbook notebook that demonstrates how to: - classify RAG / LLM failures using a structured failure map (No.1–No.16) - map symptoms from traces to concrete inspection points - apply minimal structural fixes instead of generic advice The notebook uses an OpenAI-compatible endpoint and is designed as a practical debugging checklist for Langfuse users. --- cookbook/rag_failure_mode_checklist.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cookbook/rag_failure_mode_checklist.ipynb b/cookbook/rag_failure_mode_checklist.ipynb index 8c62fdb1d..d0885a7b8 100644 --- a/cookbook/rag_failure_mode_checklist.ipynb +++ b/cookbook/rag_failure_mode_checklist.ipynb @@ -6,7 +6,7 @@ "metadata": {}, "source": [ "---\n", - "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1–No.16).\n", + "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1–No.16)\n", "---\n", "\n", "# Trace-guided RAG failure mode checklist\n",
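The notebook in this series asks the user to copy the resulting Problem Map number back into Langfuse by hand as a tag or metadata field. For bulk-tagging older traces, a small local pre-classifier can produce a first-guess label before any LLM call is made. The sketch below is a hypothetical helper, not part of the patches above, and its keyword lists are illustrative assumptions rather than definitions taken from the WFGY Problem Map:

```python
from typing import Optional

# Hypothetical pre-classifier: suggest a Problem Map label from keywords.
# The keyword lists are illustrative assumptions only; the authoritative
# failure-mode definitions live in the WFGY Problem Map, not in this sketch.
KEYWORD_MAP = {
    "No.1": ["hallucination", "invents", "not in the retrieved context"],
    "No.14": ["startup race", "connection timeout", "first few minutes"],
    "No.16": ["missing secret", "config drift", "first deploy"],
}


def suggest_problem_map_tag(bug_description: str) -> Optional[str]:
    """Return the label whose keywords match most often, or None on no match."""
    text = bug_description.lower()
    # Count keyword hits per candidate label.
    scores = {
        tag: sum(1 for keyword in keywords if keyword in text)
        for tag, keywords in KEYWORD_MAP.items()
    }
    best_tag, best_score = max(scores.items(), key=lambda item: item[1])
    return best_tag if best_score > 0 else None
```

A label produced this way can be stored as trace metadata alongside the LLM-produced one, so traces where the two disagree are easy to filter for manual review inside Langfuse.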