langfuse · onestardao · Feb 18, 2026 · Feb 18, 2026 · Feb 18, 2026 · Feb 28, 2026
diff --git a/cookbook/rag_failure_mode_checklist.ipynb b/cookbook/rag_failure_mode_checklist.ipynb
@@ -0,0 +1,336 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "id": "c9e5d07c",
+      "metadata": {},
+      "source": [
+        "---\n",
+        "description: Trace-guided RAG failure mode checklist using a shared failure map (Problem Map No.1–No.16)\n",
+        "---\n",
+        "\n",
+        "# Trace-guided RAG failure mode checklist\n",
+        "\n",
+        "This notebook shows one way to use your Langfuse traces together with an external failure-mode map.\n",
+        "\n",
+        "Workflow:\n",
+        "\n",
+        "1. Use Langfuse to find a failing RAG or LLM run.\n",
+        "2. Copy the prompt, model answer, retrieval context, and any relevant logs into this notebook.\n",
+        "3. Run the helper cell below to map the bug to a Problem Map number (No.1 to No.16).\n",
+        "4. Use that number as a tag or metadata field in Langfuse so you can filter and aggregate similar failures over time.\n",
+        "\n",
+        "The helper script below uses the open-source **WFGY 16 Problem Map** (MIT licensed) as the vocabulary for these failure modes. You can swap it for your own map if you prefer.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "e5c8e012",
+      "metadata": {},
+      "source": [
+        "## Requirements\n",
+        "\n",
+        "You need:\n",
+        "\n",
+        "- an OpenAI-compatible chat completion endpoint (OpenAI, Nebius, or any other)\n",
+        "- an API key\n",
+        "- internet access so the notebook can download the WFGY Problem Map and TXTOS system prompt from GitHub\n",
+        "\n",
+        "This notebook does **not** send anything to Langfuse automatically. It assumes you already use Langfuse for tracing and that you will copy-paste a failing trace into the debugger. The goal is to give you a reproducible checklist and a consistent label such as `No.1`, `No.14`, or `No.16` that you can reuse inside Langfuse.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "4a089488",
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from getpass import getpass\n",
+        "import os\n",
+        "import textwrap\n",
+        "import requests\n",
+        "from openai import OpenAI\n",
+        "\n",
+        "PROBLEM_MAP_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/ProblemMap/README.md\"\n",
+        "TXTOS_URL = \"https://raw.githubusercontent.com/onestardao/WFGY/main/OS/TXTOS.txt\"\n",
+        "WFGY_PROBLEM_MAP_HOME = \"https://github.com/onestardao/WFGY/tree/main/ProblemMap#readme\"\n",
+        "WFGY_REPO = \"https://github.com/onestardao/WFGY\"\n",
+        "\n",
+        "EXAMPLE_1 = \"\"\"=== Example 1 - retrieval hallucination (No.1 style) ===\n",
+        "\n",
+        "Context: I have a simple RAG chatbot that answers questions from a product FAQ.\n",
+        "The FAQ only covers billing rules for my SaaS product and does NOT mention anything about cryptocurrency or stock trading.\n",
+        "\n",
+        "Prompt: \"Can I pay my subscription with Bitcoin?\"\n",
+        "\n",
+        "Retrieved context (from vector store):\n",
+        "- \"We only accept major credit cards and PayPal.\"\n",
+        "- \"All payments are processed in USD.\"\n",
+        "\n",
+        "Model answer:\n",
+        "\"Yes, you can pay with Bitcoin. We support several cryptocurrencies through a third-party payment gateway.\"\n",
+        "\n",
+        "Logs:\n",
+        "No errors. Retrieval shows the FAQ chunks above, but the model still confidently invents \"Bitcoin\" support.\n",
+        "\"\"\"\n",
+        "\n",
+        "EXAMPLE_2 = \"\"\"=== Example 2 - bootstrap ordering / infra race (No.14 style) ===\n",
+        "\n",
+        "Context: We have a simple RAG API with three services: api-gateway, rag-worker, and vector-db (running Qdrant).\n",
+        "In local docker compose everything works without problems.\n",
+        "\n",
+        "Deployment: In production, we deploy these services on Kubernetes.\n",
+        "\n",
+        "Symptom:\n",
+        "Sometimes, right after a fresh deploy, the api-gateway returns 500 errors for the first few minutes.\n",
+        "Logs show connection timeouts from api-gateway to vector-db.\n",
+        "\n",
+        "After a while, maybe 5 to 10 minutes, the errors disappear and the system works normally.\n",
+        "\n",
+        "We suspect some kind of startup race between api-gateway and vector-db, but we are not sure how to fix it properly.\n",
+        "\"\"\"\n",
+        "\n",
+        "EXAMPLE_3 = \"\"\"=== Example 3 - secrets / config drift around first deploy (No.16 style) ===\n",
+        "\n",
+        "Context: We added a new environment variable for our RAG pipeline: SECRET_RAG_KEY.\n",
+        "This is required by a middleware that signs all outgoing requests to our internal search API.\n",
+        "\n",
+        "Local: On local machines, developers set SECRET_RAG_KEY in their .env file and everything works.\n",
+        "\n",
+        "Production:\n",
+        "We deployed a new version of the app, but forgot to add SECRET_RAG_KEY to the production environment.\n",
+        "The first requests after deploy start failing with 500 errors and \"missing secret\" messages in the logs.\n",
+        "\n",
+        "After we hot-patched the secret into the production config, the errors stopped.\n",
+        "However, this kind of \"first deploy breaks because of missing secrets or config drift\" keeps happening in different forms.\n",
+        "We want to classify this failure mode and stop repeating the same mistake.\n",
+        "\"\"\"\n",
+        "\n",
+        "\n",
+        "def fetch_text(url: str) -> str:\n",
+        "    \"\"\"Download a small text file with basic error handling.\"\"\"\n",
+        "    resp = requests.get(url, timeout=30)\n",
+        "    resp.raise_for_status()\n",
+        "    return resp.text\n",
+        "\n",
+        "\n",
+        "def build_system_prompt(problem_map: str, txtos: str) -> str:\n",
+        "    \"\"\"Build the system prompt that powers the debugger.\"\"\"\n",
+        "    header = \"\"\"\n",
+        "You are an LLM debugger that follows the WFGY 16 Problem Map.\n",
+        "\n",
+        "Goal:\n",
+        "Given a description of a bug or failure in an LLM or RAG pipeline, you map it to the closest Problem Map number (No.1 to No.16), explain why, and propose a minimal fix.\n",
+        "\n",
+        "Rules:\n",
+        "- Always return exactly one primary Problem Map number (No.1 to No.16).\n",
+        "- Optionally return one secondary candidate if it is very close.\n",
+        "- Explain your reasoning in plain language.\n",
+        "- Point the user toward the right place inside the WFGY Problem Map when possible.\n",
+        "- Prefer minimal structural patches over generic high level advice.\n",
+        "\n",
+        "About the three built in examples:\n",
+        "- Example 1 is a clean retrieval hallucination pattern. It should map primarily to No.1.\n",
+        "- Example 2 is a bootstrap ordering or infra race pattern. It should map primarily to No.14.\n",
+        "- Example 3 is a first deploy secrets or config drift pattern. It should map primarily to No.16.\n",
+        "\"\"\"\n",
+        "    return (\n",
+        "        textwrap.dedent(header).strip()\n",
+        "        + \"\\n\\n=== TXTOS excerpt ===\\n\"\n",
+        "        + txtos[:6000]\n",
+        "        + \"\\n\\n=== Problem Map excerpt ===\\n\"\n",
+        "        + problem_map[:6000]\n",
+        "    )\n",
+        "\n",
+        "\n",
+        "def setup_client():\n",
+        "    \"\"\"\n",
+        "    Collect API configuration, preload WFGY assets, and return\n",
+        "    an OpenAI client together with the system prompt and model name.\n",
+        "    \"\"\"\n",
+        "    print(\"Step 1: configure your OpenAI-compatible endpoint.\")\n",
+        "    print()\n",
+        "\n",
+        "    # Prefer environment variables if they exist, but allow override.\n",
+        "    env_api_key = os.getenv(\"OPENAI_API_KEY\") or os.getenv(\"NEBIUS_API_KEY\")\n",
+        "    api_key: str\n",
+        "\n",
+        "    if env_api_key:\n",
+        "        print(\"Detected an API key in environment variables.\")\n",
+        "        use_env = input(\"Use this key? (y/n, default y): \").strip().lower()\n",
+        "        if use_env in (\"\", \"y\"):\n",
+        "            api_key = env_api_key.strip()\n",
+        "        else:\n",
+        "            api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n",
+        "    else:\n",
+        "        api_key = getpass(\"Enter your OpenAI-compatible API key: \").strip()\n",
+        "\n",
+        "    if not api_key:\n",
+        "        raise ValueError(\"API key cannot be empty.\")\n",
+        "\n",
+        "    # Default to the official OpenAI endpoint if none is provided.\n",
+        "    default_base_url = os.getenv(\"OPENAI_BASE_URL\", \"https://api.openai.com/v1\")\n",
+        "    base_url = input(\n",
+        "        f\"Custom OpenAI-compatible base URL (press Enter for {default_base_url}): \"\n",
+        "    ).strip()\n",
+        "    if not base_url:\n",
+        "        base_url = default_base_url\n",
+        "\n",
+        "    # Let the user choose any model id. Default to gpt-4o for convenience.\n",
+        "    default_model = os.getenv(\"OPENAI_MODEL\") or \"gpt-4o\"\n",
+        "    model_name = input(\n",
+        "        f\"Model name (press Enter for {default_model}): \"\n",
+        "    ).strip()\n",
+        "    if not model_name:\n",
+        "        model_name = default_model\n",
+        "\n",
+        "    print()\n",
+        "    print(\"Step 2: downloading WFGY Problem Map and TXTOS prompt...\")\n",
+        "    problem_map_text = fetch_text(PROBLEM_MAP_URL)\n",
+        "    txtos_text = fetch_text(TXTOS_URL)\n",
+        "    system_prompt = build_system_prompt(problem_map_text, txtos_text)\n",
+        "    print(\"Setup complete. WFGY debugger is ready.\")\n",
+        "    print()\n",
+        "    print(f\"Using base URL: {base_url}\")\n",
+        "    print(f\"Using model:   {model_name}\")\n",
+        "    print()\n",
+        "\n",
+        "    client = OpenAI(api_key=api_key, base_url=base_url)\n",
+        "\n",
+        "    return client, system_prompt, model_name\n",
+        "\n",
+        "\n",
+        "def print_examples():\n",
+        "    \"\"\"Print the three ready-to-copy examples.\"\"\"\n",
+        "    print(\"If you are not sure what to write, you can start from one of these examples:\")\n",
+        "    print(\"  - Example 1: retrieval hallucination (No.1 style)\")\n",
+        "    print(\"  - Example 2: bootstrap ordering or infra race (No.14 style)\")\n",
+        "    print(\"  - Example 3: secrets or config drift around first deploy (No.16 style)\")\n",
+        "    print()\n",
+        "    print(\"Full text of the examples (ready to copy and paste):\")\n",
+        "    print(\"------------------------------------------------------------\")\n",
+        "    print(EXAMPLE_1)\n",
+        "    print(\"------------------------------------------------------------\")\n",
+        "    print(EXAMPLE_2)\n",
+        "    print(\"------------------------------------------------------------\")\n",
+        "    print(EXAMPLE_3)\n",
+        "    print(\"------------------------------------------------------------\")\n",
+        "    print()\n",
+        "\n",
+        "\n",
+        "def run_debug_session(client: OpenAI, system_prompt: str, model_name: str) -> None:\n",
+        "    \"\"\"Run one interactive debug round in the Colab cell.\"\"\"\n",
+        "    print(\"============================================================\")\n",
+        "    print(\"WFGY 16 Problem Map LLM Debugger\")\n",
+        "    print()\n",
+        "    print(\"How to use this cell:\")\n",
+        "    print(\"  1) Scroll up and read the three examples.\")\n",
+        "    print(\"  2) Paste one example or your own LLM / RAG bug description.\")\n",
+        "    print(\"     Include prompt, answer, and any relevant logs.\")\n",
+        "    print(\"  3) When you are done, press Enter on an empty line to submit.\")\n",
+        "    print(\"  4) After you see the diagnosis, open the WFGY Problem Map for the full fix.\")\n",
+        "    print()\n",
+        "\n",
+        "    print_examples()\n",
+        "\n",
+        "    print(\"Now it is your turn.\")\n",
+        "    print(\"Type your bug description line by line.\")\n",
+        "    print(\"Colab will open a small input box for each line.\")\n",
+        "    print(\"When you are finished, press Enter on an empty line to submit.\")\n",
+        "    print()\n",
+        "\n",
+        "    lines = []\n",
+        "    first = True\n",
+        "    while True:\n",
+        "        try:\n",
+        "            if first:\n",
+        "                prompt = (\n",
+        "                    \"Line 1 - paste your bug here \"\n",
+        "                    \"(press Enter for next line, empty line to finish): \"\n",
+        "                )\n",
+        "                first = False\n",
+        "            else:\n",
+        "                prompt = (\n",
+        "                    \"Next line - continue typing, or press Enter on an empty line to submit: \"\n",
+        "                )\n",
+        "            line = input(prompt)\n",
+        "        except EOFError:\n",
+        "            break\n",
+        "\n",
+        "        if not line.strip():\n",
+        "            # Empty line = end of input block.\n",
+        "            break\n",
+        "\n",
+        "        lines.append(line)\n",
+        "\n",
+        "    user_bug = \"\\n\".join(lines).strip()\n",
+        "    if not user_bug:\n",
+        "        print(\"No bug description detected. Nothing to debug in this round.\")\n",
+        "        print()\n",
+        "        return\n",
+        "\n",
+        "    print()\n",
+        "    print(\"Asking the WFGY debugger...\")\n",
+        "    print()\n",
+        "\n",
+        "    try:\n",
+        "        completion = client.chat.completions.create(\n",
+        "            model=model_name,\n",
+        "            temperature=0.2,\n",
+        "            messages=[\n",
+        "                {\"role\": \"system\", \"content\": system_prompt},\n",
+        "                {\n",
+        "                    \"role\": \"user\",\n",
+        "                    \"content\": (\n",
+        "                        \"Here is the bug description. Please follow the WFGY rules.\\n\\n\"\n",
+        "                        + user_bug\n",
+        "                    ),\n",
+        "                },\n",
+        "            ],\n",
+        "        )\n",
+        "    except Exception as exc:\n",
+        "        print(\"Call to the LLM endpoint failed.\")\n",
+        "        print(\"Please check your API key, base URL, and model name.\")\n",
+        "        print(f\"Error detail: {exc}\")\n",
+        "        print()\n",
+        "        return\n",
+        "\n",
+        "    reply = completion.choices[0].message.content or \"\"\n",
+        "    print(reply)\n",
+        "    print()\n",
+        "    print(\"For full documentation and concrete fixes, open the WFGY Problem Map:\")\n",
+        "    print(WFGY_PROBLEM_MAP_HOME)\n",
+        "    print()\n",
+        "    print(\"This debugger is only the front door. The real fixes live in the repo:\")\n",
+        "    print(WFGY_REPO)\n",
+        "    print(\"============================================================\")\n",
+        "    print()\n",
+        "\n",
+        "\n",
+        "# Boot the debugger session.\n",
+        "client, system_prompt, model_name = setup_client()\n",
+        "\n",
+        "while True:\n",
+        "    run_debug_session(client, system_prompt, model_name)\n",
+        "    again = input(\"Debug another bug? (y/n): \").strip().lower()\n",
+        "    if again != \"y\":\n",
+        "        print(\"Session finished. Goodbye.\")\n",
+        "        break\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.10"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}