Vision-Language Models (VLMs) frequently hallucinate. Current evaluation methods treat these errors as static, monolithic failures. In this work, we propose a paradigm shift: viewing hallucination as a dynamic pathology within a model's computational cognition.
We introduce Cognitive Anomaly Detection (CAD), a framework that projects a VLM's generative process onto an interpretable, low-dimensional Cognitive State Space using three novel information-theoretic probes:
- Perceptual Instability ($H_{Evi}$): measures uncertainty in the evidence tokens
- Logical-Causal Failure ($S_{Conf}$): captures inferential conflict between vision and text
- Decisional Ambiguity ($H_{Ans}$): measures uncertainty in the final answer
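The two entropy-style probes above reduce to Shannon entropy over the model's next-token distributions. The repository's actual implementations live in `src/metrics/core_metrics.py`; the standalone sketch below (function name and toy logits are illustrative, not the repo's API) only shows the basic computation such probes build on:

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: (seq_len, vocab_size) tensor of unnormalized scores.
    Returns a (seq_len,) tensor of per-position entropies.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

# Toy example: a near-deterministic position vs. a maximally uncertain one.
logits = torch.tensor([[10.0, 0.0, 0.0, 0.0],   # confident -> low entropy
                       [1.0, 1.0, 1.0, 1.0]])   # uniform   -> high entropy
h = token_entropy(logits)
```

Averaging such per-position entropies over the evidence span or the answer span yields scalar summaries in the spirit of $H_{Evi}$ and $H_{Ans}$.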
By leveraging geometric-information duality, CAD diagnoses hallucinations as geometric anomalies with high information-theoretic surprisal, requiring only a single generation pass and weak supervision (no token-level hallucination labels needed).
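In mixture-model terms, a "geometric anomaly with high surprisal" is a state with low log-likelihood under a GMM fitted only to nominal (correct-answer) states. The sketch below with scikit-learn is a simplified stand-in for the detector in `src/detector/cad_gmm.py` (synthetic data, hypothetical 3-D state vectors), not the repository's actual code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Nominal cognitive states: 3-D points (H_evi, S_conf, H_ans) from correct answers.
nominal = rng.normal(loc=0.5, scale=0.1, size=(500, 3))

# Calibrate the mixture on nominal states only (weak supervision:
# no token-level hallucination labels are required).
gmm = GaussianMixture(n_components=3, random_state=0).fit(nominal)

def surprisal(states: np.ndarray) -> np.ndarray:
    """Negative log-likelihood under the nominal state manifold."""
    return -gmm.score_samples(states)

typical = surprisal(np.array([[0.5, 0.5, 0.5]]))    # on the manifold
anomalous = surprisal(np.array([[3.0, 3.0, 3.0]]))  # far off the manifold
```

Thresholding this surprisal score (or sweeping the threshold to trace a ROC curve) turns the calibrated mixture into a hallucination detector.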
- Mechanistic Diagnosis: Uncovers distinct failure modes (e.g., Computational Cognitive Dissonance, Transparent Struggles, Entangled States)
- High Efficiency: Single-pass generation + lightweight non-autoregressive replay
- Weak Supervision: Calibrates on a small set of ground-truth answers (resilient to up to 30% calibration contamination)
- State-of-the-Art: Achieves superior AUC across POPE, MME, and MS-COCO on models such as Idefics2, LLaVA-v1.6, Qwen2-VL, and DeepSeek-VL2
```bash
# Clone the repository
git clone https://github.com/Lexiang-Xiong/CAD
cd Anatomy-of-a-Lie

# Create conda environment
conda create -n cad-vlm python=3.10 -y
conda activate cad-vlm

# Install PyTorch (adjust the CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements.txt

# For Qwen2-VL support
pip install qwen-vl-utils

# For DeepSeek-VL2 support
pip install deepseek_vl2
```

Set your HuggingFace token for accessing gated models:

```bash
export HF_TOKEN="your_huggingface_token"
```

Run the extraction scripts to generate responses, perform the text-only replay, and compute the core metrics ($H_{Evi}$, $S_{Conf}$, $H_{Ans}$):
```bash
# Idefics2
python scripts/extraction/extract_idefics2.py \
    --dataset lmms-lab/POPE \
    --output_dir results/idefics2

# LLaVA
python scripts/extraction/extract_llava.py \
    --dataset lmms-lab/POPE \
    --output_dir results/llava

# Qwen2-VL
python scripts/extraction/extract_qwen2.py \
    --dataset lmms-lab/POPE \
    --output_dir results/qwen2

# DeepSeek-VL2
python scripts/extraction/extract_deepseek.py \
    --dataset lmms-lab/POPE \
    --output_dir results/deepseek
```

Calibrate the GMM on the nominal state manifold and predict hallucinations on the test set:
```bash
# Basic evaluation
python scripts/evaluation/run_cad_eval.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2 \
    --n_components 7

# Automatic K selection using BIC
python scripts/evaluation/run_cad_eval.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2 \
    --auto_k
```

Evaluate the contribution of each metric component:
```bash
python scripts/evaluation/run_ablation.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2
```

We provide the visualization scripts used in the paper.
```bash
# Figure 2: ROC Panels (Linear & Log-Log)
python visualizations/plot_roc.py

# Figure 3: Cognitive Manifold Fingerprints (2x2 KDE)
python visualizations/plot_manifold.py

# Figure 4: Ablation Study & Synergy Gain
python visualizations/plot_ablation.py

# Figure 6: Robustness to Calibration Contamination
python visualizations/plot_robustness.py
```

```
Anatomy-of-a-Lie/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── environment.yml           # Conda environment (optional)
│
├── src/                      # Core algorithm modules
│   ├── metrics/
│   │   └── core_metrics.py   # Information-theoretic probe implementations
│   ├── detector/
│   │   └── cad_gmm.py        # GMM-based cognitive anomaly detector
│   └── utils/
│       └── prompt_utils.py   # Prompt templates and utilities
│
├── scripts/                  # Experiment execution scripts
│   ├── extraction/           # Feature extraction scripts
│   │   ├── extract_llava.py
│   │   ├── extract_idefics2.py
│   │   ├── extract_qwen2.py
│   │   └── extract_deepseek.py
│   └── evaluation/           # Evaluation scripts
│       ├── run_cad_eval.py
│       └── run_ablation.py
│
├── visualizations/           # Plotting scripts
│   ├── plot_roc.py
│   ├── plot_manifold.py
│   ├── plot_ablation.py
│   └── plot_robustness.py
│
└── data/                     # Data directory
    └── README.md             # Instructions for dataset preparation
```
- LLaVA-v1.6-Mistral-7B
- Idefics2-8B
- Qwen2-VL-7B-Instruct
- DeepSeek-VL2-Tiny/Small
If you find our work or this codebase helpful, please consider citing our paper:
```bibtex
@article{xiong2026anatomy,
  title={Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models},
  author={Xiong, Lexiang and Li, Qi and Ye, Jingwen and Wang, Xinchao},
  journal={arXiv preprint arXiv:2603.15557},
  year={2026}
}
```

We thank the open-source VLM community, particularly the maintainers of POPE, LLaVA, Idefics2, Qwen-VL, and DeepSeek-VL, for their invaluable resources.
This project is licensed under the MIT License - see the LICENSE file for details.