Vision-Language Models (VLMs) frequently hallucinate. Current evaluation methods treat these errors as static, monolithic failures. In this work, we propose a paradigm shift: viewing hallucination as a dynamic pathology within a model's computational cognition.
We introduce Cognitive Anomaly Detection (CAD), a framework that projects a VLM's generative process onto an interpretable, low-dimensional Cognitive State Space using three novel information-theoretic probes:
- Perceptual Instability ($H_{Evi}$): measures uncertainty in the evidence tokens
- Logical-Causal Failure ($S_{Conf}$): captures inferential conflict between vision and text
- Decisional Ambiguity ($H_{Ans}$): measures uncertainty in the final answer
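The two entropy-style probes above reduce to Shannon entropy over the model's next-token distributions. The repository's actual implementations live in `src/metrics/core_metrics.py`; the standalone sketch below (function name and toy logits are illustrative, not the repo's API) only shows the basic computation such probes build on:

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: (seq_len, vocab_size) tensor of unnormalized scores.
    Returns a (seq_len,) tensor of per-position entropies.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

# Toy example: a near-deterministic position vs. a maximally uncertain one.
logits = torch.tensor([[10.0, 0.0, 0.0, 0.0],   # confident -> low entropy
                       [1.0, 1.0, 1.0, 1.0]])   # uniform   -> high entropy
h = token_entropy(logits)
```

Averaging such per-position entropies over the evidence span or the answer span yields scalar summaries in the spirit of $H_{Evi}$ and $H_{Ans}$.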
By leveraging geometric-information duality, CAD diagnoses hallucinations as geometric anomalies with high information-theoretic surprisal, requiring only a single generation pass and weak supervision (no token-level hallucination labels needed).
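In mixture-model terms, a "geometric anomaly with high surprisal" is a state with low log-likelihood under a GMM fitted only to nominal (correct-answer) states. The sketch below with scikit-learn is a simplified stand-in for the detector in `src/detector/cad_gmm.py` (synthetic data, hypothetical 3-D state vectors), not the repository's actual code:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Nominal cognitive states: 3-D points (H_evi, S_conf, H_ans) from correct answers.
nominal = rng.normal(loc=0.5, scale=0.1, size=(500, 3))

# Calibrate the mixture on nominal states only (weak supervision:
# no token-level hallucination labels are required).
gmm = GaussianMixture(n_components=3, random_state=0).fit(nominal)

def surprisal(states: np.ndarray) -> np.ndarray:
    """Negative log-likelihood under the nominal state manifold."""
    return -gmm.score_samples(states)

typical = surprisal(np.array([[0.5, 0.5, 0.5]]))    # on the manifold
anomalous = surprisal(np.array([[3.0, 3.0, 3.0]]))  # far off the manifold
```

Thresholding this surprisal score (or sweeping the threshold to trace a ROC curve) turns the calibrated mixture into a hallucination detector.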
- Mechanistic Diagnosis: Uncovers distinct failure modes (e.g., Computational Cognitive Dissonance, Transparent Struggles, Entangled States)
- High Efficiency: Single-pass generation + lightweight non-autoregressive replay
- Weak Supervision: Calibrates on a small set of ground-truth answers (resilient to up to 30% calibration contamination)
- State-of-the-Art: Achieves superior AUC across POPE, MME, and MS-COCO on models such as Idefics2, LLaVA-v1.6, Qwen2-VL, and DeepSeek-VL2
```bash
# Clone the repository
git clone https://github.com/Lexiang-Xiong/CAD
cd Anatomy-of-a-Lie

# Create conda environment
conda create -n cad-vlm python=3.10 -y
conda activate cad-vlm

# Install PyTorch (adjust the CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements.txt

# For Qwen2-VL support
pip install qwen-vl-utils

# For DeepSeek-VL2 support
pip install deepseek_vl2
```

Set your HuggingFace token for accessing gated models:

```bash
export HF_TOKEN="your_huggingface_token"
```

Run the extraction scripts to generate responses, perform the text-only replay, and compute the core metrics ($H_{Evi}$, $S_{Conf}$, $H_{Ans}$):
```bash
# Idefics2
python scripts/extraction/extract_idefics2.py \
    --dataset lmms-lab/POPE \
    --output_dir results/idefics2

# LLaVA
python scripts/extraction/extract_llava.py \
    --dataset lmms-lab/POPE \
    --output_dir results/llava

# Qwen2-VL
python scripts/extraction/extract_qwen2.py \
    --dataset lmms-lab/POPE \
    --output_dir results/qwen2

# DeepSeek-VL2
python scripts/extraction/extract_deepseek.py \
    --dataset lmms-lab/POPE \
    --output_dir results/deepseek
```

Calibrate the GMM on the nominal state manifold and predict hallucinations on the test set:
```bash
# Basic evaluation
python scripts/evaluation/run_cad_eval.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2 \
    --n_components 7

# Automatic K selection using BIC
python scripts/evaluation/run_cad_eval.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2 \
    --auto_k
```

Evaluate the contribution of each metric component:
```bash
python scripts/evaluation/run_ablation.py \
    --input_file results/idefics2/hallucination_metrics_full.csv \
    --model_name Idefics2
```

We provide the visualization scripts used in the paper.
```bash
# Figure 2: ROC Panels (Linear & Log-Log)
python visualizations/plot_roc.py

# Figure 3: Cognitive Manifold Fingerprints (2x2 KDE)
python visualizations/plot_manifold.py

# Figure 4: Ablation Study & Synergy Gain
python visualizations/plot_ablation.py

# Figure 6: Robustness to Calibration Contamination
python visualizations/plot_robustness.py
```

```
Anatomy-of-a-Lie/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── environment.yml           # Conda environment (optional)
│
├── src/                      # Core algorithm modules
│   ├── metrics/
│   │   └── core_metrics.py   # Information-theoretic probe implementations
│   ├── detector/
│   │   └── cad_gmm.py        # GMM-based cognitive anomaly detector
│   └── utils/
│       └── prompt_utils.py   # Prompt templates and utilities
│
├── scripts/                  # Experiment execution scripts
│   ├── extraction/           # Feature extraction scripts
│   │   ├── extract_llava.py
│   │   ├── extract_idefics2.py
│   │   ├── extract_qwen2.py
│   │   └── extract_deepseek.py
│   └── evaluation/           # Evaluation scripts
│       ├── run_cad_eval.py
│       └── run_ablation.py
│
├── visualizations/           # Plotting scripts
│   ├── plot_roc.py
│   ├── plot_manifold.py
│   ├── plot_ablation.py
│   └── plot_robustness.py
│
└── data/                     # Data directory
    └── README.md             # Instructions for dataset preparation
```
- LLaVA-v1.6-Mistral-7B
- Idefics2-8B
- Qwen2-VL-7B-Instruct
- DeepSeek-VL2-Tiny/Small
If you find our work or this codebase helpful, please consider citing our paper:
```bibtex
@article{xiong2026anatomy,
  title={Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models},
  author={Xiong, Lexiang and Li, Qi and Ye, Jingwen and Wang, Xinchao},
  journal={arXiv preprint arXiv:2603.15557},
  year={2026}
}
```

We thank the open-source VLM community, particularly the maintainers of POPE, LLaVA, Idefics2, Qwen-VL, and DeepSeek-VL, for their invaluable resources.
This project is licensed under the MIT License - see the LICENSE file for details.