FAIR-RAG: An End-to-End Framework for Mitigating Political Bias through Fair Retrieval-Augmented Generation
Accepted at SIGIR 2026 (The 49th International ACM SIGIR Conference on Research and Development in Information Retrieval), July 20--24, 2026, Melbourne, Australia.
Retrieval-Augmented Generation (RAG) systems can amplify political bias from underlying web corpora. To empirically demonstrate this amplification, we first analyze 16,254 documents from the C4 dataset and 24,300 LLM-generated responses, revealing significant left-leaning and supportive stance bias that can propagate strongly from retrieval to generation. To mitigate this amplification of political bias, we propose FAIR-RAG, an end-to-end framework integrating (1) multi-LLM persona-based annotation, (2) a vector database with political-stance metadata, and (3) a multi-stage fairness engine designed for each of the three stages in RAG systems.
| Metric | FAIR-RAG | Improvement |
|---|---|---|
| AWRF (Attention Weighted Rank Fairness) | 97.51 | 82.1% over SOTA |
| Perspective Balance (GPT / Gemini) | 51.01 / 82.37 | Avg. 5.6% over SOTA |
| Context Precision | 0.972 | Maintained high quality |
| Faithfulness | 0.995 | Maintained high quality |
FAIR-RAG comprises three components that operate collaboratively across the entire RAG pipeline:
FAIR-RAG Pipeline
┌─────────────────┐ ┌─────────────┐ ┌──────────────────────────────┐
│ FAIR-Annotation │→ │ FAIR-KB │→ │ FAIR-Engine │
│ │ │ │ │ │
│ Multi-LLM │ │ ChromaDB + │ │ R-stage A-stage G-stage │
│ Persona-based │ │ Political- │ │ Balanced → Context → Fair │
│ Annotation │ │ Stance │ │ Retrieval Augment Response│
│ (4 personas x │ │ Metadata │ │ (Quota) (Meta) (Guide) │
│ 2 LLMs = 8) │ │ │ │ │
└─────────────────┘ └─────────────┘ └──────────────────────────────┘
Multi-LLM persona-based annotation using 8 annotators (4 personas x 2 LLMs: Claude Sonnet 4, GPT-4.1). Each annotator independently assigns Political Orientation and Topic-specific Stance scores (-1.0 to +1.0) with categorical labels, aggregated via majority voting.
A metadata-augmented vector database built on ChromaDB with HNSW indexing. Documents are chunked (512 tokens, 50-token overlap), embedded via multilingual-e5-large-instruct (1024-dim), and indexed with political-stance metadata for balanced retrieval.
A three-stage fairness engine:
- R-stage (Balanced Retrieval): Quota-based retrieval across 9 perspective categories (3 orientations x 3 stances), ensuring equal representation.
- A-stage (Awareness Context Augmentation): Re-ranks documents by relevance and constructs politically-aware context with metadata summaries and annotated documents.
- G-stage (Fair Response Generation): Injects fairness guidelines into the system prompt, directing the LLM to produce balanced multi-perspective responses.
Fair-RAG/
├── 0_annotation/ # FAIR-Annotation pipeline
│ ├── annotation_process.py # Multi-persona annotation with LLMs
│ ├── c4_collection.py # C4 dataset collection and filtering
│ ├── chatgpt/ # ChatGPT API integration
│ ├── claude/ # Claude API integration
│ └── prompt/ # Persona-based prompt templates
├── 1_doc2vec/ # FAIR-KB construction
│ ├── ingest_csv_to_chroma.py # CSV to ChromaDB ingestion
│ └── check_query2db.py # Database query validation
├── 2_rag/ # FAIR-Engine implementation
│ └── OUR_rag_system.py # Core RAG system with R-A-G fairness
├── querylist/ # Query datasets for evaluation
│ ├── debate_questions_general.csv
│ ├── debate_questions_oppose.csv
│ └── debate_questions_support.csv
├── topics-questions.csv # 15 politically sensitive topics and keywords
├── c4_analysis_summary.csv # C4 corpus bias analysis results
├── metric_base_test_with_gpt.py # Baseline evaluation (GPT judge)
├── metric_base_test_with_gemini.py # Baseline evaluation (Gemini judge)
├── metric_fairness_test_with_gpt.py # Fairness evaluation (GPT judge)
├── metric_fairness_test_with_gemini.py # Fairness evaluation (Gemini judge)
└── test_ablation_study.py # Ablation study framework
- Python 3.8+
- API keys for OpenAI (GPT-4.1) and Anthropic (Claude Sonnet 4)
- Ollama for local LLM inference (llama3.1, qwen3, gemma3, gpt-oss)
pip install chromadb sentence-transformers openai anthropicStep 1: Annotate documents with political-stance metadata
cd 0_annotation
python annotation_process.pyStep 2: Build the FAIR-KB vector database
cd 1_doc2vec
python ingest_csv_to_chroma.pyStep 3: Run FAIR-RAG
cd 2_rag
python OUR_rag_system.pyStep 4: Evaluate
# Baseline evaluation
python metric_base_test_with_gpt.py
python metric_base_test_with_gemini.py
# Fairness evaluation
python metric_fairness_test_with_gpt.py
python metric_fairness_test_with_gemini.py
# Ablation study
python test_ablation_study.py@inproceedings{you2026fairrag,
title={FAIR-RAG: An End-to-End Framework for Mitigating Political Bias through Fair Retrieval-Augmented Generation},
author={You, Jaebeom and Lee, Kisung and Kwon, Hyuk-Yoon},
booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '26)},
year={2026},
address={Melbourne, Australia},
publisher={ACM}
}- Jaebeom You - Graduate School of Data Science, Seoul National University of Science and Technology
- Kisung Lee - Division of Computer Science, Louisiana State University
- Hyuk-Yoon Kwon* - Graduate School of Data Science, Seoul National University of Science and Technology; College of Computing, Georgia Institute of Technology
*Corresponding Author
This project is licensed under the MIT License - see the LICENSE file for details.