Lightweight tri-fusion retrieval with prompt-engineered faithful generation for multi-turn RAG.
This repository contains the code, paper, and reproducibility utilities for our SemEval-2026 Task 8 (MTRAGEval) system. The public GitHub version is intentionally focused on the material needed to understand, reproduce, and extend the paper. It excludes benchmark data, built indices, cached embeddings, generated submissions, logs, and local research artifacts.
- Paper: `paper/main.pdf`
- Paper source: `paper/main.tex`
- Documentation index: `docs/README.md`
- Citation metadata: `CITATION.cff`
| Task | Metric | Score | Rank |
|---|---|---|---|
| A | nDCG@5 | 0.433 | 20/38 |
| B | H-mean (RB_agg, RL_F, RB_llm) | 0.756 | 6/26 |
| C | H-mean (RB_agg, RL_F, RB_llm) | 0.533 | 14/29 |
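Tasks B and C are scored by the harmonic mean (H-mean) of the three generation metrics RB_agg, RL_F, and RB_llm. For reference, a minimal harmonic-mean sketch — the values below are illustrative, not the actual per-metric scores:

```python
def harmonic_mean(values):
    """Harmonic mean of positive values: k / sum(1/v)."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires positive values")
    return len(values) / sum(1.0 / v for v in values)

# Illustrative per-metric values (not the actual scores):
print(round(harmonic_mean([0.8, 0.7, 0.75]), 3))  # → 0.748
```

The harmonic mean is dominated by the weakest of the three metrics, so a single poor component (e.g. faithfulness) drags the aggregate down more than an arithmetic mean would.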
Included:
- source code for retrieval, generation, and submission pipelines
- camera-ready analysis scripts used for the final paper revision
- SLURM job scripts used on Purdue Gilbreth
- LaTeX source and compiled camera-ready PDF
- lightweight documentation for setup and reproduction
Intentionally excluded:
- benchmark corpora and task files
- generated indices, embeddings, and cached model outputs
- evaluation logs, intermediate results, and local notebooks
- organizer-provided private analytics files and confidential evaluation artifacts
```text
.
├── src/mtrageval/     # Python package: retrieval, generation, analysis helpers
├── scripts/
│   ├── setup/         # Index and cache construction
│   ├── submission/    # Official Task A/B/C generation pipelines
│   ├── analysis/      # Retrieval, prompt, and camera-ready analyses
│   ├── evaluation/    # Local evaluation utilities
│   └── validation/    # Format and sanity checks
├── tests/             # Unit tests for camera-ready analysis code
├── slurm/             # Gilbreth cluster job scripts
├── paper/             # ACL/SemEval paper source and PDF
├── docs/              # Reproducibility and repository documentation
├── requirements.txt
└── README.md
```
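The tri-fusion retriever in `src/mtrageval` combines BM25, SPLADE, and dense (Jina v4) rankings. As an illustrative sketch only — reciprocal rank fusion (RRF) is one standard way to merge such ranked lists; the fusion actually used in the paper may differ:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)).

    `rankings` is a list of ranked doc-id lists (best first); `k` damps the
    influence of top ranks (60 is the value from the original RRF paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked lists from the three retrievers:
bm25   = ["d1", "d2", "d3"]
splade = ["d2", "d1", "d4"]
dense  = ["d2", "d3", "d1"]
print(rrf_fuse([bm25, splade, dense]))  # → ['d2', 'd1', 'd3', 'd4']
```

RRF needs only ranks, not calibrated scores, which is why it is a common baseline for fusing lexical, learned-sparse, and dense retrievers.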
Set up a Python environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Download the MTRAGEval benchmark from the official repository:
The public repo assumes the following local layout after download:
```text
data/
├── corpora/
│   ├── clapnq.jsonl
│   ├── cloud.jsonl
│   ├── fiqa.jsonl
│   └── govt.jsonl
├── rag_taskAC.jsonl
└── reference_taskB.jsonl
```
Some camera-ready analyses also expect a local benchmark checkout under `external/mt-rag-benchmark/`.
Task B and Task C generation require an OpenAI API key:
```bash
export OPENAI_API_KEY="your-key"
```

Build the BM25, SPLADE, and Jina v4 indices:

```bash
python scripts/setup/build_bm25_indices.py
python scripts/setup/build_splade_indices.py
python scripts/setup/build_jina_v4_index.py
```

Task A:
```bash
python scripts/submission/generate_taska_submission.py \
  --input-file data/rag_taskAC.jsonl \
  --output-file "PFW Task 8_taskA.jsonl" \
  --team-name "PFW Task 8"
```

Task B:
```bash
python scripts/submission/generate_taskb_v2.py \
  --input-file data/reference_taskB.jsonl \
  --output-file "PFW Task 8_taskB.jsonl" \
  --model gpt-4o
```

Task C:
```bash
python scripts/submission/generate_taskc_submission.py \
  --input-file data/rag_taskAC.jsonl \
  --output-file "PFW Task 8_taskC.jsonl" \
  --team-name "PFW Task 8"
```

The final paper uses the following analysis scripts:
```bash
PYTHONPATH=src python scripts/analysis/camera_ready_retrieval.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_prompt_ablation.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_taskc_control.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_summary.py --repo-root .
```

Detailed notes are in `docs/reproducibility.md`.
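The format checks live under `scripts/validation/`. As an illustrative sketch only — the required field names below are assumptions, not the official submission schema — a JSONL sanity check has this shape:

```python
import json

def check_jsonl(lines, required_fields=("task_id", "response")):
    """Yield (line_number, error) pairs for malformed submission lines.

    NOTE: the required field names here are illustrative assumptions;
    the official schema is defined by the task organizers.
    """
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            yield n, f"invalid JSON: {exc}"
            continue
        for field in required_fields:
            if field not in record:
                yield n, f"missing field: {field!r}"

# Usage on in-memory lines; for a file, pass open(path) instead.
sample = ['{"task_id": "1", "response": "ok"}', '{"task_id": "2"}', "oops"]
errors = list(check_jsonl(sample))
```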
Run the analysis test suite with:
```bash
PYTHONPATH=src python -m unittest discover -s tests -p 'test_camera_ready*.py'
```

- Retrieval indexing and large-scale analyses were run on Purdue Gilbreth.
- A100-80GB GPUs were used for dense/sparse indexing.
- A30 GPUs were used for later camera-ready analyses.
- Task B and Task C generation use the OpenAI API.
```bibtex
@inproceedings{tamsal2026pfw,
  title        = {{PFW Task 8} at {SemEval}-2026 Task 8: Lightweight Tri-Fusion Retrieval with Prompt-Engineered Faithful Generation for Multi-Turn {RAG}},
  author       = {Tamsal, Taleef and Rusert, Jonathan},
  booktitle    = {Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)},
  year         = {2026},
  address      = {San Diego, California},
  organization = {Association for Computational Linguistics}
}
```

- Benchmark homepage: https://ibm.github.io/mt-rag-benchmark/MTRAGEval/
- Benchmark repository: https://github.com/IBM/mt-rag-benchmark
- SemEval-2026: https://semeval.github.io/
- Taleef Tamsal: tamst01@pfw.edu
- Jonathan Rusert: jrusert@pfw.edu