Taleef7/semeval-2026-task8

PFW Task 8 at SemEval-2026 Task 8

Lightweight tri-fusion retrieval with prompt-engineered faithful generation for multi-turn RAG.

This repository contains the code, paper, and reproducibility utilities for our SemEval-2026 Task 8 (MTRAGEval) system. The public GitHub version is intentionally focused on the material needed to understand, reproduce, and extend the paper. It excludes benchmark data, built indices, cached embeddings, generated submissions, logs, and local research artifacts.

Paper: paper/main.pdf
Paper source: paper/main.tex
Documentation index: docs/README.md
Citation metadata: CITATION.cff

Official Results

| Task | Metric | Score | Rank |
|------|--------|-------|------|
| A | nDCG@5 | 0.433 | 20/38 |
| B | H-mean (RB_agg, RL_F, RB_llm) | 0.756 | 6/26 |
| C | H-mean (RB_agg, RL_F, RB_llm) | 0.533 | 14/29 |
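
Tasks B and C are scored by the harmonic mean of three metrics, and Task A by nDCG@5. As a quick reference, here is a minimal sketch of both aggregations; the input score values are illustrative only, not taken from the official evaluation.

```python
import math

def harmonic_mean(scores):
    """Harmonic mean of a list of scores; returns 0.0 if any score is 0."""
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)

def ndcg_at_k(relevances, k=5):
    """nDCG@k over a ranked list of graded relevance judgments."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Illustrative inputs only:
h = harmonic_mean([0.8, 0.7, 0.75])
n = ndcg_at_k([1, 0, 1, 0, 0], k=5)
```

The harmonic mean penalizes any single weak component more than an arithmetic mean would, which is why it is a common choice for aggregating complementary quality metrics.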

Repository Scope

Included:

  • source code for retrieval, generation, and submission pipelines
  • camera-ready analysis scripts used for the final paper revision
  • SLURM job scripts used on Purdue Gilbreth
  • LaTeX source and compiled camera-ready PDF
  • lightweight documentation for setup and reproduction

Intentionally excluded:

  • benchmark corpora and task files
  • generated indices, embeddings, and cached model outputs
  • evaluation logs, intermediate results, and local notebooks
  • organizer-provided private analytics files and confidential evaluation artifacts

Repository Layout

.
├── src/mtrageval/          # Python package: retrieval, generation, analysis helpers
├── scripts/
│   ├── setup/             # Index and cache construction
│   ├── submission/        # Official Task A/B/C generation pipelines
│   ├── analysis/          # Retrieval, prompt, and camera-ready analyses
│   ├── evaluation/        # Local evaluation utilities
│   └── validation/        # Format and sanity checks
├── tests/                 # Unit tests for camera-ready analysis code
├── slurm/                 # Gilbreth cluster job scripts
├── paper/                 # ACL/SemEval paper source and PDF
├── docs/                  # Reproducibility and repository documentation
├── requirements.txt
└── README.md

Quick Start

1. Environment

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Benchmark Data

Download the MTRAGEval benchmark from the official task repository.

The public repo assumes the following local layout after download:

data/
├── corpora/
│   ├── clapnq.jsonl
│   ├── cloud.jsonl
│   ├── fiqa.jsonl
│   └── govt.jsonl
├── rag_taskAC.jsonl
└── reference_taskB.jsonl
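
Before building indices, it can help to verify that the layout above is actually in place. The following is a minimal sketch, not part of the repository's own tooling; the `data` root and the JSONL record fields are assumptions.

```python
import json
from pathlib import Path

# Relative paths from the layout documented above.
EXPECTED = [
    "corpora/clapnq.jsonl", "corpora/cloud.jsonl",
    "corpora/fiqa.jsonl", "corpora/govt.jsonl",
    "rag_taskAC.jsonl", "reference_taskB.jsonl",
]

def missing_files(data_root="data"):
    """Return the expected files that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

def load_jsonl(path):
    """Load a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```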

Some camera-ready analyses also expect a local benchmark checkout under:

external/mt-rag-benchmark/

3. API Configuration

Task B and Task C generation require an OpenAI API key:

export OPENAI_API_KEY="your-key"
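
A pipeline that silently runs without the key wastes a cluster allocation, so failing fast is worthwhile. A minimal sketch of such a check (the actual submission scripts may read the key differently):

```python
import os

def require_api_key(var="OPENAI_API_KEY"):
    """Fail fast with a clear message if the API key is not exported."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it before running the Task B/C pipelines."
        )
    return key
```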

Reproducing the System

Build Retrieval Indices

python scripts/setup/build_bm25_indices.py
python scripts/setup/build_splade_indices.py
python scripts/setup/build_jina_v4_index.py
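
The three scripts build the BM25, SPLADE, and Jina v4 indices that back the tri-fusion retriever. The repository's own fusion logic lives in `src/mtrageval`; as a hedged illustration of how three ranked lists can be combined, here is reciprocal rank fusion, a common technique that is not necessarily the method the paper uses:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings standing in for the three retrievers' outputs:
bm25   = ["d1", "d2", "d3"]
splade = ["d2", "d1", "d4"]
dense  = ["d2", "d3", "d1"]
fused = reciprocal_rank_fusion([bm25, splade, dense])
```

A document ranked moderately well by all three retrievers (here `d2`) outscores one ranked first by only a single retriever, which is the property that makes rank-based fusion robust to score-scale mismatches between sparse and dense systems.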

Generate Official-Format Submissions

Task A:

python scripts/submission/generate_taska_submission.py \
  --input-file data/rag_taskAC.jsonl \
  --output-file "PFW Task 8_taskA.jsonl" \
  --team-name "PFW Task 8"

Task B:

python scripts/submission/generate_taskb_v2.py \
  --input-file data/reference_taskB.jsonl \
  --output-file "PFW Task 8_taskB.jsonl" \
  --model gpt-4o

Task C:

python scripts/submission/generate_taskc_submission.py \
  --input-file data/rag_taskAC.jsonl \
  --output-file "PFW Task 8_taskC.jsonl" \
  --team-name "PFW Task 8"
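
Each pipeline writes one JSON object per line in the official submission format. The real format checks live under `scripts/validation/`; as a minimal sketch of that kind of check, with an illustrative required key rather than the official schema:

```python
import json

def validate_submission(path, required=("conversation_id",)):
    """Return a list of per-line errors: non-JSON lines and missing keys.

    The required key here is illustrative, not the official schema.
    """
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {lineno}: not valid JSON")
                continue
            for key in required:
                if key not in record:
                    errors.append(f"line {lineno}: missing key {key!r}")
    return errors
```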

Reproducing Camera-Ready Analyses

The final paper uses the following analysis scripts:

PYTHONPATH=src python scripts/analysis/camera_ready_retrieval.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_prompt_ablation.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_taskc_control.py --repo-root .
PYTHONPATH=src python scripts/analysis/camera_ready_summary.py --repo-root .

Detailed notes are in docs/reproducibility.md.

Testing

Run the analysis test suite with:

PYTHONPATH=src python -m unittest discover -s tests -p 'test_camera_ready*.py'

Compute Context

  • Retrieval indexing and large-scale analyses were run on Purdue Gilbreth.
  • A100-80GB GPUs were used for dense/sparse indexing.
  • A30 GPUs were used for later camera-ready analyses.
  • Task B and Task C generation use the OpenAI API.

Citation

@inproceedings{tamsal2026pfw,
  title     = {{PFW Task 8} at {SemEval}-2026 Task 8: Lightweight Tri-Fusion Retrieval with Prompt-Engineered Faithful Generation for Multi-Turn {RAG}},
  author    = {Tamsal, Taleef and Rusert, Jonathan},
  booktitle = {Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026)},
  year      = {2026},
  address   = {San Diego, California},
  organization = {Association for Computational Linguistics}
}

Contact

  • Taleef Tamsal: tamst01@pfw.edu
  • Jonathan Rusert: jrusert@pfw.edu
