This repository accompanies the paper "LITcoder: A General-Purpose Library for Building and Comparing Encoding Models" (arXiv:2509.09152). It provides a modular pipeline to align continuous stimuli with fMRI data, build encoding models, and evaluate them.
The steps below prepare your environment and data to reproduce experiments on three story-listening datasets: Narratives, Little Prince (LPP), and LeBel.
Requirements:
- Python 3.10+
- fMRIPrep
- BIDS-formatted datasets (Narratives, Little Prince, LeBel)
Environment setup:

```bash
git clone git@github.com:GT-LIT-Lab/litcoder_core.git
cd litcoder_core
conda create -n litcoder -y python=3.12.8
conda activate litcoder
conda install pip
pip install -e .
```

Use the following fMRIPrep template for each participant and dataset:
```bash
fmriprep BIDS_DIR OUTPUT_DIR participant \
  --participant_label PARTICIPANT \
  --output-spaces MNI152NLin2009cAsym:res-2 \
  --ignore slicetiming \
  --skip_bids_validation \
  --fs-no-reconall \
  --nprocs 8
```

- Replace `BIDS_DIR`, `OUTPUT_DIR`, and `PARTICIPANT` per your setup.
- After preprocessing, copy or symlink the required outputs into this repo under `data/<dataset>/neural_data/`, following the layouts below.
- Narratives: `data/narratives/neural_data/`
- Little Prince: `data/little_prince/neural_data/`
- LeBel: `data/lebel/neural_data/`
Unless specified otherwise, each subject has a subfolder named by its subject label (e.g., `sub-256`, `sub-EN058`).
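The copy/symlink step can be scripted. Below is a minimal sketch; the `stage_subject` helper and its arguments are illustrative, not part of LITcoder:

```python
import os
from pathlib import Path

def stage_subject(fmriprep_dir: str, repo_root: str, dataset: str, subject: str) -> Path:
    """Symlink one subject's fMRIPrep output folder into
    data/<dataset>/neural_data/<subject>/ (illustrative helper)."""
    src = Path(fmriprep_dir) / subject
    dest = Path(repo_root) / "data" / dataset / "neural_data" / subject
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        # symlink rather than copy to avoid duplicating large NIfTI files
        os.symlink(src, dest, target_is_directory=True)
    return dest
```

Copying instead of symlinking (e.g., `shutil.copytree`) also works if your cluster filesystem does not preserve symlinks.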
To keep this README concise, each dataset has its own short guide in docs:
See: `docs/narratives.md`
Key requirements:
- BOLD NIfTI in `MNI152NLin2009cAsym:res-2`
- `narratives_data.pkl` under `data/narratives/neural_data/`
- Source for `narratives_data.pkl`: `data/narratives/neural_data/narratives_data.pkl`
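To sanity-check a staged pickle before training, you can load it and report its top-level structure. The helper below is illustrative; the actual keys inside `narratives_data.pkl` depend on how it was built:

```python
import pickle

def inspect_pickle(path: str):
    """Load a neural-data pickle and summarize its top-level structure
    (illustrative sanity check, not a LITcoder API)."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    if isinstance(data, dict):
        # map each top-level key to the type name of its value
        return {k: type(v).__name__ for k, v in data.items()}
    return type(data).__name__
```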
Key requirements:
- BOLD NIfTI for all runs in `MNI152NLin2009cAsym:res-2`
- Per-subject `lpp_data.pkl` under `data/little_prince/neural_data/`
- Source for `lpp_data.pkl`: `data/little_prince/neural_data/lpp_data.pkl`
See: `docs/lebel.md`
Key requirements:
- Run `fmri_processing/to_mni_lebel.py` after fMRIPrep
- Place per-subject pickles directly under `data/lebel/neural_data/`:
  - `noslice_sub-<SUBJECT_ID>_story_data.pkl`
  - `noslice_sub-<SUBJECT_ID>_story_data_surface.pkl`
- Source for `lebel_data.pkl`: `data/lebel/neural_data/lebel_data.pkl`
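The expected per-subject LeBel filenames can be generated programmatically; a small helper following the naming pattern above (the helper itself is illustrative, not part of the library):

```python
def lebel_pickle_names(subject_id: str) -> tuple[str, str]:
    """Return the (volume, surface) pickle names expected under
    data/lebel/neural_data/ for one subject."""
    base = f"noslice_sub-{subject_id}_story_data"
    return f"{base}.pkl", f"{base}_surface.pkl"
```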
Under `scripts/`, we provide SLURM and bash scripts to run the experiments. The Narratives scripts are under `scripts/narratives_scripts/`, the LPP scripts under `scripts/lpp_scripts/`, and the LeBel scripts under `scripts/`.
To regenerate the figures from the paper, run:

```bash
cd generate_figures
conda activate litcoder
bash run_zip_py.sh
```

This will generate the figures in the paper under `analysis_litcoder_figures`.
This library uses the same core modules across datasets, but here’s the exact flow for Narratives as implemented in `training_files/train_narratives.py`:
- Create an assembly with `dataset_type="narratives"`, providing `data_dir`, `subject`, `tr`, `lookback`, `context_type`, and `use_volume`.
- Build `LanguageModelFeatureExtractor`, extract all layers per story (with caching via `ActivationCache`), and select `layer_idx`.
- Downsample to TRs with `Downsampler.downsample(...)` using `--downsample_method` and Lanczos parameters if applicable.
- Apply FIR delays with `FIR.make_delayed` using `--ndelays`.
- Concatenate the delayed features and brain data for the story order used in the script (currently `['21styear']`), then trim `[14:-9]` before modeling.
- Fit nested CV ridge via `fit_nested_cv(features=X, targets=Y, ...)` with your chosen folding and ridge parameters.
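The FIR-delay and trimming steps can be illustrated with a minimal NumPy re-implementation. This mimics the idea behind `FIR.make_delayed` (stacking time-shifted copies of the features so ridge regression can fit a finite-impulse-response mapping to the BOLD signal); it is not LITcoder's actual code:

```python
import numpy as np

def make_delayed(features: np.ndarray, ndelays: int) -> np.ndarray:
    """Stack time-shifted copies of the feature matrix.

    features: (n_TRs, n_features); returns (n_TRs, n_features * ndelays),
    where block d holds the features shifted forward by d TRs (zero-padded).
    """
    n_trs, n_feats = features.shape
    delayed = np.zeros((n_trs, n_feats * ndelays))
    for d in range(ndelays):
        # row t of block d contains features[t - d]; the first d rows stay zero
        delayed[d:, d * n_feats:(d + 1) * n_feats] = features[:n_trs - d]
    return delayed

# trim edge TRs before modeling, as in the Narratives flow above
X = make_delayed(np.random.randn(100, 8), ndelays=4)[14:-9]
```

Note that whether delay 0 (the undelayed copy) is included is a modeling choice; check the script's defaults before comparing results.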
Concrete CLI example:
```bash
python training_files/train_narratives.py \
  --data_dir data/narratives/neural_data \
  --subject sub-256 \
  --tr 1.5 \
  --model_name gpt2-small \
  --layer_idx 9 \
  --context_type fullcontext \
  --downsample_method lanczos \
  --lanczos_window 3 \
  --lanczos_cutoff_mult 1.0 \
  --ndelays 8 \
  --folding_type kfold \
  --n_outer_folds 5 \
  --n_inner_folds 5 \
  --chunk_length 20 \
  --singcutoff 1e-10 \
  --logger_backend tensorboard \
  --results_dir results/narratives_demo
```

We tested the following training scripts on a clean setup using the steps above:
| Dataset | Script | Status | Results |
|---|---|---|---|
| LeBel | `train_lebel.py` | ✅ | View Results |
| LeBel | `train_lebel_speech.py` | ✅ | View Results |
| LeBel | `train_lebel_embeddings.py` | ✅ | View Results |
| LeBel | `train_lebel_wordrate.py` | ✅ | View Results |
| Narratives | `train_narratives.py` | ✅ | View Results |
| Narratives | `train_narratives_embeddings.py` | ✅ | View Results |
| Narratives | `train_narratives_speech.py` | ✅ | View Results |
| Narratives | `train_narratives_wordrate.py` | ✅ | View Results |
| Little Prince | `train_lpp_all.py` | ✅ | View Results |
| Little Prince | `train_lpp_speech.py` | ✅ | View Results |
| Little Prince | `train_lpp_embeddings.py` | ✅ | View Results |
| Little Prince | `train_lpp_wordrate.py` | ✅ | View Results |
Please refer to the original LITcoder project for questions, documentation, and more. Documentation for the LITcoder core library can be found here; tutorials for the LITcoder library can be found here.
If you use LITcoder in your research, please cite:
```bibtex
@misc{binhuraib2025litcodergeneralpurposelibrarybuilding,
  title={LITcoder: A General-Purpose Library for Building and Comparing Encoding Models},
  author={Taha Binhuraib and Ruimin Gao and Anna A. Ivanova},
  year={2025},
  eprint={2509.09152},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.09152},
}
```