
cltl/MultiClinNER-2026


MultiClinNER_2026

This repository contains the code and GPT annotations for CLTL's submission (team LotusOrchid) to the NER subtask of the MultiClinAI shared task.

Setup

Download data

mkdir -p data
curl -L "https://zenodo.org/records/18772832/files/MultiClinAI-training_data_v1.1-260225.zip?download=1" -o data/MultiClinAI-training_data_v1.1-260225.zip
curl -L "https://zenodo.org/records/19098018/files/MultiClinAI-training+NER_test_bg_v1.2_260318.zip?download=1" -o data/MultiClinAI-training+NER_test_bg_v1.2_260318.zip
cd data
for f in MultiClinAI*.zip; do unzip "$f"; done
unzip ann_gpt.zip
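Where curl or unzip is unavailable (for example on Windows), the same steps can be sketched with the Python standard library. This is a hypothetical helper, not part of the repository: the names `archive_name`, `download` and `extract_archives` are illustrative, and it assumes the archives only need to be unpacked into data/ as the shell commands do.

```python
"""Hypothetical helper (not in the repo): fetch the MultiClinAI archives and unpack them."""
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# URLs copied from the curl commands above.
URLS = [
    "https://zenodo.org/records/18772832/files/MultiClinAI-training_data_v1.1-260225.zip?download=1",
    "https://zenodo.org/records/19098018/files/MultiClinAI-training+NER_test_bg_v1.2_260318.zip?download=1",
]


def archive_name(url: str) -> str:
    """File name taken from the URL path, with the ?download=1 query string dropped."""
    return urllib.parse.urlsplit(url).path.rsplit("/", 1)[-1]


def download(url: str, data_dir: Path) -> Path:
    """Download one archive into data_dir (created if missing); redirects are followed."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / archive_name(url)
    urllib.request.urlretrieve(url, target)
    return target


def extract_archives(data_dir: Path, pattern: str = "MultiClinAI*.zip") -> list[Path]:
    """Unpack every archive in data_dir matching pattern; return the archives processed."""
    archives = sorted(data_dir.glob(pattern))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)  # overwrites existing files silently
    return archives
```

Usage: call `download(url, Path("data"))` for each entry in `URLS`, then `extract_archives(Path("data"))` and `extract_archives(Path("data"), "ann_gpt.zip")`. Unlike unzip, `extractall` overwrites existing files without prompting.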

Python environment

This project uses uv.

Run scripts with:

uv run script.py

Code

Dataset extraction

Extract untokenized datasets from annotations:

uv run src/preprocess.py

(see ./dvc.yaml for example calls)

Alternatively, extract all datasets with DVC:

dvc repro
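The actual extraction is defined in src/preprocess.py, and this README does not describe the annotation format. Purely as an illustration of an untokenized dataset, the sketch below assumes brat standoff annotations (entity lines of the form `T1<TAB>LABEL start end<TAB>surface text`) and keeps each entity as character offsets into the raw text rather than as token labels. The format and the names `Entity` and `parse_ann` are assumptions.

```python
"""Hypothetical sketch: read brat-style entity annotations as character spans."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str
    start: int  # character offset into the raw text, inclusive
    end: int    # character offset, exclusive
    text: str


def parse_ann(ann: str, text: str) -> list[Entity]:
    """Parse entity (T) lines of an assumed brat .ann file; other line types are skipped.

    Discontinuous spans ("0 5;10 15") are reduced to their outer bounds, a simplification.
    """
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), notes (#), attributes (A), ...
        ann_id, type_and_spans, surface = line.split("\t", 2)
        label, spans = type_and_spans.split(" ", 1)
        offsets = [int(x) for part in spans.split(";") for x in part.split()]
        start, end = min(offsets), max(offsets)
        # For contiguous spans, the offsets must reproduce the surface text.
        if ";" not in spans and text[start:end] != surface:
            raise ValueError(f"offset mismatch for {ann_id}: {surface!r}")
        entities.append(Entity(label, start, end, surface))
    return entities
```

Keeping offsets instead of tokens leaves tokenization to each model, so the same dataset can feed models with different tokenizers.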

Finetuning

The main script for finetuning is ./src/main.py. See the fitting configuration files in ./cfg/fitting*.yaml, and ./scripts/finetune*.sh for example calls.
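Token-classification finetuning typically needs one label per token, so character spans have to be projected onto a tokenization. How src/main.py does this is determined by its code and the fitting configs, which are not shown here; the sketch below assumes a BIO scheme over a simple regex tokenizer that keeps character offsets. With a Hugging Face fast tokenizer, the offsets returned with `return_offsets_mapping=True` would play the role of `tokenize_with_offsets`. Both helper names are hypothetical.

```python
"""Hypothetical sketch: project character-offset entity spans onto BIO token labels."""
import re


def tokenize_with_offsets(text: str) -> list[tuple[str, int, int]]:
    """Split into word and punctuation tokens, keeping (token, start, end) offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def spans_to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Label each token B-X / I-X if it overlaps span (start, end, X), else O.

    Overlapping spans are not nested: a later span overwrites an earlier one.
    """
    tokens = tokenize_with_offsets(text)
    labels = ["O"] * len(tokens)
    for start, end, label in sorted(spans):
        inside = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
        for rank, i in enumerate(inside):
            labels[i] = ("B-" if rank == 0 else "I-") + label
    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]
```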

Training

The main script for training is ./src/main.py. The main difference from finetuning is that a checkpoint is saved at the end. See the training configuration files in ./cfg/training*.yaml, and ./scripts/train*.sh for example calls.

Predicting

The main script for predicting is ./src/predict.py. See ./cfg/predict*.example for example configuration files, and ./scripts/predict*.sh for example calls.
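Token-level predictions have to be turned back into character spans before they can be written out. Assuming a BIO tag sequence with (start, end) character offsets per token, a lenient decoder might look like this. The brat-style output in `to_brat` is an assumption about the expected format, and `bio_to_spans` and `to_brat` are hypothetical names, not functions from src/predict.py.

```python
"""Hypothetical sketch: decode BIO tags into character spans and brat-style lines."""


def bio_to_spans(tokens: list[tuple[int, int]], tags: list[str]) -> list[tuple[int, int, str]]:
    """Merge BIO tags over tokens with (start, end) offsets into (start, end, label) spans.

    An I-X tag that does not continue an open X span starts a new span (lenient decoding).
    """
    spans: list[tuple[int, int, str]] = []
    current = None  # [start, end, label] of the span being built
    for (start, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = end  # extend the open span
            continue
        if current is not None:
            spans.append(tuple(current))
            current = None
        if prefix in ("B", "I"):
            current = [start, end, label]
    if current is not None:
        spans.append(tuple(current))
    return spans


def to_brat(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Format spans as brat-style entity lines (assumed output format)."""
    return "\n".join(
        f"T{i}\t{label} {start} {end}\t{text[start:end]}"
        for i, (start, end, label) in enumerate(spans, start=1)
    )
```

Starting a new span on an orphan I- tag, instead of dropping it, is a common lenient choice; a strict decoder would discard such tokens.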
