Systematic evaluation of Time Series Foundation Models (TSFMs) for Process Model Forecasting (PMF), predicting how directly-follows (DF) relations in a process evolve over time. The repository benchmarks Chronos, Moirai, and TimesFM across zero-shot, LoRA, and full fine-tuning settings on four real-world event logs, using MAE/RMSE alongside Entropic Relevance as a process-aware conformance metric.
Try it live: an interactive forecast explorer and the talk slides, both hosted as Hugging Face Spaces.
- Zero-shot coverage: 12 TSFM variants across Chronos, Moirai, and TimesFM.
- Fine-tuning coverage: LoRA for Chronos-Bolt and Moirai-1.1; full fine-tuning for Chronos-Bolt, Chronos-2, and Moirai-1.1.
- Data assets: daily DF-count time series in Parquet and XES logs for Entropic Relevance evaluation.
- Orchestration: Hydra-driven Python entry points plus local orchestration scripts and VSC HPC helpers.
- Self-host & agents: run zero-shot forecasting on your own log via the Docker image or the headless MCP server.
| Family | Variants |
|---|---|
| Chronos | Bolt Tiny, Bolt Mini, Bolt Small, Bolt Base, Chronos-2 |
| Moirai | 1.1 Small/Large, 2.0 Small, MoE Base |
| TimesFM | 1.0-200M, 2.0-500M, 2.5-200M |
- LoRA experiments in this repo cover `chronos/bolt_small`, `chronos/bolt_base`, `moirai/1_1_small`, and `moirai/1_1_large`.
- Full fine-tuning covers `chronos/bolt_small`, `chronos/bolt_base`, `chronos/chronos2`, `moirai/1_1_small`, and `moirai/1_1_large`.
Four process mining event logs from the BPI Challenge and healthcare domains:
| Dataset | Description | Cases | DF relations |
|---|---|---|---|
| bpi2017 | Loan application process | 40,229 | 21 |
| bpi2019_1 | Purchase order process (3-way match) | 197,521 | 149 |
| sepsis | Sepsis clinical pathway | 999 | 135 |
| hospital_billing | Hospital billing process | 78,828 | 73 |
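Each time series counts how often one activity directly follows another on a given day. The sketch below shows that aggregation using only the standard library. It is a minimal illustration under stated assumptions:

- Events are already parsed into hypothetical `(case_id, activity, timestamp)` tuples.
- Each DF occurrence is dated by its second event.
- Days without an occurrence are not zero-filled.

The upstream preprocessing in pmf-benchmark may use different conventions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import datetime


def daily_df_counts(
    events: list[tuple[str, str, datetime]],
) -> dict[tuple[str, str], Counter]:
    """Count directly-follows pairs (a, b) per calendar day.

    Assumption (not necessarily the repository's convention): an occurrence
    of (a, b) is dated by the timestamp of b, and days on which a pair never
    occurs are simply absent rather than zero-filled.
    """
    # Group events by case so "directly follows" is evaluated within a trace.
    by_case: dict[str, list[tuple[datetime, str]]] = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))

    series: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for trace in by_case.values():
        trace.sort()  # chronological order within the case
        # Each pair of consecutive events (a, then b) is one DF occurrence.
        for (_, a), (ts_b, b) in zip(trace, trace[1:]):
            series[(a, b)][ts_b.date()] += 1
    return dict(series)
```

`len(daily_df_counts(events))` gives the number of distinct DF relations observed in the input.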
The experiment data assets are published on Zenodo. After extraction, the archive is organized as:
data/
├── raw_logs/ # original XES logs from source benchmarks
├── processed_logs/ # processed XES logs used by ER evaluation
├── time_series/ # daily DF-count Parquet files used by inference/training
└── metadata/ # release metadata and preprocessing statistics
See Data Setup for download commands. If you want the upstream preprocessing workflow and source-log preparation details, see pmf-benchmark.
Requires Python 3.10+ and uv. The timesfm_v25 extra requires Python 3.11+ because the pinned TimesFM 2.5 package is only installed on 3.11+.
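You can check these version constraints before running `uv sync`. The sketch below is a hypothetical pre-install helper; `MIN_PYTHON` and `installable` are illustrative names, not part of the repository. It encodes only the two minimums stated above, because no separate minimum is given for `timesfm_legacy`.

```python
from __future__ import annotations

import sys

# Minimum interpreter versions stated in this README (hypothetical helper).
MIN_PYTHON = {
    "base": (3, 10),
    "timesfm_v25": (3, 11),
}


def installable(version: tuple[int, int] = tuple(sys.version_info[:2])) -> list[str]:
    """Return the installs from MIN_PYTHON that this Python version satisfies."""
    return [name for name, minimum in MIN_PYTHON.items() if version >= minimum]
```

Calling `installable()` with no argument checks the running interpreter.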
# Clone and install
git clone https://github.com/YongboYu/pmf-tsfm.git
cd pmf-tsfm
uv sync
# Optional model extras
uv sync --extra timesfm_v25 # TimesFM 2.5
uv sync --extra timesfm_legacy # TimesFM 1.0 / 2.0
# Optional dev tools
uv sync --group dev
# Optional: activate the uv-managed environment for plain `python -m ...` usage
source .venv/bin/activate
Examples below assume either that `.venv` is activated or that commands are prefixed with `uv run`.
See Tested Environments for the macOS (Apple Silicon / MPS), Linux (NVIDIA GPUs / CUDA), and VSC wICE HPC cluster setups used with this repo.
Create the local config files that are meant to stay machine-specific:
cp .env.example .env
cp configs/local/default.yaml.example configs/local/default.yaml
`.env` holds environment variables such as `PROJECT_ROOT`, `WANDB_API_KEY`, `CUDA_VISIBLE_DEVICES`, and `TIMESFM_V1_PATH`. `configs/local/default.yaml` is optional. Use it for Hydra-only overrides such as `device`, `training.num_workers`, or a local Weights & Biases entity.
Download from the Zenodo record page: https://zenodo.org/records/18327515.
# With zenodo-get
pip install zenodo-get
zenodo_get 10.5281/zenodo.18327515 -o data/
Or download the current archive directly:
wget -O data/pmf_data_v1.1.zip \
https://zenodo.org/api/records/18327515/files/pmf_data_v1.1.zip/content
Extract the downloaded archive into `data/`:
unzip -o data/pmf_data_v1.1.zip -d data/
After extraction, the `data/` directory should contain `raw_logs/`, `processed_logs/`, `time_series/`, and `metadata/`, as described in Datasets.
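A quick way to confirm the extraction worked is to check for the four top-level folders listed in the data tree above. The sketch below assumes `missing_data_dirs` is a hypothetical helper; nothing like it ships with the repository.

```python
from __future__ import annotations

from pathlib import Path

# Top-level folders of the extracted Zenodo archive, per the data/ tree above.
EXPECTED_SUBDIRS = ("raw_logs", "processed_logs", "time_series", "metadata")


def missing_data_dirs(data_root: str | Path = "data") -> list[str]:
    """Return the expected archive folders that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means the layout matches. Any returned names point to an incomplete download or a wrong `-d` target for `unzip`.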
For the reproducible paper workflow, preprocess the Parquet time series once so training and inference share the exact same split boundaries:
python -m pmf_tsfm.data.preprocess --multirun \
data=bpi2017,bpi2019_1,sepsis,hospital_billing
This writes `data/processed/{dataset}/full.parquet`, `train.parquet`, `val.parquet`, `test.parquet`, and `metadata.json`, which training and inference then use. See Common Workflows for run examples and HPC for the cluster path.
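Why compute the split once? A time series must be split chronologically, and every stage has to agree on the cut points. The sketch below illustrates such a contiguous train/val/test split. The fractions are placeholders, not the paper's values; the real boundaries are set by `pmf_tsfm.data.preprocess`.

```python
from __future__ import annotations

from collections.abc import Sequence


def chronological_split(
    series: Sequence[float], val_frac: float = 0.1, test_frac: float = 0.2
) -> tuple[list[float], list[float], list[float]]:
    """Split a daily series into contiguous train/val/test blocks.

    val_frac and test_frac are hypothetical placeholders. No shuffling:
    the test block is always the most recent stretch of days.
    """
    values = list(series)
    n = len(values)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    train_end = n - n_val - n_test
    return (
        values[:train_end],
        values[train_end : train_end + n_val],
        values[train_end + n_val :],
    )
```

Running this once and saving the three blocks guarantees that training and inference see identical boundaries. Recomputing it separately in each stage risks drift if the input or fractions change.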
# 1. Zero-shot inference on one model/dataset pair
python -m pmf_tsfm.inference model=chronos/bolt_small data=bpi2017
# 2. Evaluate that output directory
python -m pmf_tsfm.evaluate \
results_dir=outputs/zero_shot/bpi2017/chronos_bolt_small
# 3. Evaluate Entropic Relevance on the same predictions
python -m pmf_tsfm.er.evaluate_er model=chronos/bolt_small data=bpi2017
Add `logger=wandb` or `logger=wandb_offline` to any Hydra command if you want W&B tracking.
Predictions are written under outputs/{task}/{dataset}/{model}/ and fine-tuned checkpoints / LoRA adapters under results/{task}/{dataset}/{model}/. Both directories are generated per run and git-ignored.
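The example commands in this README pass such directories explicitly, for example `results_dir=outputs/zero_shot/bpi2017/chronos_bolt_small`. The sketch below builds those paths. The slash-to-underscore mapping is inferred from the example paths here, not taken from the package code, and `run_dir` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def run_dir(root: str, task: str, dataset: str, model: str) -> PurePosixPath:
    """Build {root}/{task}/{dataset}/{model_dir} for a Hydra model name.

    Assumption inferred from this README's examples: the model config name
    "chronos/bolt_small" maps to the directory name "chronos_bolt_small".
    """
    return PurePosixPath(root, task, dataset, model.replace("/", "_"))
```

For example, `run_dir("results", "lora_tune", "bpi2017", "chronos/bolt_small") / "lora_adapter" / "best"` reproduces the adapter path used in the LoRA-adapted inference command.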
All experiment entry points are Hydra-based.
# Default run (Chronos Bolt Small on bpi2017)
python -m pmf_tsfm.inference
# Single model + dataset
python -m pmf_tsfm.inference model=chronos/bolt_small data=bpi2017
# Sweep multiple combinations
python -m pmf_tsfm.inference --multirun \
model=chronos/bolt_small,chronos/bolt_base \
data=bpi2017,bpi2019_1,sepsis,hospital_billing
# LoRA fine-tuning
python -m pmf_tsfm.train \
task=lora_tune model=chronos/bolt_small data=bpi2017 lora=chronos
# Full fine-tuning
python -m pmf_tsfm.train \
task=full_tune model=chronos/bolt_small data=bpi2017
# LoRA-adapted inference
python -m pmf_tsfm.inference model=chronos/bolt_small data=bpi2017 \
task=lora_tune lora_adapter_path=results/lora_tune/bpi2017/chronos_bolt_small/lora_adapter/best
# Fully fine-tuned inference
python -m pmf_tsfm.inference model=chronos/bolt_small data=bpi2017 \
task=full_tune checkpoint_path=results/full_tune/bpi2017/chronos_bolt_small/checkpoints/best
# Evaluate all zero-shot outputs
python -m pmf_tsfm.evaluate
# Evaluate all LoRA or full-tune outputs
python -m pmf_tsfm.evaluate task=lora_tune
python -m pmf_tsfm.evaluate task=full_tune
# Evaluate a specific model/dataset directory
python -m pmf_tsfm.evaluate \
results_dir=outputs/zero_shot/bpi2017/chronos_bolt_small
# Entropic Relevance on one model/dataset pair
python -m pmf_tsfm.er.evaluate_er model=chronos/bolt_small data=bpi2017
# All zero-shot combinations
bash scripts/run_inference_all.sh
# All LoRA train + inference runs
bash scripts/run_lora_all.sh
# All full fine-tune + inference runs
bash scripts/run_full_tune_all.sh
# Batch ER evaluation
bash scripts/run_er_all.sh
# Full 10-stage end-to-end pipeline
bash scripts/run_full_pipeline.sh
These are local orchestration scripts: shell helpers that run sequential experiment batches on your workstation or server, without Slurm job submission. The shell scripts source `scripts/env.sh`, which loads `.env` and activates `.venv` automatically when present.
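With comma-separated values, Hydra's `--multirun` sweeps the Cartesian product of the overrides. The sweep example above therefore expands to 2 models × 4 datasets = 8 runs. The sketch below enumerates the equivalent single-run commands for a sequential batch. It is a hypothetical illustration, not the contents of `scripts/run_inference_all.sh`.

```python
import itertools
import shlex

MODELS = ["chronos/bolt_small", "chronos/bolt_base"]
DATASETS = ["bpi2017", "bpi2019_1", "sepsis", "hospital_billing"]


def inference_commands(models: list, datasets: list) -> list:
    """One single-run inference command per (model, dataset) pair."""
    return [
        shlex.join(["python", "-m", "pmf_tsfm.inference", f"model={m}", f"data={d}"])
        for m, d in itertools.product(models, datasets)
    ]
```

Each string can be executed in order with `subprocess.run(shlex.split(cmd), check=True)`. That is roughly what a sequential batch does, with one process per run instead of one Hydra multirun process.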
Beyond the research CLI above, the core pipeline ships as two self-host artifacts. They run zero-shot DF-relation forecasting plus accuracy evaluation (MAE / RMSE and Entropic Relevance) on your own process log, with no caps. Input can be either of:

- a raw `.xes`/`.xes.gz` log, which is auto-converted to the daily DF-relation series;
- a prepared DF-relation `.parquet` file.

Both artifacts wrap the same Gradio-free seam, `src/pmf_tsfm/api.py` (`forecast_backtest` / `forecast_only` / `list_models`). It performs a zero-shot holdout backtest that reuses the real cores, so its numbers match the CLI and the paper. See the per-artifact READMEs below for setup and design details.
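Here is the general idea of a holdout backtest. Hold out the last `horizon` days of a DF-relation series, forecast them from the remaining history, and score the forecast with MAE and RMSE. This is a minimal sketch only:

- A naive last-value forecaster stands in for the TSFM.
- `holdout_backtest` and `last_value` are hypothetical names, not the signature of `forecast_backtest` in `api.py`.
- It scores a single series; how the repository aggregates across DF relations is not specified here.

```python
from __future__ import annotations

import math
from collections.abc import Callable, Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error over the forecast horizon."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error over the forecast horizon."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def last_value(history: Sequence[float], horizon: int) -> list[float]:
    """Naive stand-in forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def holdout_backtest(
    series: Sequence[float],
    horizon: int,
    forecaster: Callable[[Sequence[float], int], list[float]] = last_value,
) -> dict[str, float]:
    """Hold out the last `horizon` points, forecast them from the rest, score."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    history, actual = series[:-horizon], series[-horizon:]
    forecast = forecaster(history, horizon)
    if len(forecast) != horizon:
        raise ValueError("forecaster returned the wrong number of steps")
    return {"mae": mae(actual, forecast), "rmse": rmse(actual, forecast)}
```

Swapping `last_value` for a model-backed callable with the same `(history, horizon)` shape leaves the scoring unchanged. That separation is what lets one backtest compare different forecasters.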
- Docker: self-host CLI (`docker/README.md`). Build the core image and forecast your own log:
  docker build -f docker/Dockerfile -t pmf-tsfm .
  docker run --rm -v "$PWD/data:/data" -v pmf-cache:/cache \
    pmf-tsfm backtest --input /data/processed_logs/sepsis.xes --model chronos/chronos2
  The image also runs the full Hydra CLIs (`inference`, `evaluate`, `evaluate_er`, ...). Default models are Chronos + Moirai; TimesFM is an opt-in build (`--build-arg INSTALL_TIMESFM=1`).
- MCP: agent server (`mcp/README.md`). A headless FastMCP server exposing the same capability as typed MCP tools:
  uv sync --extra mcp
  python mcp/server.py  # stdio; connect any MCP client or the MCP Inspector
The capped Gradio demo in demo/ remains the hosted visualization Space; these two artifacts are the uncapped, bring-your-own-data path.
- macOS on Apple Silicon: tested with `device=mps` for local development and lighter runs.
- Linux workstation/server with NVIDIA GPUs: tested with `device=cuda` and the local `scripts/env.sh` helpers.
- VSC wICE cluster: tested with Slurm submission scripts under `scripts/hpc/` for NVIDIA H100 GPU jobs.
For macOS with MPS, keep `training.num_workers=0`. For Linux systems with NVIDIA GPUs and on the HPC cluster, use higher worker counts such as `training.num_workers=4`.
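The device-specific settings above can be turned into Hydra override strings. This is a hypothetical helper under one assumption: 4 workers is used for CUDA because the text cites it as an example, not as a fixed requirement.

```python
def local_overrides(device: str) -> list:
    """Hydra overrides for the tested device types (hypothetical helper).

    mps -> 0 workers, as recommended above; cuda -> 4 workers, the example
    worker count given for NVIDIA GPUs and the HPC cluster.
    """
    workers = {"mps": 0, "cuda": 4}
    if device not in workers:
        raise ValueError(f"untested device: {device!r}")
    return [f"device={device}", f"training.num_workers={workers[device]}"]
```

Append `" ".join(local_overrides("mps"))` to a `python -m pmf_tsfm.train ...` command, or put the same keys in `configs/local/default.yaml` so they apply to every run on that machine.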
Slurm submission scripts for the VSC wICE cluster live under scripts/hpc/. Use scripts/hpc/.env.hpc.example as the cluster-specific starting point.
W&B logging on HPC depends on LOGGER:
- `bash scripts/hpc/submit_pipeline.sh` defaults to `LOGGER=disabled`.
- Direct stage scripts such as `submit_zero_shot.sh`, `submit_lora.sh`, and `submit_full_tune.sh` default to `LOGGER=wandb`.
- Use `LOGGER=wandb_offline` only if you want offline runs, then sync them later with `bash scripts/hpc/sync_wandb_offline.sh`.
bash scripts/hpc/setup_vsc.sh # One-time environment setup
bash scripts/hpc/submit_pipeline.sh # Default: no W&B logging
LOGGER=wandb bash scripts/hpc/submit_pipeline.sh
LOGGER=wandb_offline bash scripts/hpc/submit_pipeline.sh
bash scripts/hpc/submit_zero_shot.sh # Default: LOGGER=wandb for direct stage runs
bash scripts/hpc/sync_wandb_offline.sh # Only after explicit offline runs
pmf-tsfm/
├── src/pmf_tsfm/ # Python package: model adapters, data modules, evaluation, api.py seam
├── configs/ # Hydra configs for tasks, models, datasets, loggers, paths
├── scripts/ # Local orchestration scripts and HPC helpers
├── docker/ # Self-host image for the core pipeline (see docker/README.md)
├── mcp/ # Headless FastMCP server over the api.py seam (see mcp/README.md)
├── demo/ # Gradio forecast explorer, hosted as a live HF Space (see demo/README.md)
├── tests/ # pytest suite
├── data/ # Zenodo assets plus generated processed splits
├── outputs/ # Generated predictions and evaluation artifacts (git-ignored)
├── results/ # Generated checkpoints and LoRA adapters (git-ignored)
├── notebooks/ # Analysis notebooks
├── manuscript/ # Paper assets
└── slides/ # Slidev talk deck, published as a live HF Space
@article{yu2025time,
title={Time Series Foundation Models for Process Model Forecasting},
author={Yu, Yongbo and Peeperkorn, Jari and De Smedt, Johannes and De Weerdt, Jochen},
journal={arXiv preprint arXiv:2512.07624},
year={2025}
}
This project is licensed under the MIT License.