kreview is a production-grade, notebook-first (nbdev) evaluation engine designed for high-throughput cancer liquid biopsy fragmentomics feature analysis. Developed at Memorial Sloan Kettering (MSKCC), it processes cohorts containing tens of thousands of samples using an embedded DuckDB query engine with chunked I/O and automatic retry logic.
- 5-Tier ctDNA Taxonomy: MSK-IMPACT paired-inference to label True ctDNA+, Possible ctDNA+, Possible ctDNA−, Healthy Normal, and Insufficient Data.
- DuckDB Dynamic Data Lake: In-memory `read_parquet` bindings with chunked I/O and exponential backoff retry. Builds a merged SQL-queryable `kreview_lake.duckdb` on demand.
- Multi-Model Evaluation: Random Forest, XGBoost, and Logistic Regression with Stratified K-Fold CV, SHAP explainability, and subgroup analysis.
- Interactive Dashboards: Plotly-native HTML reports with ROC curves, violin plots, SHAP beeswarm/waterfall, and per-cancer-type sensitivity tables.
- 26 Built-In Evaluators: Modular extractors covering fragment sizes (FSC, FSD, FSR), nucleosome protection (WPS, TFBS), cleavage motifs (EndMotif, BreakPointMotif), chromatin accessibility (ATAC), motif divergence (MDS), and orientation (OCF).
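
The "chunked I/O with exponential backoff retry" pattern named in the feature list can be illustrated in plain Python. This is a minimal sketch of the general technique, not kreview's actual implementation: the helper names `chunked` and `with_backoff`, their parameters, and the choice to retry only on `OSError` are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_backoff(
    fn: Callable[[], R],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `fn`; on OSError wait base_delay * 2**attempt and try again.

    Hypothetical helper: `retries` is the number of extra attempts after
    the first call, and `sleep` is injectable so the delays can be tested.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except OSError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

In use, each chunk of file paths would be wrapped in its own `with_backoff` call, so that a transient I/O failure repeats only that chunk rather than the whole cohort.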
Important
Quarto is required for programmatic dashboard generation. Because quarto-cli wrapper packages are unreliable across Python environments, kreview assumes the Quarto executable is already installed on your OS or in your container.
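
A quick preflight check can confirm that a Quarto executable is discoverable before starting a long run. The sketch below uses only the Python standard library. `find_quarto` and `describe_quarto` are hypothetical helper names that are not part of kreview, and the sketch assumes Quarto is located through `PATH`, which this README does not state explicitly.

```python
import shutil
import subprocess
from typing import Optional


def find_quarto(name: str = "quarto") -> Optional[str]:
    """Return the full path of the executable `name` on PATH, or None."""
    return shutil.which(name)


def describe_quarto(name: str = "quarto") -> str:
    """Return a one-line status message about the Quarto installation."""
    path = find_quarto(name)
    if path is None:
        return f"missing: no '{name}' executable found on PATH"
    # `quarto --version` prints the installed version number.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    version = result.stdout.strip() or "unknown version"
    return f"found: {path} ({version})"
```

Running `print(describe_quarto())` in the target environment before invoking `kreview run` gives an early signal of whether dashboard generation is likely to work.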
The easiest way to run kreview without managing external dependencies is to use the pre-built Docker container hosted on GHCR. It ships with Python 3.12, all ML libraries, and the Quarto Linux binaries preconfigured:

    docker pull ghcr.io/msk-access/kreview:latest
    docker run -v /your/data:/data ghcr.io/msk-access/kreview:latest \
      kreview run --cancer-samplesheet /data/cancer.csv ...

If you install via pip, you must separately install Quarto via your OS package manager:

- Install Quarto: Follow the official Quarto Installation Guide (e.g. `brew install quarto` on macOS).
- Install kreview:

    git clone https://github.com/msk-access/kreview.git
    cd kreview
    pip install -e .

Then run the pipeline:

    PYTHONUNBUFFERED=1 kreview run \
      --cancer-samplesheet "/path/to/cancer/samplesheet.csv" \
      --healthy-xs1-samplesheet "/path/to/healthy/xs1/samplesheet.csv" \
      --healthy-xs2-samplesheet "/path/to/healthy/xs2/samplesheet.csv" \
      --cbioportal-dir "/path/to/cBioPortal_MAF_CNA_SV/" \
      --krewlyzer-dir "/path/to/unified_krewlyzer_results" \
      --output output/ \
      --workers 4 \
      --export-duckdb

Once finished, open the generated HTML reports:

    open output/reports/ATAC_dashboard.html

This project operates as an nbdev repo. Do not edit the generated .py files in kreview/ manually. Develop inside the Jupyter notebooks in nbs/ and then run:

    nbdev-export

- Documentation: Full user and developer guide
- Contributing: How to contribute
- Changelog: Version history