CellZarr: High-Content Image Analysis Pipeline for Live Imaging Timelapse Experiments

CellZarr is a comprehensive and reproducible workflow for high-content image analysis of live cell imaging timelapse experiments. The pipeline processes microscopy data from raw ND2 files to quantitative feature extraction, enabling downstream biological analysis using modern, scalable tools and formats.

Pipeline Overview

The workflow is designed to be highly modular and consists of the following main steps:

ND2 to OME-Zarr conversion: Convert raw ND2 microscopy files to the OME-Zarr format for scalable, cloud-ready storage and analysis.
Colony segmentation using ConvPaint: Identify and segment stem cell colonies in the images using a deep learning-based approach.
Nucleus segmentation using StarDist / Cellpose: Detect and segment individual nuclei within colonies for single-cell analysis.
Cell Tracking: Track individual cells over time to study dynamic behaviors using ultrack or trackastra.
Feature Extraction: Quantify spatial features and extract relevant biological markers (e.g., ERK, Oct4) for each cell.

The workflow is highly modular, making it straightforward to adapt to different datasets or analysis needs. Once the ND2 files have been converted to OME-Zarr, the subsequent steps can be performed independently, allowing you to skip or repeat steps as required for your analysis.

Key Features

Scalable Processing: Extensive use of Dask for parallel and distributed computing
Modern Data Format: OME-Zarr for efficient storage and cloud compatibility
Interactive Visualization: Custom Napari-based data viewer for exploring results
Configurable: Easy configuration through settings files

Installation and Setup

Prerequisites

uv package manager

Note: Thanks to using uv as a package manager, you don't need to install Python or Conda separately. uv automatically manages Python installations and virtual environments for you.

Quick Setup

Clone the repository:

git clone https://github.com/pertzlab/CellZarr
cd CellZarr

Install all recommanded dependencies:
```
uv sync --extra aio_gpu
```
if you have a GPU with cuda 12.8. or
```
uv sync --extra aio_cpu
```
if you are on a CPU system.

This command automatically creates a virtual environment (.venv) in your project directory and installs cellpose for segmentation, as well as trackastra for tracking. This virtual environment can be used directly in VS Code for running the Jupyter notebooks.

Selective Installation

If you don't need all pipeline components, you can install only specific extras:

For StarDist nucleus segmentation only:

uv sync --extra stardist_seg

For Cellpose nucleus segmentation only:

uv sync --extra cellpose_seg_cpu

For Cellpose nucleus segmentation on GPU only:

uv sync --extra cellpose_seg_cuda128

For trackastra segmentation on CPU only:

uv sync --extra trackastra_cpu

For Cellpose nucleus segmentation and trackastra tracking with Cuda (GPU) support:

uv sync --extra cellpose_seg_cuda128 --extra trackastra_cuda128

For ConvPaint colony segmentation only:

uv sync --extra colony_seg
uv pip install pyqt5  # Required on Windows

Base installation (without segmentation models):

uv sync

Multiple extras can be installed together:

uv sync --extra stardist_seg --extra colony_seg

If you are using ultrack for tracking, you do not need to use the uv sync commands (except for the base pipeline), as it will be executed using uv run that creates a temporary virtual environment.

Virtual Environment Integration: All uv sync commands automatically create and manage a virtual environment (.venv) in your project directory. VS Code will automatically detect this environment and can use it for running Jupyter notebooks and Python scripts.

Configuration

Configuration options such as output paths and scaling parameters can be easily adjusted in:

configuration/settings.py - General pipeline settings
configuration/dask.py - Dask cluster configuration

Pipeline Components

1. ND2 to OME-Zarr Conversion (`01_ND2_to_OME-ZARR.ipynb`)

The first step converts raw ND2 microscopy files into the OME-Zarr format. ND2 is a proprietary file format commonly used for storing high-content microscopy data. OME-Zarr is an open, scalable, and cloud-compatible format that enables efficient storage, access, and analysis of large multidimensional image datasets.

In this step, the ND2 file is loaded, relevant metadata is extracted, and each field of view (FOV) is saved as a separate OME-Zarr dataset.

2. Colony Segmentation (`02_Colony_Segmentation.ipynb`)

Uses ConvPaint, a deep learning-based approach, to identify and segment stem cell colonies in the images. This step requires the colony_seg extra dependencies.

3. Nucleus Segmentation (`03_Nucleus_Segmentation.ipynb`)

Employs StarDist to detect and segment individual nuclei within colonies for single-cell analysis. This enables downstream single-cell feature extraction and tracking.

4. Cell Tracking

Cell tracking is implemented using ultrack and can be executed in two ways:

Python Wrapper (`04_Cell_Tracking_ultrack.py` and `04_Cell_Tracking_ultrack_all_fovs.py`)

Direct execution through Python scripts
Suitable for local processing or smaller datasets

SLURM Batch Processing (`04_Cell_Tracking_slurm.sh`)

For high-performance computing environments
Recommended for large datasets with many FOVs
Requires SLURM workload manager

Gurobi License Recommendation: For optimal performance, we recommend obtaining a Gurobi license:

Free for academic use at gurobi.com
Best compatibility with WSL license: Web License Service

5. Feature Extraction (`05_Feature_Extraction_ERK_Oct4.ipynb`)

Quantifies spatial features and extracts relevant biological markers (e.g., ERK, Oct4) for each cell, enabling downstream biological analysis.

Data Visualization

The data_viewer/ folder contains a specialized Napari-based viewer for visualizing the generated OME-Zarr data. This viewer supports:

Multi-channel OME-Zarr visualization
Custom grayscale metadata tags for enhanced display
Label and tracking data overlay
Interactive navigation between experiments and FOVs

Important Note on Custom OME-Zarr Labels: The pipeline generates OME-Zarr files with some labels stored as grayscale images (e.g., biosensor expression overlays) rather than standard categorical labels. While these files can be loaded in Python without issues, the standard OME-Zarr Napari plugin may not properly display these custom grayscale labels (these labels have an additional metadata tag called greyscale set to Tr). Therefore, we strongly recommend using the integrated data viewer provided in this repository, which is specifically designed to handle these custom label types correctly.

Running the Data Viewer

cd data_viewer
uv run data_viewer.py [path_to_data]

See data_viewer/README.MD for detailed usage instructions.

Parallel Processing with Dask

The pipeline extensively uses Dask to parallelize operations for maximum performance:

Distributed Computing: Automatic scaling across available CPU cores
Memory Management: Efficient handling of large datasets
Progress Monitoring: Real-time progress tracking for long-running operations
Configurable: Cluster settings can be adjusted in configuration/dask.py

File Structure

CellZarr/
├── 01_ND2_to_OME-ZARR.ipynb          # ND2 to OME-Zarr conversion
├── 02_Colony_Segmentation.ipynb       # Colony segmentation with ConvPaint
├── 03_Nucleus_Segmentation.ipynb      # Nucleus segmentation with StarDist
├── 04_Cell_Tracking_ultrack.py        # Cell tracking (single FOV)
├── 04_Cell_Tracking_ultrack_all_fovs.py # Cell tracking (all FOVs)
├── 04_Cell_Tracking_slurm.sh          # SLURM batch job for tracking
├── 05_Feature_Extraction_ERK_Oct4.ipynb # Feature extraction
├── pyproject.toml                      # Project dependencies
├── uv.lock                            # Dependency lock file
├── configuration/                      # Configuration files
│   ├── settings.py                    # General settings
│   └── dask.py                        # Dask cluster configuration
├── data_viewer/                       # Napari-based data viewer
│   ├── data_viewer.py                 # Main viewer application
│   └── README.MD                      # Viewer documentation
├── helper_functions/                  # Utility functions
└── models/                           # Pre-trained segmentation models

Usage Examples

Running Individual Pipeline Steps

Start with ND2 conversion:

uv run jupyter lab 01_ND2_to_OME-ZARR.ipynb

Perform segmentation: For cell colony segmentation:
```
uv run jupyter lab 02_Colony_Segmentation.ipynb
```

For nucleus segmentation:

uv run jupyter lab 03_Nucleus_Segmentation.ipynb

Track cells:

# For single FOV
python 04_Cell_Tracking_ultrack.py

# For all FOVs
python 04_Cell_Tracking_ultrack_all_fovs.py

# Or submit SLURM job
sbatch 04_Cell_Tracking_slurm.sh

Extract features:

jupyter lab 05_Feature_Extraction_ERK_Oct4.ipynb

Viewing Results

cd data_viewer
uv run data_viewer.py /path/to/your/processed/data

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CellZarr: High-Content Image Analysis Pipeline for Live Imaging Timelapse Experiments

Pipeline Overview

Key Features

Installation and Setup

Prerequisites

Quick Setup

Selective Installation

For StarDist nucleus segmentation only:

For Cellpose nucleus segmentation only:

For Cellpose nucleus segmentation on GPU only:

For trackastra segmentation on CPU only:

For Cellpose nucleus segmentation and trackastra tracking with Cuda (GPU) support:

For ConvPaint colony segmentation only:

Base installation (without segmentation models):

Multiple extras can be installed together:

Configuration

Pipeline Components

1. ND2 to OME-Zarr Conversion (`01_ND2_to_OME-ZARR.ipynb`)

2. Colony Segmentation (`02_Colony_Segmentation.ipynb`)

3. Nucleus Segmentation (`03_Nucleus_Segmentation.ipynb`)

4. Cell Tracking

Python Wrapper (`04_Cell_Tracking_ultrack.py` and `04_Cell_Tracking_ultrack_all_fovs.py`)

SLURM Batch Processing (`04_Cell_Tracking_slurm.sh`)

5. Feature Extraction (`05_Feature_Extraction_ERK_Oct4.ipynb`)

Data Visualization

Running the Data Viewer

Parallel Processing with Dask

File Structure

Usage Examples

Running Individual Pipeline Steps

Viewing Results

About

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
configuration		configuration
data_viewer		data_viewer
helper_functions		helper_functions
models		models
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
01_ND2_to_OME-ZARR.ipynb		01_ND2_to_OME-ZARR.ipynb
02_Colony_Segmentation.ipynb		02_Colony_Segmentation.ipynb
03_Nucleus_Segmentation.ipynb		03_Nucleus_Segmentation.ipynb
03_Nucleus_Segmentation_cellpose.ipynb		03_Nucleus_Segmentation_cellpose.ipynb
04_Cell_Tracking_slurm.sh		04_Cell_Tracking_slurm.sh
04_Cell_Tracking_trackastra.ipynb		04_Cell_Tracking_trackastra.ipynb
04_Cell_Tracking_ultrack.py		04_Cell_Tracking_ultrack.py
04_Cell_Tracking_ultrack_all_fovs.py		04_Cell_Tracking_ultrack_all_fovs.py
05_Feature_Extraction_ERK_Oct4.ipynb		05_Feature_Extraction_ERK_Oct4.ipynb
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

pertzlab/CellZarr

Folders and files

Latest commit

History

Repository files navigation

CellZarr: High-Content Image Analysis Pipeline for Live Imaging Timelapse Experiments

Pipeline Overview

Key Features

Installation and Setup

Prerequisites

Quick Setup

Selective Installation

For StarDist nucleus segmentation only:

For Cellpose nucleus segmentation only:

For Cellpose nucleus segmentation on GPU only:

For trackastra segmentation on CPU only:

For Cellpose nucleus segmentation and trackastra tracking with Cuda (GPU) support:

For ConvPaint colony segmentation only:

Base installation (without segmentation models):

Multiple extras can be installed together:

Configuration

Pipeline Components

1. ND2 to OME-Zarr Conversion (01_ND2_to_OME-ZARR.ipynb)

2. Colony Segmentation (02_Colony_Segmentation.ipynb)

3. Nucleus Segmentation (03_Nucleus_Segmentation.ipynb)

4. Cell Tracking

Python Wrapper (04_Cell_Tracking_ultrack.py and 04_Cell_Tracking_ultrack_all_fovs.py)

SLURM Batch Processing (04_Cell_Tracking_slurm.sh)

5. Feature Extraction (05_Feature_Extraction_ERK_Oct4.ipynb)

Data Visualization

Running the Data Viewer

Parallel Processing with Dask

File Structure

Usage Examples

Running Individual Pipeline Steps

Viewing Results

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Languages

1. ND2 to OME-Zarr Conversion (`01_ND2_to_OME-ZARR.ipynb`)

2. Colony Segmentation (`02_Colony_Segmentation.ipynb`)

3. Nucleus Segmentation (`03_Nucleus_Segmentation.ipynb`)

Python Wrapper (`04_Cell_Tracking_ultrack.py` and `04_Cell_Tracking_ultrack_all_fovs.py`)

SLURM Batch Processing (`04_Cell_Tracking_slurm.sh`)

5. Feature Extraction (`05_Feature_Extraction_ERK_Oct4.ipynb`)