CellZarr is a reproducible workflow for high-content analysis of live-cell timelapse imaging, processing raw ND2 files to quantitative features for scalable biological analysis.

pertzlab/CellZarr


CellZarr: High-Content Image Analysis Pipeline for Live Imaging Timelapse Experiments

CellZarr is a comprehensive and reproducible workflow for high-content image analysis of live-cell imaging timelapse experiments. The pipeline processes microscopy data from raw ND2 files to quantitative feature extraction, enabling downstream biological analysis using modern, scalable tools and formats.

Pipeline Overview

The workflow consists of the following main steps:

  1. ND2 to OME-Zarr conversion: Convert raw ND2 microscopy files to the OME-Zarr format for scalable, cloud-ready storage and analysis.
  2. Colony segmentation using ConvPaint: Identify and segment stem cell colonies in the images using a deep learning-based approach.
  3. Nucleus segmentation using StarDist / Cellpose: Detect and segment individual nuclei within colonies for single-cell analysis.
  4. Cell Tracking: Track individual cells over time to study dynamic behaviors using ultrack or trackastra.
  5. Feature Extraction: Quantify spatial features and extract relevant biological markers (e.g., ERK, Oct4) for each cell.

The workflow is highly modular, making it straightforward to adapt to different datasets or analysis needs. Once the ND2 files have been converted to OME-Zarr, the subsequent steps can be performed independently, allowing you to skip or repeat steps as required for your analysis.
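Because each step reads from and writes to the per-FOV OME-Zarr stores, deciding which steps still need to run can be as simple as checking which outputs already exist. The sketch below illustrates that idea; the step names and label-group paths in `STEP_OUTPUTS` are hypothetical and not taken from the notebooks.

```python
from pathlib import Path

# Hypothetical mapping from pipeline step to the label group it writes inside a
# FOV's OME-Zarr store; the actual notebooks may use different names.
STEP_OUTPUTS = {
    "colony_segmentation": "labels/colony",
    "nucleus_segmentation": "labels/nuclei",
    "tracking": "labels/tracks",
}


def pending_steps(fov_store: Path, redo: tuple[str, ...] = ()) -> list[str]:
    """List the steps still to run for one FOV store.

    A step is pending if its output group does not exist yet, or if it is
    named in `redo` (to force a re-run, e.g. after changing parameters).
    """
    return [
        step
        for step, rel_path in STEP_OUTPUTS.items()
        if step in redo or not (fov_store / rel_path).exists()
    ]
```

For a store that already contains a colony mask, `pending_steps(fov)` would return only the nucleus segmentation and tracking steps, while `pending_steps(fov, redo=("colony_segmentation",))` would schedule the colony step again.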

Key Features

  • Scalable Processing: Extensive use of Dask for parallel and distributed computing
  • Modern Data Format: OME-Zarr for efficient storage and cloud compatibility
  • Interactive Visualization: Custom Napari-based data viewer for exploring results
  • Configurable: Easy configuration through settings files

Installation and Setup

Prerequisites

  • uv package manager

Note: Because uv manages Python installations and virtual environments itself, you don't need to install Python or Conda separately.

Quick Setup

  1. Clone the repository:

    git clone https://github.com/pertzlab/CellZarr
    cd CellZarr
  2. Install all recommended dependencies:

    uv sync --extra aio_gpu

    if you have a GPU with CUDA 12.8, or

    uv sync --extra aio_cpu

    if you are on a CPU-only system.

    This command automatically creates a virtual environment (.venv) in your project directory and installs cellpose for segmentation, as well as trackastra for tracking. This virtual environment can be used directly in VS Code for running the Jupyter notebooks.

Selective Installation

If you don't need all pipeline components, you can install only specific extras:

For StarDist nucleus segmentation only:

uv sync --extra stardist_seg

For Cellpose nucleus segmentation only:

uv sync --extra cellpose_seg_cpu

For Cellpose nucleus segmentation on GPU only:

uv sync --extra cellpose_seg_cuda128

For trackastra tracking on CPU only:

uv sync --extra trackastra_cpu

For Cellpose nucleus segmentation and trackastra tracking with CUDA (GPU) support:

uv sync --extra cellpose_seg_cuda128 --extra trackastra_cuda128

For ConvPaint colony segmentation only:

uv sync --extra colony_seg
uv pip install pyqt5  # Required on Windows

Base installation (without segmentation models):

uv sync

Multiple extras can be installed together:

uv sync --extra stardist_seg --extra colony_seg
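The extras above combine mechanically, so the right command can be assembled from the components you need. This is a minimal sketch: the extra names are copied from the commands listed above, while the short component keys (`"stardist"`, `"cellpose"`, ...) and the helper itself are hypothetical. Components without a listed CUDA variant keep their single extra.

```python
import shlex

# Extra names as listed above; the dictionary keys are illustrative labels.
DEFAULT_EXTRAS = {
    "stardist": "stardist_seg",
    "cellpose": "cellpose_seg_cpu",
    "trackastra": "trackastra_cpu",
    "colony": "colony_seg",
}
# CUDA 12.8 variants that exist for some components.
GPU_EXTRAS = {
    "cellpose": "cellpose_seg_cuda128",
    "trackastra": "trackastra_cuda128",
}


def uv_sync_command(components: list[str], gpu: bool = False) -> str:
    """Build a `uv sync` command line installing the extras for `components`."""
    args = ["uv", "sync"]
    for name in components:
        extra = GPU_EXTRAS.get(name) if gpu else None
        args += ["--extra", extra or DEFAULT_EXTRAS[name]]
    return shlex.join(args)
```

For example, `uv_sync_command(["cellpose", "trackastra"], gpu=True)` reproduces the GPU command shown above. Remember that `colony_seg` on Windows additionally needs `uv pip install pyqt5`.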

If you use ultrack for tracking, you only need the base installation (uv sync): the tracking scripts are executed with uv run, which creates a temporary virtual environment for them.

Virtual Environment Integration: All uv sync commands automatically create and manage a virtual environment (.venv) in your project directory. VS Code will automatically detect this environment and can use it for running Jupyter notebooks and Python scripts.

Configuration

Configuration options such as output paths and scaling parameters can be easily adjusted in:

  • configuration/settings.py - General pipeline settings
  • configuration/dask.py - Dask cluster configuration

Pipeline Components

1. ND2 to OME-Zarr Conversion (01_ND2_to_OME-ZARR.ipynb)

The first step converts raw ND2 microscopy files into the OME-Zarr format. ND2 is Nikon's proprietary file format, commonly used for storing high-content microscopy data. OME-Zarr is an open, scalable, and cloud-compatible format that enables efficient storage, access, and analysis of large multidimensional image datasets.

In this step, the ND2 file is loaded, relevant metadata is extracted, and each field of view (FOV) is saved as a separate OME-Zarr dataset.
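Each FOV store carries its axes and physical scale in OME-NGFF "multiscales" metadata. Libraries such as ome-zarr-py normally write this for you; the sketch below only shows its structure for version 0.4, assuming a TCZYX timelapse with downsampling in Y and X only. The function and its arguments are hypothetical.

```python
def ngff_multiscales(
    name: str,
    pixel_size_um: float,
    z_step_um: float,
    frame_interval_s: float,
    n_levels: int = 1,
    downscale: int = 2,
) -> dict:
    """Build OME-NGFF 0.4 multiscales metadata for one TCZYX FOV."""
    axes = [
        {"name": "t", "type": "time", "unit": "second"},
        {"name": "c", "type": "channel"},
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ]
    datasets = []
    for level in range(n_levels):
        factor = downscale**level  # each pyramid level shrinks Y and X only
        scale = [frame_interval_s, 1.0, z_step_um, pixel_size_um * factor, pixel_size_um * factor]
        datasets.append(
            {"path": str(level), "coordinateTransformations": [{"type": "scale", "scale": scale}]}
        )
    return {"multiscales": [{"version": "0.4", "name": name, "axes": axes, "datasets": datasets}]}
```

Serialized with `json.dumps`, this dictionary is what ends up in the group attributes (`.zattrs` for Zarr v2) of each FOV store.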

2. Colony Segmentation (02_Colony_Segmentation.ipynb)

Uses ConvPaint, a deep learning-based approach, to identify and segment stem cell colonies in the images. This step requires the colony_seg extra dependencies.

3. Nucleus Segmentation (03_Nucleus_Segmentation.ipynb)

Employs StarDist or Cellpose to detect and segment individual nuclei within colonies for single-cell analysis. This enables downstream single-cell feature extraction and tracking.
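Linking the nucleus labels to the colony labels is what places each cell "within" a colony. A minimal sketch, assuming both are 2D integer label images of the same shape for one timepoint, with 0 as background; the helper is hypothetical and assigns each nucleus to the colony it overlaps most.

```python
import numpy as np


def nucleus_to_colony(nuclei: np.ndarray, colonies: np.ndarray) -> dict[int, int]:
    """Map each nucleus label to the colony label it overlaps most (0 = no colony)."""
    mapping = {}
    for nucleus_id in np.unique(nuclei):
        if nucleus_id == 0:  # skip background
            continue
        overlap = colonies[nuclei == nucleus_id]
        ids, counts = np.unique(overlap, return_counts=True)
        mapping[int(nucleus_id)] = int(ids[np.argmax(counts)])  # majority vote
    return mapping
```

A majority vote is robust to nuclei that touch a colony border; on a tie, `argmax` picks the lower label.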

4. Cell Tracking

Cell tracking is implemented using ultrack and can be executed in two ways:

Python Wrapper (04_Cell_Tracking_ultrack.py and 04_Cell_Tracking_ultrack_all_fovs.py)

  • Direct execution through Python scripts
  • Suitable for local processing or smaller datasets

SLURM Batch Processing (04_Cell_Tracking_slurm.sh)

  • For high-performance computing environments
  • Recommended for large datasets with many FOVs
  • Requires SLURM workload manager
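A common pattern for many FOVs is a SLURM job array (`sbatch --array=...`), where each task processes one FOV selected by the `SLURM_ARRAY_TASK_ID` environment variable. Whether 04_Cell_Tracking_slurm.sh works this way is an assumption; the helper and the `*.zarr` naming pattern are hypothetical.

```python
import os
from pathlib import Path


def fov_for_this_task(data_dir: Path, pattern: str = "*.zarr") -> Path:
    """Pick the FOV store handled by the current SLURM array task.

    Outside SLURM (no SLURM_ARRAY_TASK_ID set) the first FOV is returned,
    which is convenient for local testing.
    """
    fovs = sorted(data_dir.glob(pattern))  # stable order across tasks
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return fovs[task_id]
```

Sorting the stores gives every array task the same ordering, so task indices map one-to-one onto FOVs.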

Gurobi License Recommendation: For optimal tracking performance, we recommend obtaining a Gurobi license, which ultrack can use as its optimization solver.

5. Feature Extraction (05_Feature_Extraction_ERK_Oct4.ipynb)

Quantifies spatial features and extracts relevant biological markers (e.g., ERK, Oct4) for each cell, enabling downstream biological analysis.
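Per-cell marker quantification largely reduces to mean intensities per label. A minimal sketch using `np.bincount`; it assumes, hypothetically, that the ERK readout is a translocation reporter quantified as a cytoplasm-to-nucleus (C/N) ratio and that cytoplasmic ring labels share IDs with the nuclei. A nuclear marker such as Oct4 would simply use the nuclear mean.

```python
import numpy as np


def mean_intensity_per_label(labels: np.ndarray, image: np.ndarray, n_labels: int) -> np.ndarray:
    """Mean of `image` within each label; index i holds label i (0 = background).

    Labels absent from the image give NaN.
    """
    flat = labels.ravel()
    sums = np.bincount(flat, weights=image.ravel().astype(np.float64), minlength=n_labels + 1)
    counts = np.bincount(flat, minlength=n_labels + 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (sums / counts)[: n_labels + 1]


def cn_ratio(nuclei: np.ndarray, cytoplasm: np.ndarray, image: np.ndarray, n_labels: int) -> np.ndarray:
    """Cytoplasm-to-nucleus mean intensity ratio for labels 1..n_labels."""
    nuc = mean_intensity_per_label(nuclei, image, n_labels)
    cyto = mean_intensity_per_label(cytoplasm, image, n_labels)
    with np.errstate(invalid="ignore", divide="ignore"):
        return cyto[1:] / nuc[1:]
```

`bincount` computes all label sums in a single pass, which avoids looping over cells and scales well to thousands of nuclei per frame.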

Data Visualization

The data_viewer/ folder contains a specialized Napari-based viewer for visualizing the generated OME-Zarr data. This viewer supports:

  • Multi-channel OME-Zarr visualization
  • Custom grayscale metadata tags for enhanced display
  • Label and tracking data overlay
  • Interactive navigation between experiments and FOVs

Important Note on Custom OME-Zarr Labels: The pipeline generates OME-Zarr files with some labels stored as grayscale images (e.g., biosensor expression overlays) rather than standard categorical labels. While these files can be loaded in Python without issues, the standard OME-Zarr Napari plugin may not properly display these custom grayscale labels (these labels have an additional metadata tag called greyscale set to Tr). Therefore, we strongly recommend using the integrated data viewer provided in this repository, which is specifically designed to handle these custom label types correctly.
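A viewer can honour this flag by inspecting each label group's attributes before choosing a layer type. A minimal sketch, assuming (hypothetically) that the `greyscale` tag is a top-level key in the label group's `.zattrs` file (Zarr v2) and holds a truthy value when set.

```python
import json
from pathlib import Path


def napari_layer_kind(label_group: Path) -> str:
    """Return "image" for labels flagged as greyscale, otherwise "labels"."""
    attrs_file = label_group / ".zattrs"
    attrs = json.loads(attrs_file.read_text()) if attrs_file.exists() else {}
    return "image" if attrs.get("greyscale", False) else "labels"
```

The result maps onto napari's `viewer.add_image(...)` and `viewer.add_labels(...)`, so greyscale overlays get a continuous colormap instead of categorical label colors.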

Running the Data Viewer

cd data_viewer
uv run data_viewer.py [path_to_data]

See data_viewer/README.MD for detailed usage instructions.

Parallel Processing with Dask

The pipeline extensively uses Dask to parallelize operations for maximum performance:

  • Distributed Computing: Automatic scaling across available CPU cores
  • Memory Management: Efficient handling of large datasets
  • Progress Monitoring: Real-time progress tracking for long-running operations
  • Configurable: Cluster settings can be adjusted in configuration/dask.py
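Memory use in Dask is governed largely by chunk size: when an array is opened with `dask.array.from_zarr`, each Zarr chunk becomes one Dask chunk by default, and a worker holds roughly one input chunk per running thread plus intermediates. The helpers below do that arithmetic; the shapes in the usage note are hypothetical.

```python
import math

import numpy as np


def chunk_nbytes(chunk_shape: tuple[int, ...], dtype: str) -> int:
    """Bytes held in memory by one uncompressed chunk."""
    return math.prod(chunk_shape) * np.dtype(dtype).itemsize


def n_chunks(array_shape: tuple[int, ...], chunk_shape: tuple[int, ...]) -> int:
    """Number of chunks needed to tile the array (edge chunks may be partial)."""
    return math.prod(math.ceil(a / c) for a, c in zip(array_shape, chunk_shape))
```

For a hypothetical FOV of shape (100, 3, 1, 2048, 2048) in uint16 chunked as single 2048 × 2048 planes, each chunk is 8,388,608 bytes (8 MiB) and there are 300 chunks, so per-plane tasks parallelize well while staying far below a typical worker `memory_limit`.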

File Structure

CellZarr/
├── 01_ND2_to_OME-ZARR.ipynb          # ND2 to OME-Zarr conversion
├── 02_Colony_Segmentation.ipynb       # Colony segmentation with ConvPaint
├── 03_Nucleus_Segmentation.ipynb      # Nucleus segmentation with StarDist
├── 04_Cell_Tracking_ultrack.py        # Cell tracking (single FOV)
├── 04_Cell_Tracking_ultrack_all_fovs.py # Cell tracking (all FOVs)
├── 04_Cell_Tracking_slurm.sh          # SLURM batch job for tracking
├── 05_Feature_Extraction_ERK_Oct4.ipynb # Feature extraction
├── pyproject.toml                      # Project dependencies
├── uv.lock                            # Dependency lock file
├── configuration/                      # Configuration files
│   ├── settings.py                    # General settings
│   └── dask.py                        # Dask cluster configuration
├── data_viewer/                       # Napari-based data viewer
│   ├── data_viewer.py                 # Main viewer application
│   └── README.MD                      # Viewer documentation
├── helper_functions/                  # Utility functions
└── models/                           # Pre-trained segmentation models

Usage Examples

Running Individual Pipeline Steps

  1. Start with ND2 conversion:

    uv run jupyter lab 01_ND2_to_OME-ZARR.ipynb
  2. Perform segmentation. For colony segmentation:

    uv run jupyter lab 02_Colony_Segmentation.ipynb

    For nucleus segmentation:

    uv run jupyter lab 03_Nucleus_Segmentation.ipynb
  3. Track cells:

    # For a single FOV
    uv run 04_Cell_Tracking_ultrack.py

    # For all FOVs
    uv run 04_Cell_Tracking_ultrack_all_fovs.py

    # Or submit a SLURM job
    sbatch 04_Cell_Tracking_slurm.sh
  4. Extract features:

    uv run jupyter lab 05_Feature_Extraction_ERK_Oct4.ipynb

Viewing Results

cd data_viewer
uv run data_viewer.py /path/to/your/processed/data
