Low probabilities in the cell × spot matrix #58

Description

@LuisHeinzlmeier

Hi @longyahui,

We are currently building an nf-core benchmarking pipeline for single-cell to spatial mapping tools and would like to include GraphST as part of the benchmark.

However, for several datasets, we consistently obtain very low probabilities in the cell × spot matrix: the maximum value is only about 0.5% (0.0056 in the example below).

Could you clarify whether such low probabilities are expected, or whether there might be an issue with our setup or input data? Do you have any suggestions on what could cause this behavior?

Thank you for your work!
Best,
Luis

Datasets

>>> adata_sc
AnnData object with n_obs × n_vars = 2688 × 18078
    obs: 'cell_type'
>>> adata_sp
AnnData object with n_obs × n_vars = 410 × 18078
    obsm: 'spatial'

Code

import anndata as ad
import scanpy as sc
import torch
from GraphST import GraphST
from GraphST.preprocess import filter_with_overlap_gene

# Calling `GraphST.preprocess(adata_sp)` with its default `seurat_v3` HVG flavor
# fails in our Python 3.8 environment: it requires `skmisc` (package `scikit-misc`),
# which we could not install there because of Python version incompatibility and
# build dependencies. As a workaround, we preprocess with a different HVG flavor,
# `"cell_ranger"`, which does not require `skmisc`.

def preprocess(adata):
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, flavor="cell_ranger", n_top_genes=3000)
    sc.pp.scale(adata, zero_center=False, max_value=10)

# ---------------------------------------- main ------------------------------------

adata_sc = ad.read_h5ad('sc.h5ad')
adata_sp = ad.read_h5ad('sp.h5ad')

# preprocessing for ST data
preprocess(adata_sp)
GraphST.construct_interaction(adata_sp)
GraphST.add_contrastive_label(adata_sp)

# preprocessing for scRNA data
preprocess(adata_sc)

# find overlap genes
adata_sp, adata_sc = filter_with_overlap_gene(adata_sp, adata_sc)

# get features
GraphST.get_feature(adata_sp)

# Select the device. The package runs on 'cpu' by default; a GPU is recommended.
# Note that 'cuda:1' refers to the second GPU.
device = torch.device('cuda:1' if torch.cuda.is_available() else 'cpu')

# train model
model = GraphST.GraphST(adata_sp, adata_sc, device=device, deconvolution=True)
adata_sp, adata_sc = model.train_map()

adata_sp.obsm['map_matrix'].max()

Output:

Number of overlap genes: 2163
Begin to train ST data...
Optimization finished for ST data!
Begin to train scRNA data...
Optimization finished for cell representation learning!
Begin to learn mapping matrix...
Mapping matrix learning finished!
np.float32(0.0056134197)
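
For context when reading this maximum: if the mapping is a probability distribution spread over the 410 spots, a completely flat distribution would give 1/410 ≈ 0.0024 per entry, so 0.0056 is roughly 2.3 times uniform. If it were instead normalized over the 2688 cells, uniform would be 1/2688 ≈ 0.00037, and 0.0056 would be about 15 times uniform. Which axis GraphST normalizes is not stated in this issue, so the sketch below checks both. `summarize_mapping` is a hypothetical helper, not part of GraphST, and the demo matrix is synthetic random data with the same shape, not GraphST output.

```python
import numpy as np


def summarize_mapping(map_matrix):
    """Report how peaked a non-negative mapping matrix is relative to uniform."""
    m = np.asarray(map_matrix, dtype=np.float64)
    n_rows, n_cols = m.shape
    row_sums = m.sum(axis=1)
    col_sums = m.sum(axis=0)
    return {
        "shape": (n_rows, n_cols),
        "max": float(m.max()),
        # Which axis, if any, behaves like a probability distribution
        "rows_sum_to_one": bool(np.allclose(row_sums, 1.0, atol=1e-3)),
        "cols_sum_to_one": bool(np.allclose(col_sums, 1.0, atol=1e-3)),
        # Value every entry would take if the distribution were perfectly flat
        "uniform_if_rows_normalized": 1.0 / n_cols,
        "uniform_if_cols_normalized": 1.0 / n_rows,
    }


# Synthetic stand-in (assumption): 410 spots x 2688 cells, softmax over spots
# for each cell. Replace `demo` with adata_sp.obsm['map_matrix'] in practice.
rng = np.random.default_rng(0)
logits = rng.normal(size=(410, 2688))
demo = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

summary = summarize_mapping(demo)
print(summary)
```

Comparing `max` against the matching uniform value shows whether the mapping is nearly flat or merely small in absolute terms because the probability mass is shared across hundreds of entries.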

environment.yml

channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::anndata=0.8.0
  - conda-forge::matplotlib=3.4.2
  - conda-forge::numpy=1.24.4
  - conda-forge::pandas=1.4.2
  - conda-forge::pot=0.9.3
  - conda-forge::python=3.8.15
  - conda-forge::pytorch=1.8.0
  - conda-forge::r-base=4.0.3
  - conda-forge::rpy2=3.4.1
  - conda-forge::scanpy=1.9.1
  - conda-forge::scikit-learn=1.1.1
  - conda-forge::scipy=1.8.1
  - conda-forge::tqdm=4.64.0
  - conda-forge::pyyaml=6.0
  - pip
  - pip:
      - graphst==1.1.1
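
Because this environment pins Python 3.8.15, where `scikit-misc` could not be installed, the HVG flavor has to be chosen according to what is available. A minimal sketch of that choice, assuming only the standard library; `pick_hvg_flavor` is a hypothetical helper and not part of GraphST or scanpy:

```python
import importlib.util


def pick_hvg_flavor(preferred="seurat_v3", fallback="cell_ranger"):
    """Return `preferred` if the `skmisc` module is importable, else `fallback`."""
    has_skmisc = importlib.util.find_spec("skmisc") is not None
    return preferred if has_skmisc else fallback


flavor = pick_hvg_flavor()
print(flavor)
```

The result can be passed as `flavor=` to `sc.pp.highly_variable_genes`. Note that scanpy expects raw counts for `seurat_v3` but logarithmized data for the other flavors, so the position of the HVG step relative to `normalize_total` and `log1p` depends on the flavor chosen.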
