artyomovlab/dualsimplex_paper

Non-negative matrix factorization and deconvolution as dual simplex problem

This repository is the official starting point for exploring the Dual Simplex NMF/deconvolution method. It contains the code to reproduce the figures from the paper and also provides examples of how to use the DualSimplex package.

Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652

Project structure

- data — all the external data, used in figures
- figures — notebooks for figures reproduction
- out — generated svgs and dualsimplex checkpoints will be placed here
- R — supporting code, imported in figures

Running

  1. Select a figure to reproduce.
  2. The setup.R script (executed at the beginning of each notebook) will install the DualSimplex package from GitHub.
  3. Some figures require additional data: copy large.tar.gz into data/large.
  4. Go to the figures directory and open the corresponding notebook.
  5. Run cells in the notebook one by one. Optionally, tweak some parameters to see alternative outcomes.
  6. See resulting figures in the out directory.

Figures in this repository

2. Sinkhorn procedure

Simple visualization of the Sinkhorn procedure applied to a factorizable matrix (2_sinkhorn_visualization.Rmd)
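For readers unfamiliar with the idea, the following is a minimal, generic sketch of Sinkhorn-style scaling in base R: rows and columns of a positive matrix are rescaled alternately until the row sums and column sums are all constant. It is not the DualSimplex implementation; the function name `sinkhorn_scale`, its parameters, and the toy matrix are assumptions made up for illustration.

```r
# Hypothetical helper (not part of DualSimplex): alternately rescale the rows
# and columns of a strictly positive matrix until all row sums are equal and
# all column sums are equal.
sinkhorn_scale <- function(V, max_iter = 1000, tol = 1e-10) {
  stopifnot(is.matrix(V), all(V > 0))
  # With every column summing to 1, the total is ncol(V), so each of the
  # nrow(V) rows must sum to ncol(V) / nrow(V) at convergence.
  target <- ncol(V) / nrow(V)
  for (i in seq_len(max_iter)) {
    V <- V / rowSums(V)                # row step: every row sums to 1
    V <- sweep(V, 2, colSums(V), "/")  # column step: every column sums to 1
    if (max(abs(rowSums(V) - target)) < tol) break
  }
  V
}

# Toy factorizable matrix V = W %*% H with positive factors (illustrative data).
set.seed(1)
W <- matrix(runif(6 * 2, min = 0.1, max = 1), nrow = 6)
H <- matrix(runif(2 * 4, min = 0.1, max = 1), nrow = 2)
V <- W %*% H

S <- sinkhorn_scale(V)
print(round(colSums(S), 6))
print(round(rowSums(S), 6))
```

Because the scaling only multiplies rows and columns by positive constants, the scaled matrix keeps the rank of the original, so a factorizable input stays factorizable.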

3. Main algorithm

Deconvolution of a simulated bulk RNA-seq gene expression dataset with the main approach (3c_simulated_gene_expression_main_algorithm.Rmd)

4. Picture unmixing with NMF

5. Complete deconvolution of bulk RNA-seq data

6. DREAM challenge data analysis with Dual Simplex

7. Special cases when our method is good

S3. NMF with simulated data matrices

S4. Further analysis for TCGA HNSC bulk RNA-seq dataset

Pathway analysis, signature gene expression heatmap, multiple initializations (s4_hnsc_further.Rmd)

S5. Signature-based deconvolution with the DualSimplex approach

S7. Single cell analysis (clustering)

S8. Multiple solution NMF. How our method behaves

Supplementary notes scripts

Authors

Contributors names and contact info

Troubleshooting

Dependency: package 'xxx' is not available (for R version x.y.z)

Install the package directly from its source link on CRAN. For example:

install.packages("https://cran.r-project.org/src/contrib/RcppML_0.3.7.tar.gz", repos = NULL)

Can't plot UMAP with plot_projected on Mac

Unfortunately, the umap library has a bug (on macOS only) that prevents adding new points to a UMAP embedding after it has been calculated, which is crucial for DualSimplex. If that is the case for you, call plot_projected(use_dims = 2:3), or choose other dimensions, to see the simplexes without dimensionality reduction.
