Official Python implementation of NSGCCA, from the following paper:
Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data.
Rong Wu, Ziqi Chen, Gen Li and Hai Shu.
New York University
[arXiv]
We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in variable selection.
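For readers unfamiliar with the dependence measure behind HSIC-SGCCA, the following is a minimal sketch of the standard biased empirical HSIC estimator, trace(KHLH)/n^2, with Gaussian kernels. It is an illustration only, not this repository's implementation: the function names `gaussian_gram` and `hsic` and the median-heuristic bandwidth are assumptions made for this example, and the sketch includes neither the sparsity penalty nor the unit-variance constraint described above.

```python
import numpy as np


def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix; samples are the rows of x."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)  # treat a 1-D input as n samples of one variable
    sq = np.sum(x ** 2, axis=1)
    # Squared pairwise distances, clipped at zero against rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    if sigma is None:
        # Median heuristic (an assumed choice, one of several common variants).
        positive = d2[d2 > 0]
        sigma = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y):
    """Biased empirical HSIC between paired samples x and y."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / n ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y_dependent = x ** 2 + 0.1 * rng.normal(size=200)  # nonlinear in x
    y_independent = rng.normal(size=200)
    print("dependent pair:  ", hsic(x, y_dependent))
    print("independent pair:", hsic(x, y_independent))
```

On data like this, the nonlinearly dependent pair should score noticeably higher than the independent pair, which is the kind of dependence that linear correlation can miss.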
Requirements: Python 3.10
Clone this repository and install other required packages:
git clone git@github.com:Rows21/NSGCCA
cd NSGCCA
conda env create -f env.yml
- Synthetic datasets: synth_data.py
- TCGA Breast Cancer data in Realdata, from https://tcga-data.nci.nih.gov/docs/publications
(Feel free to open an issue to suggest recently proposed CCA models for comparison. The baselines folder currently holds the comparison models.)
- Follow the Tutorial file for training HSIC-SGCCA, SA-KGCCA, and TS-KGCCA.
If you find this repository helpful, please consider citing:
@article{wu2025nonlinear,
title={Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data},
author={Wu, Rong and Chen, Ziqi and Li, Gen and Shu, Hai},
journal={arXiv preprint arXiv:2502.18756},
year={2025}
}
Figure 2: The simulation performance for Synthetic Datasets.
Data_download_preprocess: TCGA-BRCA preprocessing through R script.
Venn Diagram: The clustering results for TCGA-BRCA.
This repository is built using the timm library.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© 2025 Rong Wu. You are free to share and adapt the material with attribution.
