SAETWAS: A Structure-Aware Ensemble Test for Unified Multi-Tissue and Multi-Trait Transcriptome-Wide Association Studies

Introduction

SAETWAS (Structure-Aware Ensemble Test for Transcriptome-Wide Association Studies) is an R package that implements a statistical framework for integrating multi-tissue and multi-trait evidence in transcriptome-wide association studies (TWAS). Traditional TWAS methods often analyze one tissue and one trait at a time, and so fail to capture the genetic architecture and pleiotropic effects shared across tissues and phenotypes.

SAETWAS addresses this gap by:

- Jointly analyzing multi-tissue eQTL summary statistics and multi-trait GWAS summary statistics.
- Employing a structure-aware ensemble learning strategy to detect sparse and structured signals within high-dimensional matrices.
- Relying exclusively on summary-level statistics, which broadens applicability and avoids the privacy constraints of individual-level data.

This package provides a robust and powerful tool for large-scale discovery in complex trait genetics.

Installation

You can install the SAETWAS R package directly from GitHub using the devtools package.

First, ensure you have devtools and other necessary R packages installed:

install.packages(c("devtools", "usethis", "roxygen2", "Rcpp", "RcppArmadillo", "dplyr", "arrow", "Matrix", "MASS", "Rfast"))

Next, install SAETWAS from GitHub:

# The package is hosted under the GitHub account amss-stat
devtools::install_github("amss-stat/SAETWAS")
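
Before installing, it can help to see which of the packages listed above are already present, so that only the missing ones are downloaded. The following is a minimal sketch; `missing_packages` is a hypothetical helper written for this README and is not part of SAETWAS.

```r
# Hypothetical helper (not part of SAETWAS): return the names of packages
# that cannot be loaded. Note that requireNamespace() loads the namespace of
# each package it finds, without attaching it to the search path.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Same package list as in the install.packages() call above.
deps <- c("devtools", "usethis", "roxygen2", "Rcpp", "RcppArmadillo",
          "dplyr", "arrow", "Matrix", "MASS", "Rfast")

to_install <- missing_packages(deps)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
} else {
  message("All listed dependencies are installed.")
}
```

The install call is left commented out so that the snippet only reports and does not change the library.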

Quick Start Example (Gene ID 700)

This example demonstrates how to use the run_saet_twas_for_gene function to perform SAET-TWAS analysis for a specific gene (Gene ID 700), utilizing example data included within the package.

Once the package is installed, you can access its functions and run this example.

# 1. Load the SAETWAS package
library(SAETWAS)

# 2. Define parameters for the example gene (ID 700)
example_gene_id <- 700
example_p_tissues <- 10
example_q_traits <- 4
# Sample sizes for the 10 tissues, as used in the paper (one entry per tissue).
example_N_tissue_samples <- c(803, 818, 754, 714, 691, 684, 472, 362, 295, 262)

# 3. Locate example data files within the installed package
#    'system.file("extdata", ...)' is the standard way to access internal package data.
example_extdata_dir <- system.file("extdata", package = "SAETWAS")

# Specific paths to the example data files
# The '700/' folder is directly under extdata, so base_data_dir is just example_extdata_dir
example_base_data_dir_pkg <- example_extdata_dir 
example_annotation_file_path_pkg <- file.path(example_extdata_dir, "gene_700_annotation.csv")
example_phenotype_corr_path_pkg <- file.path(example_extdata_dir, "phenotype_correlation_matrix.csv")

# Optional: Verify example data existence (good practice for robust examples)
if (!dir.exists(file.path(example_base_data_dir_pkg, as.character(example_gene_id)))) {
  stop("Example data folder for Gene 700 not found in package. Please check package installation.")
}
if (!file.exists(example_annotation_file_path_pkg)) {
  stop("Example annotation file 'gene_700_annotation.csv' not found in package.")
}
if (!file.exists(example_phenotype_corr_path_pkg)) {
  stop("Example phenotype correlation matrix 'phenotype_correlation_matrix.csv' not found in package.")
}

# 4. Run the SAET-TWAS analysis for Gene ID 700
#    Results are saved to a temporary directory to avoid cluttering the user's filesystem.
message("\n--- Running SAET-TWAS example for Gene ID 700... ---")
example_output_temp_dir <- tempdir() 

result_gene_700 <- SAETWAS::run_saet_twas_for_gene(
  gene_id = example_gene_id,
  base_data_dir = example_base_data_dir_pkg,
  output_base_dir = example_output_temp_dir,
  annotation_file_path = example_annotation_file_path_pkg,
  phenotype_corr_path = example_phenotype_corr_path_pkg,
  N_tissue_samples = example_N_tissue_samples,
  p_tissues = example_p_tissues,
  q_traits = example_q_traits,
  random_seed = 12345, # Fixed seed for reproducible example results
  k_svd_ratio = 30,
  boundary_svd_count = 5,
  num_snps_sample_m = 6,
  num_bootstrap_B = 2000,
  use_svd_regularization = TRUE
)

# 5. Print the result
message("\n--- SAET-TWAS Example Result for Gene ID 700 ---")
print(result_gene_700)
# For very small p-values, print in scientific notation.
# Doubles carry about 15 to 17 significant digits, so more would be noise.
if (!is.na(result_gene_700$saet_p_value)) {
  message("P-value in scientific notation:")
  print(format(result_gene_700$saet_p_value, scientific = TRUE, digits = 15))
}

# 6. Clean up temporary output files (important for good practice in examples)
message(sprintf("\n--- Cleaning up temporary example output from %s ---", 
                file.path(example_output_temp_dir, as.character(example_gene_id))))
unlink(file.path(example_output_temp_dir, as.character(example_gene_id)), recursive = TRUE)
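
Since each gene lives in its own subfolder, the same call can in principle be repeated over several gene IDs. The sketch below shows one way to do that while letting the loop continue if a single gene fails. `run_genes_safely` is a hypothetical wrapper, not a SAETWAS function, and it assumes that all other arguments (annotation file, sample sizes, and so on) are shared across the genes being run.

```r
# Hypothetical wrapper (not part of SAETWAS): call a per-gene runner for each
# gene ID, returning NULL for any gene whose run raises an error.
run_genes_safely <- function(gene_ids, run_fun, ...) {
  results <- lapply(gene_ids, function(id) {
    tryCatch(
      run_fun(gene_id = id, ...),
      error = function(e) {
        message(sprintf("Gene %s failed: %s", id, conditionMessage(e)))
        NULL
      }
    )
  })
  names(results) <- as.character(gene_ids)
  results
}

# Usage with the quick-start variables (requires SAETWAS and its example data):
# results <- run_genes_safely(
#   gene_ids = c(example_gene_id),
#   run_fun = SAETWAS::run_saet_twas_for_gene,
#   base_data_dir = example_base_data_dir_pkg,
#   output_base_dir = tempdir(),
#   annotation_file_path = example_annotation_file_path_pkg,
#   phenotype_corr_path = example_phenotype_corr_path_pkg,
#   N_tissue_samples = example_N_tissue_samples,
#   p_tissues = example_p_tissues,
#   q_traits = example_q_traits
# )
```

The result is a list named by gene ID, so failed genes can be found with `Filter(is.null, results)`.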

Input Data Format Requirements

The SAETWAS package expects input data in the following general format, as demonstrated by the example files in inst/extdata:

Gene Data Folders (base_data_dir): Each gene (e.g., gene_id = 700) should have its own subfolder containing:
- tissue[1-P].parquet: Parquet files for eQTL summary statistics (e.g., slope, slope_se, variant_id).
- gwas[1-Q].parquet: Parquet files for GWAS summary statistics (e.g., beta, se, pos_hg38, variant_id).
- snp012.parquet: Parquet file for LD reference genotypes (individuals x SNPs), with metadata columns (e.g., first 7 columns for SNP info) and genotype data from column 8 onwards.
Annotation File (annotation_file_path): A CSV file with at least number (gene ID), start, and end columns for gene coordinates.
Phenotype Correlation Matrix (phenotype_corr_path): A CSV file representing the Q x Q trait correlation matrix.
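
The layout above can be checked before running an analysis. The following is a minimal sketch under two assumptions that are not confirmed by the package documentation: that `tissue[1-P].parquet` and `gwas[1-Q].parquet` mean files numbered `tissue1.parquet` to `tissueP.parquet` and `gwas1.parquet` to `gwasQ.parquet`, and that the trait correlation matrix has already been read into a numeric R matrix. All three function names are hypothetical helpers, not part of SAETWAS.

```r
# Hypothetical helper: file names expected in one gene folder, assuming
# tissue and GWAS files are numbered from 1.
expected_gene_files <- function(p_tissues, q_traits) {
  c(sprintf("tissue%d.parquet", seq_len(p_tissues)),
    sprintf("gwas%d.parquet", seq_len(q_traits)),
    "snp012.parquet")
}

# Hypothetical helper: which expected files are absent from
# <base_data_dir>/<gene_id>/ ?
missing_gene_files <- function(base_data_dir, gene_id, p_tissues, q_traits) {
  gene_dir <- file.path(base_data_dir, as.character(gene_id))
  expected <- expected_gene_files(p_tissues, q_traits)
  expected[!file.exists(file.path(gene_dir, expected))]
}

# Hypothetical helper: TRUE if R is a Q x Q numeric matrix without missing
# values that is symmetric and has a unit diagonal, as a correlation matrix must.
check_trait_correlation <- function(R, q_traits, tol = 1e-8) {
  is.matrix(R) && is.numeric(R) && !anyNA(R) &&
    all(dim(R) == c(q_traits, q_traits)) &&
    isSymmetric(unname(R), tol = tol) &&
    all(abs(diag(R) - 1) < tol)
}

# Example with an identity matrix standing in for a 4-trait correlation matrix.
check_trait_correlation(diag(4), q_traits = 4)
```

With the package installed, `missing_gene_files(system.file("extdata", package = "SAETWAS"), 700, 10, 4)` would list any example files that do not match the assumed naming.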

Citation

If you use SAETWAS in your research, please cite our paper:

Deliang Bu, Le Song, Han Meng, Nayang Shan, Qizhai Li. SAET-TWAS: A Structure-Aware Ensemble Test for Unified Multi-Tissue and Multi-Trait Transcriptome-Wide Association Studies.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 12401359 to D.B. and 12301374 to N.S.). Q.L. was supported by the National Natural Science Foundation of China (Grant Nos. 12325110 and 12288201).
