SAETWAS: A Structure-Aware Ensemble Test for Unified Multi-Tissue and Multi-Trait Transcriptome-Wide Association Studies
SAETWAS (Structure-Aware Ensemble Test for Transcriptome-Wide Association Studies) is an R package that implements a novel statistical framework for integrating multi-tissue and multi-trait evidence in transcriptome-wide association studies (TWAS). Traditional TWAS methods often analyze single tissues and single traits in isolation, failing to capture complex shared genetic architectures and pleiotropic effects across multiple tissues and phenotypes.
SAETWAS addresses this gap by:
- Jointly analyzing multi-tissue eQTL summary statistics and multi-trait GWAS summary statistics.
- Employing a structure-aware ensemble learning strategy to effectively detect sparse and structured signals within high-dimensional matrices.
- Relying exclusively on summary-level statistics, enhancing applicability while bypassing individual-level data privacy constraints.
This package provides a robust and powerful tool for large-scale discovery in complex trait genetics.
You can install the SAETWAS R package directly from GitHub using the devtools package.
First, ensure you have devtools and other necessary R packages installed:
install.packages(c("devtools", "usethis", "roxygen2", "Rcpp", "RcppArmadillo", "dplyr", "arrow", "Matrix", "MASS", "Rfast"))
Next, install SAETWAS from GitHub:
# Install from the amss-stat GitHub repository
devtools::install_github("amss-stat/SAETWAS")
This example demonstrates how to use the run_saet_twas_for_gene function to run a SAET-TWAS analysis for a single gene (gene ID 700), using the example data included in the package.
Once the package is installed, you can access its functions and run this example.
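Before running the example, it can help to confirm that the dependencies named in the installation step are actually available. The following is a minimal sketch in base R; the helper name find_missing_packages is hypothetical and is not part of SAETWAS.

```r
# Hypothetical helper (not part of SAETWAS): return the names of packages
# from `pkgs` whose namespace cannot be loaded in the current R library.
find_missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages listed in the installation step, plus SAETWAS itself
required_pkgs <- c("Rcpp", "RcppArmadillo", "dplyr", "arrow",
                   "Matrix", "MASS", "Rfast", "SAETWAS")
missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

An empty result means every listed package can be loaded; otherwise, install the reported packages before calling library(SAETWAS).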
# 1. Load the SAETWAS package
library(SAETWAS)
# 2. Define parameters for the example gene (ID 700)
example_gene_id <- 700
example_p_tissues <- 10
example_q_traits <- 4
# Sample sizes for the 10 tissues, as used in the paper
example_N_tissue_samples <- c(803, 818, 754, 714, 691, 684, 472, 362, 295, 262)
# 3. Locate example data files within the installed package
# 'system.file("extdata", ...)' is the standard way to access internal package data.
example_extdata_dir <- system.file("extdata", package = "SAETWAS")
# Specific paths to the example data files
# The '700/' folder is directly under extdata, so base_data_dir is just example_extdata_dir
example_base_data_dir_pkg <- example_extdata_dir
example_annotation_file_path_pkg <- file.path(example_extdata_dir, "gene_700_annotation.csv")
example_phenotype_corr_path_pkg <- file.path(example_extdata_dir, "phenotype_correlation_matrix.csv")
# Optional: Verify example data existence (good practice for robust examples)
if (!dir.exists(file.path(example_base_data_dir_pkg, as.character(example_gene_id)))) {
  stop("Example data folder for Gene 700 not found in package. Please check package installation.")
}
if (!file.exists(example_annotation_file_path_pkg)) {
  stop("Example annotation file 'gene_700_annotation.csv' not found in package.")
}
if (!file.exists(example_phenotype_corr_path_pkg)) {
  stop("Example phenotype correlation matrix 'phenotype_correlation_matrix.csv' not found in package.")
}
# 4. Run the SAET-TWAS analysis for Gene ID 700
# Results are saved to a temporary directory, avoiding cluttering user's filesystem.
message("\n--- Running SAET-TWAS example for Gene ID 700... ---")
example_output_temp_dir <- tempdir()
result_gene_700 <- SAETWAS::run_saet_twas_for_gene(
  gene_id = example_gene_id,
  base_data_dir = example_base_data_dir_pkg,
  output_base_dir = example_output_temp_dir,
  annotation_file_path = example_annotation_file_path_pkg,
  phenotype_corr_path = example_phenotype_corr_path_pkg,
  N_tissue_samples = example_N_tissue_samples,
  p_tissues = example_p_tissues,
  q_traits = example_q_traits,
  random_seed = 12345, # Fixed seed for reproducible example results
  k_svd_ratio = 30,
  boundary_svd_count = 5,
  num_snps_sample_m = 6,
  num_bootstrap_B = 2000,
  use_svd_regularization = TRUE
)
# 5. Print the result
message("\n--- SAET-TWAS Example Result for Gene ID 700 ---")
print(result_gene_700)
# For very small p-values, print in scientific format
if (!is.na(result_gene_700$saet_p_value)) {
  message("Precise P-value:")
  print(format(result_gene_700$saet_p_value, scientific = TRUE, digits = 20))
}
# 6. Clean up temporary output files (important for good practice in examples)
message(sprintf("\n--- Cleaning up temporary example output from %s ---",
                file.path(example_output_temp_dir, as.character(example_gene_id))))
unlink(file.path(example_output_temp_dir, as.character(example_gene_id)), recursive = TRUE)
The SAETWAS package expects input data in the following general format, as demonstrated by the example files in inst/extdata:
- Gene Data Folders (base_data_dir): Each gene (e.g., gene_id = 700) should have its own subfolder containing:
  - tissue[1-P].parquet: Parquet files for eQTL summary statistics (e.g., slope, slope_se, variant_id).
  - gwas[1-Q].parquet: Parquet files for GWAS summary statistics (e.g., beta, se, pos_hg38, variant_id).
  - snp012.parquet: Parquet file for LD reference genotypes (individuals x SNPs), with metadata columns (e.g., first 7 columns for SNP info) and genotype data from column 8 onwards.
- Annotation File (annotation_file_path): A CSV file with at least number (gene ID), start, and end columns for gene coordinates.
- Phenotype Correlation Matrix (phenotype_corr_path): A CSV file representing the Q x Q trait correlation matrix.
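The layout described above can be checked before starting a run. The sketch below uses base R only and rests on two assumptions that the package documentation should confirm: that the bracket notation means files named tissue1.parquet to tissueP.parquet and gwas1.parquet to gwasQ.parquet, and that a valid trait correlation matrix is symmetric with a unit diagonal. The helper names check_gene_folder and check_trait_correlation are hypothetical and are not part of SAETWAS.

```r
# Hypothetical helper (not part of SAETWAS): return the expected files that
# are missing from a gene's subfolder. ASSUMPTION: files are named
# tissue1..tissueP, gwas1..gwasQ and snp012, each with a .parquet extension.
check_gene_folder <- function(base_data_dir, gene_id, p_tissues, q_traits) {
  gene_dir <- file.path(base_data_dir, as.character(gene_id))
  expected <- c(
    sprintf("tissue%d.parquet", seq_len(p_tissues)),
    sprintf("gwas%d.parquet", seq_len(q_traits)),
    "snp012.parquet"
  )
  expected[!file.exists(file.path(gene_dir, expected))]
}

# Hypothetical helper: basic sanity checks on a Q x Q trait correlation
# matrix (numeric, square of size Q, symmetric, unit diagonal, entries in [-1, 1]).
check_trait_correlation <- function(corr, q_traits, tol = 1e-8) {
  corr <- as.matrix(corr)
  all(dim(corr) == c(q_traits, q_traits)) &&
    is.numeric(corr) &&
    isSymmetric(unname(corr), tol = tol) &&
    all(abs(diag(corr) - 1) < tol) &&
    all(abs(corr) <= 1 + tol)
}
```

For the bundled example, check_gene_folder(system.file("extdata", package = "SAETWAS"), 700, 10, 4) should return an empty character vector if the naming assumption holds. How the correlation CSV is read into a matrix depends on whether the file carries header or row-name columns, so that step is left out here.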
If you use SAETWAS in your research, please cite our paper:
SAET-TWAS: A Structure-Aware Ensemble Test for Unified Multi-Tissue and Multi-Trait Transcriptome-Wide Association Studies.
Deliang Bu, Le Song, Han Meng, Nayang Shan, Qizhai Li.
This project is licensed under the MIT License - see the LICENSE file for details.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 12401359 to D.B. and 12301374 to N.S.). Q.L. was supported by the National Natural Science Foundation of China (Grant Nos. 12325110 and 12288201).