-
Notifications
You must be signed in to change notification settings - Fork 1
4. eQTL Mapping
FastGxC estimates genetic effects on the context-shared and each context-specific component using ultra-fast linear regression models.
FastGxC uses the MatrixEQTL R package to perform cis- (or trans-) eQTL mapping using the eQTL_mapping_step() function.
eQTL_mapping_step()
This function takes input genotypes, expression, and gene/SNP location files (see simulate data and decomposition sections for format of input files), and produces summary statistics in MatrixEQTL format. Below is an example code demonstration how to run eQTL mapping on the shared and C-specific expression components from the simulated data.
input_dir <- "simulated_example/"
out_dir <- input_dir
expr_files <- list.files(input_dir, pattern = "_specific_expression.txt$")
### NOTE: this is just an example of how to set context names with simulated data.
### Please change these lines to be specific to your dataset
nC <- length(expr_files)
context_names <- paste0("context", seq(1, nC))
all_contexts <- c(context_names, "shared")
SNP_file_name <- file.path(input_dir, "SNPs.txt")
snps_location_file_name <- file.path(input_dir, "snpsloc.txt")
gene_location_file_name <- file.path(input_dir, "geneloc.txt")
for (i in seq_along(all_contexts)) {
context <- all_contexts[i]
if (context == "shared") {
shared_specific <- "shared"
expression_file_name <- file.path(input_dir, "context_shared_expression.txt")
} else {
shared_specific <- "specific"
expression_file_name <- file.path(input_dir, paste0(context, "_specific_expression.txt"))
}
eQTL_mapping_step(
SNP_file_name = SNP_file_name,
snps_location_file_name = snps_location_file_name,
expression_file_name = expression_file_name,
gene_location_file_name = gene_location_file_name,
context = context,
out_dir = out_dir,
output_file_name_cis = file.path(out_dir, paste0(context, "_", shared_specific, ".cis_pairs.txt")),
output_file_name_tra = file.path(out_dir, paste0(context, "_", shared_specific, ".trans_pairs.txt")),
use_model = modelLINEAR,
cis_dist = 1e6,
pv_threshold_cis = 1,
pv_threshold_tra = 0,
error_covariance = numeric()
)
}
FastGxC also supports GPU-accelerated cis-eQTL mapping using tensorqtl, integrated via the reticulate R package. To enable this, simply pass method = "tensorqtl" to the eQTL_mapping_step() function. The inputs remain unchanged. This method is particularly useful for large datasets and provides substantial speedups. Note: TensorQTL requires Python 3.7+ and appropriate libraries (tensorqtl, numpy, pandas, pyarrow, torch). Ensure your environment is properly configured.
FastGxC will save C+1 summary statistics files in .txt format (see the output file example below) for compatibility with downstream FastGxC multiple testing adjustment steps.
contextX_specific.cis_pairs.txt
SNP gene beta p.value FDR
snp64928 gene65 0.418661087751389 8.4807042218939e-07 0.084807042218939
snp42566 gene43 -0.355810463428497 7.53491371859939e-06 0.306773379105091
snp22672 gene23 -0.394770801067352 9.20320137315272e-06 0.306773379105091
snp73350 gene74 0.344478219747543 1.59910672991937e-05 0.399776682479843
snp86064 gene87 -0.371489137411118 3.10237595856035e-05 0.550542627746407
FastGxC decomposes eQTL effects into a shared component and context-specific deviations. By construction, specific effects represent deviations from the mean effect across contexts. Positive values indicate stronger-than-average effects; negative values indicate weaker-than-average effects. The full eQTL effect size in each context
The code below reads the shared and context-specific summary statistics produced by the eQTL mapping step and recovers the total effect size in each context. The merge performs an inner join on SNP-gene pairs, so only pairs present in both files are combined.
input_dir <- "simulated_example/"
# Load shared effects
shared <- read.table(file.path(input_dir, "shared_shared.cis_pairs.txt"),
header = TRUE, stringsAsFactors = FALSE)
# Find all context-specific output files
specific_files <- list.files(input_dir, pattern = "_specific\\.cis_pairs\\.txt$",
full.names = TRUE)
# For each context, merge with shared effects and compute total effect
total_effects <- lapply(specific_files, function(f) {
context <- gsub("_specific\\.cis_pairs\\.txt$", "", basename(f))
specific <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
# Inner join on SNP-gene pair ensures only matching pairs are combined
merged <- merge(shared, specific, by = c("SNP", "gene"), suffixes = c(".sh", ".sp"))
merged$beta.total <- merged$beta.sh + merged$beta.sp
merged$context <- context
merged[, c("SNP", "gene", "context", "beta.sh", "beta.sp", "beta.total")]
})
total_effects <- do.call(rbind, total_effects)
head(total_effects)