-
Notifications
You must be signed in to change notification settings - Fork 1
5. Multiple Testing Correction
FastGxC accounts for multiple hypothesis testing across genes, SNPs, and biological contexts using a hierarchical false discovery rate (FDR) controlling procedure. This step ensures proper correction at multiple levels (e.g., eGenes).
FastGxC leverages the TreeQTL R package to perform hierarchical multiple testing correction. The treeQTL_step() function identifies both context-specific and shared eGenes and eAssociations by controlling FDR across a structured hypothesis space.
Note: TreeQTL requires input files formatted according to MatrixEQTL output conventions with p-values in increasing order (from smallest to largest). If you used alternative software like FastQTL, ensure the outputs are formatted correctly.
You can optionally use other multiple testing frameworks, such as mashR, which may yield higher power under some assumptions.
# (note that this function expects files to be named in the same way as the
# output files from FastGxC's eQTL mapping function)
data_dir <- "simulated_example/"
out_dir <- data_dir
snps_location_file_name <- file.path(data_dir, "snpsloc.txt")
gene_location_file_name <- file.path(data_dir, "geneloc.txt")
fdr_thresh <- 0.05
cisDist <- 1e6
Option 1: Perform separate multiple testing adjustments for shared (two-level hierarchy) and specific (three-level hierarchy) effects. This is the strategy used in real data analysis in the manuscript. This will return files with shared and specific eGenes and eAssociations.
treeQTL_step(
data_dir, # Path to input data (e.g., eQTL results)
snps_location_file_name, # SNP location file
gene_location_file_name, # Gene location file
context_names, # Vector of context/condition names
out_dir, # Path where output will be saved
cisDist = cisDist, # cis window
fdr_thresh = fdr_thresh, # Desired FDR threshold (e.g., 0.05)
four_level = F, # Perform separate multiple testing adjustments for shared and specific effects
qtl_type = "cis" # cis or trans eQTLs
)
Option 2: Perform multiple testing adjustments across shared and specific effects using the four-level hierarchy shown below. Note that this strategy uses TreeBH in its implementation and is computationally intensive and should only be used in settings with a small number of tests performed.
treeQTL_step(
data_dir, # Path to input data (e.g., eQTL results)
snps_location_file_name, # SNP location file
gene_location_file_name, # Gene location file
context_names, # Vector of context/condition names
out_dir, # Path where output will be saved
cisDist = cisDist, # cis window
fdr_thresh = fdr_thresh, # Desired FDR threshold (e.g., 0.05)
four_level = T, # Perform a four level hierarchy and adjust for shared and specific together
qtl_type = "cis" # cis or trans eQTLs
)
treeQTL outputs (1) An eAssociation file per context with context specific SNP-gene pairs. (2) A shared eGenes file with each shared eGene and its corresponding p-value. (3) A specific eGene file with a 1 corresponding to the gene being a significant eGene in that particular context. A specific eGene p-value file reporting all p-value combinations for each significant eGene across all contexts.
(1) eAssoc_by_gene.contextX_specific_cis.txt
gene SNP beta t.stat p.value
gene1 snp1 0.368806164512139 5.22116640681629 3.34162013627681e-07
gene2 snp2 0.423778936498137 5.67707930801708 3.25067273366269e-08
gene3 snp3 0.437418111526831 4.98770927957996 1.03954367213838e-06
gene4 snp4 0.462203944759481 5.36767905554521 1.60616150168334e-07
gene5 snp5 0.5495476161726 6.75612550096954 7.44084554530075e-11
(2) shared_eGenes.txt
family fam_p n_sel
gene1 1.40293481281949e-09 1
gene2 1.14061108736042e-14 1
gene3 3.35665396944856e-12 1
gene4 3.03128672087492e-13 1
gene5 3.55458571779909e-11 2
(3) specific_eGenes.txt
gene context1 context2 context3 context4 context5
gene1 0 0 0 0 0
gene2 1 0 0 0 0
gene3 0 0 0 0 0
gene4 0 0 0 0 0
gene5 0 0 0 0 0
(3) specific_eGenes_pvalues.txt
gene context1 context2 context3 context4 context5
gene1 0.623725805802154 0.62987552431814 0.888515702826435 0.814045965902757 0.442571993796187
gene2 5.25935519127471e-05 0.571468511077802 0.33413356471655 0.806318411680398 0.974004596084888
gene3 0.634499598043283 0.725117041047635 0.148461103453072 0.6925215425366 0.451408949812389
gene4 0.59324391608897 0.44380106630737 0.378896498929132 0.466275599226333 0.366614669959797
gene5 0.0929487562865985 0.756063879067856 0.54373880161688 0.155752178356738 0.269057783925901
treeBH outputs a file of gene, SNP, context triplets denoting the significant pairs identified after multiple testing correction.
treeBH_output.txt
gene_name snp_name context_name
gene1 snp1 context1
gene2 snp2 context2
gene3 snp3 context2
gene4 snp4 context3
gene5 snp5 context4