Skip to content

5. Multiple Testing Correction

Lena Krockenberger edited this page Feb 23, 2026 · 27 revisions

FastGxC accounts for multiple hypothesis testing across genes, SNPs, and biological contexts using a hierarchical false discovery rate (FDR) controlling procedure. This step ensures proper correction at multiple levels (e.g., eGenes).

FDR Correction with TreeQTL

FastGxC leverages the TreeQTL R package to perform hierarchical multiple testing correction. The treeQTL_step() function identifies both context-specific and shared eGenes and eAssociations by controlling FDR across a structured hypothesis space.

Note: TreeQTL requires input files formatted according to MatrixEQTL output conventions with p-values in increasing order (from smallest to largest). If you used alternative software like FastQTL, ensure the outputs are formatted correctly.

You can optionally use other multiple testing frameworks, such as mashR, which may yield higher power under some assumptions.

Running TreeQTL

# (note that this function expects files to be named in the same way as the 
# output files from FastGxC's eQTL mapping function)

data_dir <- "simulated_example/"
out_dir <- data_dir
snps_location_file_name <- file.path(data_dir, "snpsloc.txt")
gene_location_file_name <- file.path(data_dir, "geneloc.txt")
fdr_thresh <- 0.05
cisDist <- 1e6

Option 1: Perform separate multiple testing adjustments for shared (two-level hierarchy) and specific (three-level hierarchy) effects. This is the strategy used in real data analysis in the manuscript. This will return files with shared and specific eGenes and eAssociations.

Screenshot 2025-07-23 at 11 17 13 AM
treeQTL_step(
  data_dir,                   # Path to input data (e.g., eQTL results)
  snps_location_file_name,    # SNP location file
  gene_location_file_name,    # Gene location file
  context_names,              # Vector of context/condition names
  out_dir,                    # Path where output will be saved
  cisDist = cisDist,          # cis window
  fdr_thresh = fdr_thresh,    # Desired FDR threshold (e.g., 0.05)
  four_level = F,             # Perform separate multiple testing adjustments for shared and specific effects
  qtl_type = "cis"            # cis or trans eQTLs
)

Option 2: Perform multiple testing adjustments across shared and specific effects using the four-level hierarchy shown below. Note that this strategy uses TreeBH in its implementation and is computationally intensive and should only be used in settings with a small number of tests performed.

Screenshot 2025-07-23 at 11 48 47 AM
treeQTL_step(
  data_dir,                   # Path to input data (e.g., eQTL results)
  snps_location_file_name,    # SNP location file
  gene_location_file_name,    # Gene location file
  context_names,              # Vector of context/condition names
  out_dir,                    # Path where output will be saved
  cisDist = cisDist,          # cis window
  fdr_thresh = fdr_thresh,    # Desired FDR threshold (e.g., 0.05)
  four_level = T,             # Perform a four level hierarchy and adjust for shared and specific together
  qtl_type = "cis"            # cis or trans eQTLs
)

Option 1 Output

treeQTL outputs (1) An eAssociation file per context with context specific SNP-gene pairs. (2) A shared eGenes file with each shared eGene and its corresponding p-value. (3) A specific eGene file with a 1 corresponding to the gene being a significant eGene in that particular context. A specific eGene p-value file reporting all p-value combinations for each significant eGene across all contexts.

(1) eAssoc_by_gene.contextX_specific_cis.txt

gene     SNP         beta                t.stat                 p.value
gene1   snp1    0.368806164512139    5.22116640681629    3.34162013627681e-07
gene2   snp2    0.423778936498137    5.67707930801708    3.25067273366269e-08
gene3   snp3    0.437418111526831    4.98770927957996    1.03954367213838e-06
gene4   snp4    0.462203944759481    5.36767905554521    1.60616150168334e-07
gene5   snp5    0.5495476161726      6.75612550096954    7.44084554530075e-11

(2) shared_eGenes.txt

family         fam_p           n_sel
gene1   1.40293481281949e-09    1
gene2   1.14061108736042e-14    1
gene3   3.35665396944856e-12    1
gene4   3.03128672087492e-13    1
gene5   3.55458571779909e-11    2

(3) specific_eGenes.txt

gene    context1    context2   context3   context4   context5        
gene1     0            0           0          0          0       
gene2     1            0           0          0          0      
gene3     0            0           0          0          0 
gene4     0            0           0          0          0 
gene5     0            0           0          0          0  

(3) specific_eGenes_pvalues.txt

gene         context1                context2                context3                context4                context5        
gene1     0.623725805802154       0.62987552431814        0.888515702826435       0.814045965902757       0.442571993796187      
gene2     5.25935519127471e-05    0.571468511077802       0.33413356471655        0.806318411680398       0.974004596084888
gene3     0.634499598043283       0.725117041047635       0.148461103453072       0.6925215425366         0.451408949812389
gene4     0.59324391608897        0.44380106630737        0.378896498929132       0.466275599226333       0.366614669959797    
gene5     0.0929487562865985      0.756063879067856       0.54373880161688        0.155752178356738       0.269057783925901

Option 2 Output

treeBH outputs a file of gene, SNP, context triplets denoting the significant pairs identified after multiple testing correction.

treeBH_output.txt

gene_name     snp_name    context_name
gene1           snp1       context1
gene2           snp2       context2
gene3           snp3       context2
gene4           snp4       context3
gene5           snp5       context4

Clone this wiki locally