
The reason for pre-defined breaks (-logP) #26

@YaxinL


Regarding the breaks on the -log(P) scale used to build the joint distribution in conjFDR, the default settings are:
trait 1: breaks over [0,30] in increments of 0.01, i.e. seq(0,30,0.01)
trait 2: breaks over [0,3] in increments of 0.1, i.e. seq(0,3,0.1)

Is there a reason for these breaks? Would it be beneficial to adjust the breaks for trait 2?

I tried another set of breaks: [0,20] in increments of 0.01 for trait 1 and [0,10] in increments of 0.1 for trait 2. On the same dataset, these breaks identify more SNPs with lower conjFDR. The distribution of conjFDR also changed slightly, because the joint P-value space is divided into smaller cells. I assume that finer breaks better capture local peaks and reduce dilution artifacts.
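To make the two settings easier to compare, here is a minimal sketch that counts the bins and cells each grid produces. The helper names `seq` and `grid_summary` are mine (hypothetical, not part of conjFDR); `seq` only mimics R's `seq(from, to, by)` for ranges that divide evenly.

```python
def seq(start, stop, step):
    """Mimic R's seq(start, stop, step) for evenly divisible ranges."""
    n = round((stop - start) / step)  # number of intervals
    return [start + i * step for i in range(n + 1)]


def grid_summary(t1, t2):
    """Count bins per trait and total cells for two (start, stop, step) specs."""
    t1_bins = len(seq(*t1)) - 1
    t2_bins = len(seq(*t2)) - 1
    return {"t1_bins": t1_bins, "t2_bins": t2_bins, "cells": t1_bins * t2_bins}


default_grid = grid_summary((0, 30, 0.01), (0, 3, 0.1))
alternative_grid = grid_summary((0, 20, 0.01), (0, 10, 0.1))

print("default:    ", default_grid)
print("alternative:", alternative_grid)
```

In both settings the step sizes are identical (0.01 for trait 1, 0.1 for trait 2). What actually differs is the range each trait covers, so the trait 2 axis gains bins above 3 rather than narrower bins below it.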

For example, a SNP with (P1 = 10^-5, P2 = 5×10^-6) might fall into a bin with many null SNPs under coarse breaks (e.g., P2 = 0–0.01) and fail the conjFDR threshold. With finer breaks, it could dominate its new bin (P2 = 0–10^-5) and pass.
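As a rough illustration of where this SNP would land on the trait 2 axis under each setting, the sketch below maps a P-value to a bin index. It assumes the breaks are on the -log10 scale and that values beyond the last break are clamped into the top bin. Both are my assumptions, not confirmed behaviour of the software, and `bin_index` is a hypothetical helper.

```python
import math


def bin_index(p, upper, step):
    """Return (index, clamped) for -log10(p) on breaks 0, step, ..., upper.

    Assumption: values above `upper` are clamped into the last bin.
    """
    n_bins = round(upper / step)
    x = -math.log10(p)
    idx = int(x / step)  # zero-based bin containing x
    if idx >= n_bins:
        return n_bins - 1, True
    return idx, False


p2 = 5e-6  # trait 2 P-value from the example

print("default [0,3]:     ", bin_index(p2, 3, 0.1))
print("alternative [0,10]:", bin_index(p2, 10, 0.1))
```

Under these assumptions, the SNP exceeds the default trait 2 range and shares the top bin with every SNP whose P2 is below 10^-3, whereas the wider range gives it a bin of its own near -log10(P2) = 5.3.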

However, finer breaks also risk increasing the variance of the FDR estimate, because finer bins reduce the number of SNPs per cell, and the empirical FDR adjustment may become unstable or inflated, especially in regions with low SNP density. Is there a way to get a robust estimate, or a complementary method (e.g. clumping)?
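To put a rough number on the variance concern, the sketch below uses the textbook standard error of a binomial proportion as a stand-in for any per-cell empirical fraction. This is a generic approximation, not the estimator conjFDR uses, and the proportion 0.05 and the cell counts are made-up illustrative values.

```python
import math


def proportion_se(p, n):
    """Standard error of an empirical proportion p estimated from n SNPs."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical cell sizes: each 100-fold drop in n inflates the SE 10-fold.
for n in (10_000, 1_000, 100, 10):
    print(f"n = {n:>6}: SE = {proportion_se(0.05, n):.4f}")
```

The error shrinks only with the square root of the cell count, so sparse cells in the tail of the joint distribution would be the first to become noisy.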

Is there any reason for these breaks?
