Skip to content

Faster outlier handling #4

@perishky

Description

@perishky

Most computation for handling outliers happens in matrixStats::rowQuantiles(). This function is solidly written in R, but could be faster if implemented in C and/or if it made use of parallel processing.

WGCNA has a C implementation but would rather not take on all WGCNA dependencies!

e.g.
iqr.trim<-function(methylation, iqr.limit=3) {
## find outlying observations more extreme that the iqr.limit
quantiles <- matrix(NA, ncol=2, nrow=nrow(methylation))
quantiles[,1] <- WGCNA::rowQuantileC(methylation,0.25)
quantiles[,2] <- rowQuantileC(methylation,0.75)
##quantiles <- matrixStats::rowQuantiles(methylation, probs = c(0.25, 0.75), na.rm = T)
iqr <- quantiles[,2] - quantiles[,1]
low <- quantiles[,1]
upper <- quantiles[,2]

initial.missing <- rowSums(is.na(methylation))

methylation[methylation < low - iqr.limit * iqr] <- NA
outliers.lower <- rowSums(is.na(methylation)) - initial.missing

methylation[methylation > upper + iqr.limit * iqr] <- NA
outliers.upper <- rowSums(is.na(methylation)) - initial.missing - outliers.lower

n <- ncol(methylation) - initial.missing - outliers.upper - outliers.lower
log <- data.frame(outliers.lower, outliers.upper, n) 

return(list(methylation=methylation, log=log)) 

}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions