Modified, GPU-accelerated currentNe (https://github.com/esrud/currentNe) with PED/MAP and VCF input support, plus complete Ne estimation and confidence intervals.
GPU-accelerated fork of currentNe adding PED/MAP and VCF input, and providing end-to-end Nₑ estimation with confidence intervals. The GPU path computes weighted LD (d²) in FP64 using atomicAdd(double*), while Nₑ and CIs follow the original integration and neural-network variance model.
Requires: NVIDIA GPU ≥ Pascal (SM ≥ 6.0), NVIDIA driver + CUDA Toolkit (12+), gcc/g++ & make, and ~1 GB free GPU memory (more for large datasets).
An OpenCL version, which runs on NVIDIA, AMD and Intel GPUs, is available under Releases as currentNe-ocl.zip. The CUDA and OpenCL implementations produce results that are fully consistent with the original CPU version.
An Apple Metal version (FP32) is also available for testing purposes. Because Metal does not support FP64, its FP32 estimates of d² and Ne may differ from those of the CPU version. This build is provided only as a test of GPU computing on Metal. If you need it, please contact the currentNe_gpu author: hrluo93@foxmail.com.
Cooling note: running on passively cooled (fanless) Tesla GPUs is not recommended without server-grade, front-to-back airflow. The FP64 path saturates the floating-point units for extended periods, producing a thermal load comparable to an FPU stress test. Inadequate airflow can cause throttling or faults.
Original currentNe authors: Enrique Santiago, Carlos Köpke
Citations:
Santiago, E., Caballero, A., Köpke, C., & Novo, I. (2024). Estimation of the contemporary effective population size from SNP data while accounting for mating structure. Molecular Ecology Resources, 24, e13890. https://doi.org/10.1111/1755-0998.13890
Santiago, E., Köpke, C., & Caballero, A. (2025). Accounting for population structure and data quality in demographic inference with linkage disequilibrium methods. Nature Communications, 16, 6054. https://doi.org/10.1038/s41467-025-61378-w
Build:
unzip currentNe_gpu_full.zip
cd currentNe_gpu_full
make ARCH=sm_89   # choose your GPU's SM architecture (sm_70, sm_80, sm_86, sm_89, ...)
Also set `ARCH ?=sm_89` in the Makefile to match. This creates ./currentNe_gpu.
make cpu
This creates ./currentNe_gpu_cpu (OpenMP).
unzip currentNe-ocl.zip
cd currentNe-ocl
make -f Makefile.opencl
This creates ./currentNe_ocl.
General form:
ulimit -s unlimited   # MaxLoci defaults to 20 million; it can be increased in the .cpp source
./currentNe_gpu <datafile> <num_chromosomes> [options]
<datafile>: one of
- `prefix.vcf`
- `prefix.ped` (requires `prefix.map` in the same folder)
- `prefix.tped` (with individuals as columns following the first 4 fields)
<num_chromosomes>: required (e.g., 22 for human autosomes, or the actual number of autosomes in your organism).
Common options:
- `-s <N>`: number of SNPs to use (default: all segregating SNPs)
- `-t <T>`: CPU threads for the non-GPU parts (default: chosen automatically by OpenMP)
- `-o <file>`: output filename (default: `<prefix>_currentNe_OUTPUT.txt`)
- `-k <int>`: important; please see the description in the original currentNe documentation
- `-q`: quiet mode; print only Ne (with `-v`, also the 50% and 90% CIs)
- `-v`: with `-q`, also print the CIs
- `-p`: print the full analysis to stdout instead of to a file
Examples:
# TPED
./currentNe_gpu mydata.tped 19 -t 8
# PED/MAP
./currentNe_gpu mypop.ped 19 -t 8
# VCF
./currentNe_gpu cohort.vcf 19 -t 8
./currentNe_gpu cohort.vcf 19 -t 8 -k 1   # -t 8 is enough
Output:
- Full report file (unless `-p` is given): `<prefix>_currentNe_OUTPUT.txt`
  Includes: input statistics, d², expected/observed heterozygosity, the Ne point estimate, and 50%/90% CIs.
Notes:
- Double-precision `atomicAdd` requires GPU architecture sm_60+; set `ARCH` accordingly.
- For very large SNP counts, memory is about L × N bytes (one char per genotype, L loci × N individuals). Consider filtering with `-s` or thinning SNPs.