Provides both CPU and GPU functions for the Stetson J statistic (Stetson 1996).
Requires:
- CUDA-enabled device
- CUDA Toolkit >= 7.5
- A suitable C++ compiler (e.g. g++)
- Clone this repository
- Edit the Makefile accordingly
  - ARCH -- the Compute Capability number for your device, e.g. 5.2 becomes 52
  - REAL_TYPE -- you can switch between double and single precision by editing this Makefile variable to either double or float.
  - CUDA_VERSION -- make sure you have the CUDA Toolkit installed; this code was developed using version 7.5
  - The BLOCK_SIZE is also something to play around with but is optional to change.
- Run make to generate a binary that you can run for testing. This is mainly for my own debugging purposes, but you can inspect the code to get an idea of how to call the stetson_j_cpu and stetson_j_gpu functions.
usage: ./CudaStet <filename> <{s, l}> <skiprows>
filename : path to datafile with either a list of filenames
or a path to a single file containing 'x y yerr'
on each line
s : single file
l : list of files
skiprows : number of rows to skip in the datafile
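The single-file input format described above ('x y yerr' on each line, with a number of leading rows skipped) can be read with a few lines of C++. This is a minimal sketch only: the LightCurve struct and the read_light_curve function are hypothetical names, not part of this repository, and silently skipping rows that do not parse is an assumption made here rather than documented behavior of the binary.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one data file; not a type from the repository.
struct LightCurve {
    std::vector<double> x, y, yerr;
};

// Read whitespace-separated 'x y yerr' rows from a stream,
// discarding the first `skiprows` lines (e.g. a header).
// Rows that do not parse as three numbers are skipped (an assumption).
LightCurve read_light_curve(std::istream &in, int skiprows) {
    LightCurve lc;
    std::string line;

    // Discard the requested number of leading rows.
    for (int i = 0; i < skiprows && std::getline(in, line); ++i) {
    }

    while (std::getline(in, line)) {
        std::istringstream fields(line);
        double x, y, yerr;
        if (fields >> x >> y >> yerr) {
            lc.x.push_back(x);
            lc.y.push_back(y);
            lc.yerr.push_back(yerr);
        }
    }
    return lc;
}
```

Passing a std::ifstream opened on the data file, together with the same skiprows value given on the command line, should yield three equally sized arrays.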
Arguments (in order):
- real_type *x: values for independent variable
- real_type *y: values for dependent variable
- real_type *err: values for uncertainty of dependent variable
- weight_type WEIGHTING: must be one of
  - CONSTANT: all pairs of observations are weighted equally
  - EXP: pairs of observations are exponentially suppressed by their distance in x (uses exp(-|t1 - t2| / median(dt))). See Zhang et al. 2003 for a real-world application of this weighting scheme. Note, however, that Zhang et al. 2003 used the mean rather than the median.
- int N: number of datapoints
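To make the roles of these arguments concrete, here is a rough CPU-only sketch of a pairwise Stetson J computation. It is not the repository's implementation. The names Weighting, median_dt, and stetson_j_sketch are hypothetical, and several details are assumptions: the residuals use an inverse-variance weighted mean, every pair i < j contributes, dt is taken as the consecutive differences of x (assumed sorted), and double is used in place of real_type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the weight_type argument.
enum class Weighting { Constant, Exp };

// Median of consecutive differences of x (x assumed sorted, size >= 2).
double median_dt(const std::vector<double> &x) {
    std::vector<double> dt;
    for (std::size_t i = 1; i < x.size(); ++i) dt.push_back(x[i] - x[i - 1]);
    std::sort(dt.begin(), dt.end());
    const std::size_t m = dt.size();
    return (m % 2 == 1) ? dt[m / 2] : 0.5 * (dt[m / 2 - 1] + dt[m / 2]);
}

// J = sum_k w_k sgn(P_k) sqrt(|P_k|) / sum_k w_k over all pairs k = (i, j),
// with P_k = delta_i * delta_j and
// delta_i = sqrt(n / (n - 1)) * (y_i - mean) / err_i.
// Note: a zero median(dt) (repeated x values) is not handled here.
double stetson_j_sketch(const std::vector<double> &x,
                        const std::vector<double> &y,
                        const std::vector<double> &err,
                        Weighting weighting) {
    const std::size_t n = x.size();
    if (n < 2) return 0.0;

    // Inverse-variance weighted mean of y (an assumption of this sketch).
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double w = 1.0 / (err[i] * err[i]);
        num += w * y[i];
        den += w;
    }
    const double mean = num / den;

    // Normalized residuals.
    const double bias = std::sqrt(static_cast<double>(n) / (n - 1));
    std::vector<double> delta(n);
    for (std::size_t i = 0; i < n; ++i) delta[i] = bias * (y[i] - mean) / err[i];

    const double scale = (weighting == Weighting::Exp) ? median_dt(x) : 1.0;

    // Accumulate the weighted, signed square roots over all pairs.
    double jsum = 0.0, wsum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double w = (weighting == Weighting::Exp)
                                 ? std::exp(-std::fabs(x[i] - x[j]) / scale)
                                 : 1.0;
            const double p = delta[i] * delta[j];
            const double sgn = (p > 0.0) - (p < 0.0);
            jsum += w * sgn * std::sqrt(std::fabs(p));
            wsum += w;
        }
    }
    return jsum / wsum;
}
```

The double loop visits N(N-1)/2 pairs, so the cost of this formulation grows quadratically with N.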
- Be careful with single precision. If you're using the exponential weighting scheme, the accuracy for ~50,000 datapoints is about 10^(-3) for both the CPU and GPU variants. For a constant weighting scheme, however, the CPU variant is very inaccurate (off by factors of 100 or more) at 50,000 datapoints, while the GPU variant remains accurate to about 10^(-3).
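The note above does not say why the single-precision CPU result degrades. One common cause of this kind of error, offered here only as a possible explanation, is adding a very large number of terms into a single float accumulator. The sketch below illustrates that general effect on typical IEEE 754 hardware. It is not taken from this repository, and both function names are hypothetical.

```cpp
#include <cstddef>

// Add `value` n times into one float accumulator, one term after another.
float sequential_sum(float value, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += value;
    return total;
}

// Sum in fixed-size blocks, then add the block totals together.
// This is loosely analogous to a block-wise reduction, where each
// partial sum stays small relative to the terms being added.
float blocked_sum(float value, std::size_t n, std::size_t block) {
    float total = 0.0f;
    for (std::size_t start = 0; start < n; start += block) {
        const std::size_t end = (start + block < n) ? start + block : n;
        float partial = 0.0f;
        for (std::size_t i = start; i < end; ++i) partial += value;
        total += partial;
    }
    return total;
}
```

With value = 1.0f and n = 20,000,000, the sequential version stalls at 16,777,216 (2^24), because adding 1 to a float of that size no longer changes it, while the blocked version with a block size of 1024 returns 20,000,000. A sum over all pairs of 50,000 datapoints has on the order of 10^9 terms, so an effect of this kind is plausible there.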
Planned features:
- Python bindings with SWIG
- Addition of Stetson K and L variability indices
- Adding configure script to simplify the install process
The timing tests below were run on an Ubuntu 14.04 desktop with an i7-5930K overclocked to 4.5 GHz and a 980 Ti graphics card. For these tests, CudaStet was compiled with single precision, using the -O3 optimization flag for g++ and --use_fast_math for nvcc.
Each row reads:
N dt
where N is the number of data points and dt is the execution time in seconds.
timing : (CPU, EXPONENTIAL)
10 9.000e-06
1009 3.612e-02
2008 9.482e-02
3007 1.367e-01
4006 1.922e-01
5005 2.986e-01
6004 4.134e-01
7003 4.819e-01
8002 6.265e-01
9001 7.888e-01
timing : (GPU, EXPONENTIAL)
10 1.648e-01
1009 1.351e-03
2008 2.490e-03
3007 3.450e-03
4006 4.739e-03
5005 5.822e-03
6004 7.672e-03
7003 8.494e-03
8002 9.271e-03
9001 1.066e-02
timing : (CPU, CONSTANT)
10 1.000e-06
1009 5.004e-03
2008 1.877e-02
3007 3.833e-02
4006 6.095e-02
5005 9.584e-02
6004 1.385e-01
7003 1.869e-01
8002 2.532e-01
9001 2.829e-01
timing : (GPU, CONSTANT)
10 1.740e-04
1009 9.020e-04
2008 1.727e-03
3007 3.200e-03
4006 4.346e-03
5005 4.290e-03
6004 5.602e-03
7003 7.210e-03
8002 7.446e-03
9001 8.685e-03