-
Notifications
You must be signed in to change notification settings - Fork 2
Calculate allele sharing distances
License
szpiech/asd
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
**Be aware that VCF files must contain GTs only and no other INFO for each sample and variant site otherwise GTs will be read incorrectly. This is a bug that will be fixed eventually. You can ensure proper loading of your vcf file by running: bcftools annotate -x FMT in.vcf -Ov -o out.vcf until this is fixed.
asd is a multi-threaded program to quickly compute pairwise allele sharing distances, GRMs, or frequency weighted allele similarities between diploid individuals with an arbitrary number of alleles per locus (GRMs biallelic only). asd reads TPED/TFAM, STRU, and VCF formatted files and will load gzipped files without first requiring decompression. If your data are not biallelic, you must use the --multiallelic flag. This flag is not compatible with --grm and is automatically set when using --weighted.
usage
./asd --stru <filename> --id-col <column where individual names are>
or
./asd --tped <filename>.tped --tfam <filename>.tfam
or
./asd --vcf <filename>.vcf
optional flags --partial, --log, --ibs-all, --out, --missing-stru, --missing-tped, --biallelic, --combine, --long, --long-ibs, --grm, --weighted
CHANGE LOG
----------
18 FEB 2020 - Release of v1.1.0a fixes a bug when reading VCF without --multiallelic set.
14 FEB 2020 - Release of v1.1.0 includes support for VCF files and computing the weighted allele similarity score from Greenbaum et al. 2019 Genome Research 29:2020-2033.
asd v1.1.0a
----------Command Line Arguments----------
--combine <string1> ... <stringN>: Combine several files
generated with --partial.
Default: __none
--grm <bool>: Calculate the genomic relationship matrix.
Default: false
--help <bool>: Prints this help dialog.
Default: false
--ibs <bool>: Set to output all IBS sharing matrices too.
Default: false
--id <int>: Column where individual IDs are. (stru files only)
Default: 1
--log <bool>: Transforms allele sharing distance by -ln(ps) instead of 1-(ps).
Default: false
--long <bool>: Instead of printing a matrix, print allele sharing distances one
per row. Formatted <ID1> <ID2> <allele sharing distance>.
Not compatible with --partial.
Default: false
--long-ibs <bool>: Instead of printing a matrix, print IBS calculations one
per row. Formatted <ID1> <ID2> <IBS0/1/2>.
Not compatible with --partial.
Default: false
--missing-stru <string>: For stru files, set the missing data value.
Default: -9
--missing-tped <string>: For tped files, set the missing data value.
Default: 0
--multiallelic <bool>: Set if there are more than two alleles at at least one locus.
Computations are less efficient with this flag set.
Default: false
--out <string>: Basename for output files.
Default: outfile
--partial <bool>: If set, outputs two nind x nind matrices.
The first is the number of loci used for each pairwise
comparison, and the second is the untransformed dist/IBS
matrix. Useful for splitting up very large jobs.
Can combine with --combine.
Default: false
--stru <string>: The input data filename (stru format).
Change missing data code with --missing-stru.
Requires two header rows: locus names, and locus positions.
Default: __none
--tfam <string>: The input data filename (tfam format).
Requires a .tped file (--tped).
Default: __none
--threads <int>: Number of threads to spawn for faster calculation.
Default: 1
--tped <string>: The input data filename (tped format).
Requires a .tfam file (--tfam).
Change missing data code with --missing-tped.
Default: __none
--vcf <string>: The input data filename (vcf format).
Default: __none
--weighted <bool>: Calculate weighted allele sharing similarity scores from Greenbaum et al. 2019.
Default: false
NOTE: Multi-dimensional scaling plots can be created easily in R with the following commands (assuming --partial was not used):
data<-read.table("<outfile>.asd.dist",header=TRUE)
data.mds<-cmdscale(data)
plot(data.mds)
A toy example with no missing data:
./asd --stru test_data_1.stru --id-col 1
output:
ind0 ind1 ind2 ind3 ind4 ind5
ind0 0 0.5 0.3 0.2 0.3 0.5
ind1 0.5 0 0.2 0.3 0.4 0.8
ind2 0.3 0.2 0 0.1 0.2 0.6
ind3 0.2 0.3 0.1 0 0.3 0.5
ind4 0.3 0.4 0.2 0.3 0 0.6
ind5 0.5 0.8 0.6 0.5 0.6 0
./asd --stru test_data_1.stru --id-col 1 --partial
output:
6
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
5 5 5 5 5 5
ind0 ind1 ind2 ind3 ind4 ind5
ind0 5 2.5 3.5 4 3.5 2.5
ind1 2.5 5 4 3.5 3 1
ind2 3.5 4 5 4.5 4 2
ind3 4 3.5 4.5 5 3.5 2.5
ind4 3.5 3 4 3.5 5 2
ind5 2.5 1 2 2.5 2 5
A toy example with missing data:
./asd --stru test_data_2.stru --id-col 1
output:
ind0 ind1 ind2 ind3 ind4 ind5
ind0 0 0.5 0.25 0.125 0.333333 0.166667
ind1 0.5 0 0.125 0.25 0.333333 0.666667
ind2 0.25 0.125 0 0.1 0.125 0.5
ind3 0.125 0.25 0.1 0 0.25 0.375
ind4 0.333333 0.333333 0.125 0.25 0 0.625
ind5 0.166667 0.666667 0.5 0.375 0.625 0
./asd --stru test_data_2.stru --id-col 1 --partial
output:
6
4 3 4 4 3 3
3 4 4 4 3 3
4 4 5 5 4 4
4 4 5 5 4 4
3 3 4 4 4 4
3 3 4 4 4 4
ind0 ind1 ind2 ind3 ind4 ind5
ind0 4 1.5 3 3.5 2 2.5
ind1 1.5 4 3.5 3 2 1
ind2 3 3.5 5 4.5 3.5 2
ind3 3.5 3 4.5 5 3 2.5
ind4 2 2 3.5 3 4 1.5
ind5 2.5 1 2 2.5 1.5 4
About
Calculate allele sharing distances
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published