A `cohorts` parameter was added to `pca`, but its meaning is confusing.
Most functions that take a `cohorts` parameter use it to group the data: `diversity_stats` only works at the cohort level, and `snp_allele_frequencies` needs to split the data into cohorts. `pca`, by contrast, uses `cohorts` to downsample the data. Because the two uses are semantically different, the parameters should arguably have different definitions. The current `cohorts` parameter in `pca` works fine when it is set to a column name and only really fails when given a dictionary (or, I would guess, any other mapping).
There are two options:
- change the `cohorts` parameter to be more restrictive
- change the code to cope with the more general definition of `cohorts`
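
To make the two options concrete, here is a minimal sketch in plain Python. It is not the actual `pca` code: `resolve_cohorts`, `downsample`, `max_cohort_size`, and `allow_mapping` are hypothetical names, the sample metadata is modelled as a list of dicts, and the mapping is assumed to go from a cohort label to a predicate (a stand-in for whatever the real mapping values are).

```python
import random
from collections.abc import Mapping


def resolve_cohorts(samples, cohorts):
    """Group sample records by cohort label (hypothetical helper).

    `cohorts` is either a column name (str) or a mapping from cohort
    label to a predicate on one sample record (an assumed stand-in).
    """
    groups = {}
    if isinstance(cohorts, str):
        # Column name: every distinct value in that column is a cohort.
        for sample in samples:
            groups.setdefault(sample[cohorts], []).append(sample)
    elif isinstance(cohorts, Mapping):
        # Mapping: each label selects the samples matching its predicate.
        # Samples matching no predicate are dropped; overlapping
        # predicates would put a sample in more than one cohort.
        for label, predicate in cohorts.items():
            groups[label] = [s for s in samples if predicate(s)]
    else:
        raise TypeError("cohorts must be a column name or a mapping")
    return groups


def downsample(samples, cohorts, max_cohort_size, seed=0, allow_mapping=True):
    """Keep at most `max_cohort_size` samples per cohort.

    allow_mapping=False models option 1 (restrict to a column name);
    allow_mapping=True models option 2 (cope with the general definition).
    """
    if not allow_mapping and not isinstance(cohorts, str):
        raise TypeError("cohorts must be a column name")
    rng = random.Random(seed)  # seeded so the selection is reproducible
    kept = []
    for members in resolve_cohorts(samples, cohorts).values():
        if len(members) > max_cohort_size:
            members = rng.sample(members, max_cohort_size)
        kept.extend(members)
    return kept


# Made-up metadata: five samples from country "A", three from "B".
samples = [{"sample_id": i, "country": "A" if i < 5 else "B"} for i in range(8)]

by_column = downsample(samples, "country", max_cohort_size=2)
by_mapping = downsample(
    samples,
    {"low": lambda s: s["sample_id"] < 3, "high": lambda s: s["sample_id"] >= 6},
    max_cohort_size=2,
)
```

Under the first option the mapping call would raise a clear `TypeError` up front instead of failing somewhere inside the function. Under the second, the behaviour for overlapping or non-exhaustive cohorts would still need to be decided.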