
Understanding CSFS Samples and Discretization in ASMC's Prepare Decoding Tool #13

@7JVST

Hi there,

I'm using the compiled C++ version of Prepare Decoding to create decoding quantities files for fastSMC, with the goal of analyzing IBD segments. I have the demo files from ASMC_data and frequency files generated from my own dataset, which includes 1600 samples and around 500,000 variants. For the discretization, I used the file included in the package, "input_30_-100-2000.disc".

When I set 'CSFSsamples=1600' to match the sample count, I ran into a memory issue that caused a core dump. Lowering 'CSFSsamples' to 300 fixed the problem.

I'm curious what the 'CSFS samples' count actually means. Does it need to match the number of samples in the frequency file, or in the '.haps', '.samples', and '.map' files that will be used in the fastSMC analysis later (n = 1600)? Also, is there an upper limit on the 'CSFS samples' count?

Additionally, I'd like to know how to define my own number of quantiles for the discretization in the C++ version. I noticed that the Python version lets the user define the discretization like this: discretization=[[30.0, 15], [100.0, 15], 39]. Can you tell me how to do the same in the C++ version?
