New feature request: ANI distance #9

@cumbof

Hi @rsharris
This is not actually an issue, but a request to add a new feature to HowDeSBT, if possible.

I’m using your software as a dependency of MetaSBT where I’m building SBTs starting from genomes, microbial genomes in particular.
It would be very useful if HowDeSBT could provide a way to compute the Average Nucleotide Identity (ANI) measure between two bloom filters with the bfdistance subcommand.

It would be very useful if it could also compute the ANI between filters while running a query. Let’s say that I have an SBT built with genomes. Now, I have a new genome (its bloom filter representation) and I want to establish which is the closest one in the tree according to the ANI distance.

Computing the ANI usually means performing alignment, but we could use the following formula as a very good estimate of the ANI distance (that is, 1 - ANI), based on the number of active bits in the union and intersection of two Bloom filters:

1 - (1 + (1/kmer_size) * log((2*jaccard_index) / (1+jaccard_index)))

Here kmer_size is the size of the k-mers used to build the Bloom filters, and jaccard_index is the number of active bits in the intersection of the two Bloom filters divided by the number of active bits in their union (intersection/union).
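
To make the formula concrete, here is a minimal Python sketch, not HowDeSBT code. The function name `ani_distance` and its parameters are hypothetical. It assumes `log` is the natural logarithm, and it assumes a distance of 1 when the intersection is empty (where the logarithm is undefined). Note that the expression above simplifies to -(1/kmer_size) * log((2*jaccard_index) / (1+jaccard_index)), which is the Mash distance, so the ANI estimate itself is 1 minus this value.

```python
import math


def ani_distance(intersection_bits: int, union_bits: int, kmer_size: int) -> float:
    """Estimate the ANI distance (1 - ANI) from active-bit counts.

    Hypothetical helper: the two counts are the numbers of active bits in
    the intersection and in the union of two Bloom filters.
    """
    if kmer_size <= 0:
        raise ValueError("kmer_size must be positive")
    if union_bits <= 0:
        raise ValueError("union_bits must be positive")
    if not 0 <= intersection_bits <= union_bits:
        raise ValueError("intersection_bits must be between 0 and union_bits")

    jaccard_index = intersection_bits / union_bits
    if jaccard_index == 0:
        # Assumption: no shared bits means maximal distance (log(0) is undefined).
        return 1.0

    # 1 - (1 + (1/k) * log(2j / (1 + j))) simplifies to -(1/k) * log(2j / (1 + j)).
    distance = -math.log(2 * jaccard_index / (1 + jaccard_index)) / kmer_size

    # Clamp to [0, 1]; this also turns a floating-point -0.0 into 0.0.
    return min(1.0, max(0.0, distance))


if __name__ == "__main__":
    d = ani_distance(intersection_bits=50, union_bits=100, kmer_size=21)
    print(f"ANI distance: {d:.6f}, estimated ANI: {1 - d:.6f}")
```

Because Bloom filters have false positives, the bit-level Jaccard index is only an approximation of the Jaccard index of the underlying k-mer sets, so the result should be read as an estimate.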

I’m currently computing this measure within MetaSBT by running the bfdistance subcommand twice with --show:intersect and --show:union. It works as expected, but of course it’s very inefficient. It would be much faster if HowDeSBT could provide this feature natively.
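
As a rough illustration of what a native implementation could do, the sketch below derives both bit counts in a single pass over a pair of filters and then picks the closest genome by brute force. It is only a sketch under stated assumptions: Bloom filters are modeled as Python integers used as bit vectors, the names `ani_distance_between` and `closest_filter` are hypothetical, and nothing here reflects HowDeSBT's file format or internals.

```python
import math


def popcount(bits: int) -> int:
    """Count the active bits of a non-negative integer used as a bit vector."""
    return bin(bits).count("1")


def ani_distance_between(filter_a: int, filter_b: int, kmer_size: int) -> float:
    """ANI distance (1 - ANI) from two bit vectors, using both counts at once."""
    intersection_bits = popcount(filter_a & filter_b)
    union_bits = popcount(filter_a | filter_b)
    if union_bits == 0:
        raise ValueError("both filters are empty")
    if intersection_bits == 0:
        # Assumption: no shared bits means maximal distance.
        return 1.0
    jaccard_index = intersection_bits / union_bits
    distance = -math.log(2 * jaccard_index / (1 + jaccard_index)) / kmer_size
    return min(1.0, max(0.0, distance))


def closest_filter(query: int, filters: dict, kmer_size: int):
    """Return (name, distance) of the filter closest to the query (linear scan)."""
    if not filters:
        raise ValueError("no filters to compare against")
    distances = {
        name: ani_distance_between(query, bits, kmer_size)
        for name, bits in filters.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


if __name__ == "__main__":
    # Toy 8-bit filters; real Bloom filters are far larger.
    genomes = {"g1": 0b11110000, "g2": 0b00111100, "g3": 0b00001111}
    name, dist = closest_filter(0b00111110, genomes, kmer_size=21)
    print(name, round(dist, 6))
```

A linear scan like this ignores the tree structure entirely; it only shows that the intersection and union counts can come from the same comparison rather than two separate runs.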

Let me know if it makes sense and if there is any chance we could see this feature implemented in the near future.
Thanks!
