ntCard: A streaming algorithm for cardinality estimation in genomics data.

Authors Hamid Mohamadi, Hamza Khan & Inanc Birol
Abstract Many bioinformatics algorithms are designed for the analysis of sequences of some uniform length, conventionally referred to as k-mers. These include de Bruijn graph assembly methods and sequence alignment tools. An efficient algorithm to enumerate the number of unique k-mers, or even better, to build a histogram of k-mer frequencies would be desirable for these tools and their downstream analysis pipelines. Among other applications, estimated frequencies can be used to predict genome sizes, measure sequencing error rates, and tune runtime parameters for analysis tools. However, calculating a k-mer histogram from large volumes of sequencing data is a challenging task.
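To make the two quantities named in the abstract concrete (the number of distinct k-mers and the k-mer frequency histogram), here is a minimal exact-counting sketch in Python. It is not the ntCard algorithm, which is a streaming estimator; it is a naive in-memory baseline whose memory use grows with the number of distinct k-mers, which is the cost an estimator tries to avoid. The function name `kmer_histogram` is hypothetical, and the choices to skip k-mers containing non-ACGT characters and to ignore reverse complements are assumptions made for illustration.

```python
from collections import Counter


def kmer_histogram(sequences, k):
    """Exactly count k-mers in an iterable of DNA strings.

    Returns (distinct, histogram), where `distinct` is the number of
    distinct k-mers and histogram[f] is the number of distinct k-mers
    that occur exactly f times.

    Assumptions for this sketch: k-mers containing characters other
    than A, C, G, T are skipped, and a k-mer is not merged with its
    reverse complement.
    """
    alphabet = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k across the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= alphabet:
                counts[kmer] += 1
    # Collapse per-k-mer counts into a frequency-of-frequencies table.
    histogram = dict(Counter(counts.values()))
    return len(counts), histogram


if __name__ == "__main__":
    distinct, hist = kmer_histogram(["ACGTACGT"], 4)
    print(distinct, hist)
```

For the read `ACGTACGT` with k = 4, the windows are ACGT, CGTA, GTAC, TACG and ACGT again, so there are 4 distinct k-mers, 3 of which occur once and 1 of which occurs twice.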
Journal Name and Citation

Bioinformatics. 2017 Jan 5. pii: btw832. doi: 10.1093/bioinformatics/btw832. [Epub ahead of print]

Date of Publication 2017/01/05
Publication Link https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw832