ABySS

Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler

Project Description


ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

To assemble transcriptome data, see Trans-ABySS.

Awards

June 2015, 12th [BC]² Conference in Basel, Switzerland: ABySS won the Swiss Institute of Bioinformatics’ inaugural International Bioinformatics Resource Award.

Publications

  • ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. Genome Research, 2017 27: 768-777. (Genome Research, PubMed)

  • ABySS: A parallel assembler for short read sequence data. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. Genome Research, 2009-June. (Genome Research, PubMed)

  • De novo Transcriptome Assembly with ABySS. İnanç Birol, Shaun D Jackman, Cydney Nielsen, Jenny Q Qian, Richard Varhol, Greg Stazyk, Ryan D Morin, Yongjun Zhao, Martin Hirst, Jacqueline E Schein, Doug E Horsman, Joseph M Connors, Randy D Gascoyne, Marco A Marra and Steven JM Jones. Bioinformatics. 2009-June. (Bioinformatics Advance Access)
  • De novo assembly and analysis of RNA-seq data. Gordon Robertson, Jacqueline Schein, Readman Chiu, Richard Corbett, Matthew Field, Shaun D Jackman, Karen Mungall, Sam Lee, Hisanaga Mark Okada, Jenny Q Qian, Malachi Griffith, Anthony Raymond, Nina Thiessen, Timothee Cezard, Yaron S Butterfield, Richard Newsome, Simon K Chan, Rong She, Richard Varhol, Baljit Kamoh, Anna-Liisa Prabhu, Angela Tam, YongJun Zhao, Richard A Moore, Martin Hirst, Marco A Marra, Steven J M Jones, Pamela A Hoodless and İnanç Birol. Nature Methods. 2010-Oct. (Nature)

Current Release
ABySS 2.1.5

Released Dec 04, 2018

Compiler fixes and increased stack size limits to avoid stack overflows.


All Releases

| Version | Released | Description | Compatibility | Licenses | Status |
|---|---|---|---|---|---|
| 2.1.5 | Dec 04, 2018 | Compiler fixes and increased stack size limits to avoid stack overflows. | | GPLv3 | final |
| 2.1.4 | Nov 09, 2018 | This release provides major improvements to Bloom filter assembly contiguity and correctness. Bloom filter assemblies now have equivalent scaffold contiguity and better correctness than MPI assemblies of the same data, while still requiring less than 1/10th of the memory. On human, Bloom filter assembly times are still a few hours longer than MPI assemblies (e.g. 17 hours vs. 13 hours, using 48 threads). | | GPLv3 | final |
| 2.1.3 | Nov 05, 2018 | This release fixes a SAM-formatting bug that broke the ABySS-LR pipeline (Tigmint/ARCS). | | GPLv3 | final |
| 2.1.2 | Oct 24, 2018 | This release improves scaffold N50 on human by ~10%, due to implementation of a new `--median` option for `DistanceEst` (thanks to @lcoombe!). This release also adds a new `--max-cost` option for `konnector` and `abyss-sealer` that curbs indeterminately long running times, particularly at low k values. | | GPLv3 | final |
| 2.1.1 | Sep 11, 2018 | This release provides bug fixes and modest improvements to Bloom filter assembly contiguity/correctness. Parallelization of Sealer has also been improved, thanks to contributions by @schutzekatze. | | GPLv3 | final |
| 2.1.0 | Apr 13, 2018 | This release adds support for misassembly correction and scaffolding using linked reads, using Tigmint and ARCS. (Tigmint and ARCS must be installed separately.) In addition, simultaneous optimization of `s` (seed length) and `n` (min supporting read pairs / Chromium barcodes) is now supported during scaffolding. | | GPLv3 | final |
| 2.0.3 | Mar 14, 2018 | This minor release provides bug fixes and improved reliability for both MPI assemblies and Bloom filter assemblies on large datasets. In addition, many usability improvements have been made to the `abyss-samtobreak` program for misassembly assessment. | | GPLv3 | final |
| 2.0.2 | Oct 21, 2016 | Fix compile errors with gcc-6 and boost-1.62. | | GPLv3 | final |
| 2.0.1 | Sep 14, 2016 | This release resolves some licensing issues that were pointed out in 2.0.0. As of 2.0.1, ABySS is available under a standard GPL-3 license, and the libraries included under `lib/rolling-hash` and `lib/bloomfilter` are also licensed under GPL-3. For alternative licensing terms, please contact Patrick Rebstein (prebstein at bccancer.bc.ca). | | GPLv3 | final |
| 2.0.0 | Sep 01, 2016 | This release introduces a new Bloom filter assembly mode that enables large genome assemblies with minimal memory (e.g. 34 GB for H. sapiens with 76X coverage bfc-corrected reads). Bloom filter assemblies are currently less contiguous than the default (MPI) assembly mode but are still of high quality (e.g. 3.5 Mbp vs. 4.8 Mbp scaffold NG50 for H. sapiens). Bloom filter assembly mode is enabled by adding three `abyss-pe` parameters (B = *Bloom filter size*, H = *number of Bloom filter hash functions*, kc = *k-mer coverage threshold*). See `README.md` for an example. This release also updates several `abyss-pe` parameter defaults to be more suitable for large genome assemblies with recent Illumina data. In addition, ABySS 2.0.0 includes minor usability improvements for `abyss-sealer` and removes an unnecessary build dependency on sqlite3. | | BCCA (academic use) | final |
| 1.9.0 | May 29, 2015 | This release introduces a new paired de Bruijn graph mode for assembly. In paired de Bruijn graph mode, ordinary k-mers are replaced by k-mer pairs, where each k-mer pair is separated by a fixed-size gap. The primary advantage of paired de Bruijn graph mode is that the span of a k-mer pair can be arbitrarily wide without consuming additional memory, and thus provides improved scalability for assemblies of long sequencing reads. This release also introduces a new tool called Sealer for closing scaffold gaps, new Konnector functionality for producing long pseudo-reads, and support for the DIDA (Distributed Indexing Dispatched Alignment) parallel alignment framework. | | BCCA (academic use) | final |
| 1.5.2 | Jul 10, 2014 | In this release we introduce Konnector, a fast and memory-efficient tool to fill the gap between paired-end reads. Konnector determines the intervening sequence by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph. A companion tool called abyss-bloom is also provided, which can be used to construct reusable Bloom filter files for input to Konnector; otherwise, Konnector will build an in-memory Bloom filter for one-time use. In addition to Konnector, we have fixed bugs related to compiling with GCC 4.8+ and parsing BWA output SAM files. | | GPLv3 for non-commercial usage | final |
| 1.5.1 | May 08, 2014 | In this release we fix a compatibility issue with Trans-ABySS 1.5.0 where the output of abyss-filtergraph is not strand-specific. Also, we include additional FCC portability fixes. | | GPLv3 for non-commercial usage | final |
| 1.5.0 | May 01, 2014 | In this release we have added full strand-specific RNA-Seq support such that output contigs are correctly oriented with respect to the original transcripts sequenced. Also, there are new parameters to abyss-pe, xtip and Q, that are used to improve assembly in high coverage regions like highly expressed transcripts. Setting xtip=1 will more aggressively remove certain tips. The Q parameter will prevent low quality bases from being used in the assembly. The version has been bumped to 1.5.0 to signify compatibility with Trans-ABySS 1.5.0. | | GPLv3 for non-commercial usage | final |
| 1.3.7 | Dec 11, 2013 | Scaffolds can now be rescaffolded using long sequences such as RNA-Seq assemblies produced from Trans-ABySS. Added support for gcc 4.8+ and Mac OS X 10.9 Mavericks with clang. Finally, we've licensed ABySS under GPL for non-commercial purposes. Please read the LICENSE file for more details. | | GPLv3 for non-commercial usage | final |
| 1.3.6 | Jul 31, 2013 | ABYSS and ABYSS-P are now ~20% faster. Fixed many portability issues and bugs, and improved some error messages. | | BCCA (academic use) | final |
| 1.3.5 | Mar 05, 2013 | This release introduces new tools to merge overlapping read pairs, layout and merge contigs with perfect sequence overlap, and calculate contig contiguity and correctness metrics. Also, it includes updates to the existing documentation, bug fixes, and attempts to fill scaffold gaps with a consensus of all paths between contigs. | | BCCA (academic use) | final |
| 1.3.4 | May 30, 2012 | This release eliminates two sources of misassemblies. The first is in the path extension logic of SimpleGraph. For the second, the default value of m, the minimum overlap required between two contigs to merge them, is increased from 30 to 50. This release also fixes various portability issues. A new script, abyss-fatoagp, is included to create an AGP file for GenBank submission. | | BCCA (academic use) | final |
| 1.3.3 | Mar 13, 2012 | Specify the minimum alignment length when aligning the reads to the contigs with the parameter l. Improve the scaffolding algorithm that identifies repeats. Improve the documentation. | | BCCA (academic use) | final |
| 1.3.2 | Dec 13, 2011 | Improve distance estimates between contigs, enable scaffolding by default, and remove small shim contigs that don't add useful sequence to the assembly. The default aligner is abyss-map. MergePaths uses a non-greedy algorithm that reduces sequence duplication but may reduce contiguity. | | BCCA (academic use) | final |
| 1.3.1 | Oct 24, 2011 | Fix a bug in KAligner and fix a compiler error for Mac OS X. | | BCCA (academic use) | final |
| 1.3.0 | Sep 09, 2011 | Mate-pair data can be used to scaffold contigs. Specify your mate-pair libraries using the `mp` parameter of abyss-pe. | | BCCA (academic use) | final |
| 1.2.7 | Apr 15, 2011 | Support using bwa or bowtie to align reads to contigs. New parameter, d, to specify the acceptable error of a distance estimate. | | BCCA (academic use) | final |
| 1.2.6 | Feb 07, 2011 | Sequence variants are popped if the two variants are at least 90% similar. Contigs that overlap by fewer than k-1 bp are found and may be merged. | | BCCA (academic use) | final |
| 1.2.5 | Nov 15, 2010 | Fix a colour-space-specific bug and a bug causing the error "Assertion `fstSol.size() == 1' failed." | | BCCA (academic use) | final |
| 1.2.4 | Oct 14, 2010 | Replace gaps of Ns that span a region of ambiguous sequence with a consensus sequence of the possible sequences that fill the gap. The consensus sequence uses IUPAC-IUB ambiguity codes. | | BCCA (academic use) | final |
| 1.2.3 | Sep 08, 2010 | Fix two bugs that caused the error messages "Assertion `m_comm.receiveEmpty()' failed." and "error: unexpected ID". | | BCCA (academic use) | final |
| 1.2.2 | Aug 25, 2010 | Merge contigs after popping bubbles. Handle multi-line FASTA sequences. Report the amount of memory used. | | BCCA (academic use) | final |
| 1.2.1 | Jul 12, 2010 | Handle mate pair libraries with reverse-forward orientation as produced by circular, large-fragment libraries. Distance estimates are improved. | | BCCA (academic use) | final |
| 1.2.0 | May 26, 2010 | Scaffold over gaps in coverage and unresolved repeats. Read sequence from SAM and BAM files. Set q=3 by default. Set E=0 when coverage is low (<2). Generate a Graphviz dot file of the paired-end assembly. | | BCCA (academic use) | final |
| 1.1.2 | Feb 15, 2010 | Pop bubbles resulting from indels. Read tar files. Fix performance issues in ParseAligns by syncing KAligner threads periodically. | | BCCA (academic use) | final |
| 1.1.1 | Jan 19, 2010 | Pop complex bubbles either completely or not at all. Choose better (typically lower) default values for the parameters e and c. | | AFL | final |
| 1.1.0 | Dec 21, 2009 | ABySS will expand tandem repeats when it is possible to determine the exact number of the repeat. The paired-end path-finding algorithm, SimpleGraph, is multithreaded. Fixed a bug in MergePaths that could misassemble repeats larger than the paired-end fragment size. The output format of AdjList, DistanceEst and SimpleGraph has changed. | | AFL | final |
| 1.0.9 | May 15, 2009 | Significantly reduce the memory usage of KAligner and ParseAligns. abyss-pe can read multiple input files and read FASTA or FASTQ files. | | AFL | final |
| 1.0.8 | Apr 02, 2009 | Fix the bug causing the error "Assertion `marked == split' failed." | | AFL | final |
| 1.0.7 | Mar 31, 2009 | The parallel MPI assembler is now deterministic; it will produce the same result every time. | | AFL | final |
| 1.0.6 | Mar 25, 2009 | Fix a race condition in the erosion algorithm. | | AFL | final |
| 1.0.5 | Mar 11, 2009 | Portability fixes. | | AFL | final |
| 1.0.4 | Mar 09, 2009 | Remove the need to specify the parameters -e,--erode and -b,--bubbles. Use less disk space by using pipes to avoid intermediate files. Many improvements to the paired-end algorithm. | | BCCA (academic use) | final |
| 1.0.3 | Feb 05, 2009 | Tidy up the ends of blunt contigs. Merge blunt contigs that are connected by pairs and overlap. | | BCCA (academic use) | final |
| 1.0.2 | Nov 21, 2008 | Include a parallel binary compiled for OpenMPI. | | BCCA (academic use) | final |
| 1.0.16 | Nov 13, 2009 | Improve the performance and memory usage of KAligner and AdjList, particularly for very large data sets. | | AFL | final |
| 1.0.15 | Oct 19, 2009 | New parameters, e and E, to set the coverage threshold of the erosion algorithm. Values for the parameters e and the coverage threshold, c, will be chosen automatically if they're not specified. The read length is now an optional parameter. Includes two important bug fixes. | | AFL | final |
| 1.0.14 | Sep 08, 2009 | Assemble multiple libraries of different fragment sizes. | | AFL | final |
| 1.0.13 | Aug 26, 2009 | Read files compressed with gzip (.gz) or bzip2 (.bz2). | | AFL | final |
| 1.0.12 | Aug 19, 2009 | Both ABYSS and KAligner are run only once per assembly, which speeds up the paired-end assembly stage by nearly a factor of two. The k-mer coverage information is correct in every contig file. A tool is included to convert colour-space contigs to nucleotide contigs. Discard reads that fail the chastity filter. | | AFL | final |
| 1.0.11 | Jul 21, 2009 | Assemble colour-space reads. Read files in qseq format. KAligner is multithreaded. Integrate with Sun Grid Engine (SGE). Prevent misassemblies mediated by tandem segmental duplications. | | AFL | final |
| 1.0.10 | Jun 18, 2009 | ParseAligns is improved to handle any number of reads as long as mate pairs are found interleaved in the same file. Merge overlapping paired-end contigs that were previously being missed in some situations. Number paired-end contigs so that their IDs do not overlap with the single-end contigs. | | AFL | final |
| 1.0 | Aug 07, 2008 | Initial version of ABySS. | | BCCA (academic use) | final |
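
Several release notes above quote scaffold N50 and NG50 as contiguity measures. As a rough illustration of what those numbers mean, the following minimal Python sketch computes both from a list of scaffold lengths. The function name `n50` and the example lengths are hypothetical and are not taken from ABySS; this is not how ABySS itself reports these statistics, only the commonly used definition: the length of the scaffold at which the running total of lengths, taken longest first, reaches half of the assembly size (N50) or half of an assumed genome size (NG50).

```python
def n50(lengths, genome_size=None):
    """Return the N50 of the given sequence lengths.

    If genome_size is given, the target is half of that size instead of
    half of the assembly size, which yields NG50. Returns 0 when the
    target is never reached (or when no lengths are given).
    """
    if not lengths:
        return 0
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk the sequences from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the assembly covers less than half of genome_size


# Made-up scaffold lengths, for illustration only.
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("N50:", n50(scaffolds))
print("NG50 (assumed genome size 400):", n50(scaffolds, genome_size=400))
```

Because NG50 is measured against a fixed genome size rather than the assembly's own total length, it is the more suitable figure for comparing assemblies of different total size, as in the 2.0.0 note comparing the Bloom filter and MPI modes.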