Trans-ABySS 1.0.1 (Nov 03, 2010)

This release includes updated filtering prior to assembly merging (assembly.py), an updated model_matcher.py, and numerous other feature improvements and bug fixes.

For additional information about this project, please visit the overview page.

Available downloads

trans-ABySS-v1.0.1.tar.gz

For all platforms (122.9 MB)

Release Notes

State: Final release
License: BCCA (academic use)


IMPORTANT UPDATE:
A bug in "utilities/reads_to_contigs.py" has been fixed. The corrected file is available here: reads_to_contigs.py
Please download this file and use it to replace the one in the release package.

 

User Manual for trans-ABySS v1.0.1

PolyAscripts Demo Walkthrough (illustrating the usage details of polyascripts)

Change log

1. analysis/fusion.py:
- [FEATURE] added -P option for skipping the use of genomic read-pair information when screening fusions
- [FEATURE] more generic parsing of the first field of the fusion output file when filtering (to accommodate non-ABySS contig names; see also the change in "analysis/ensembly.py" below)

2. analysis/gene_coverage.py:
- [BUG] "\n" is now stripped from each gene-to-transcript mapping when parsing the gene-to-transcript .map file
- [FEATURE] will use "gene" column in "coverage.txt" output to group contigs to genes for calculating gene coverage (default behaviour); user can still provide transcript-to-gene conversion file if desired.
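
The newline bug above can be illustrated with a minimal sketch. The layout of the .map file is not described in these notes, so the tab-separated "transcript, gene" column order and the function name parse_gene_map are assumptions made only for illustration; this is not the code in gene_coverage.py.

```python
def parse_gene_map(lines):
    """Return {transcript_id: gene} from tab-separated lines.

    Assumed layout (hypothetical): transcript id in column 1, gene in column 2.
    """
    mapping = {}
    for line in lines:
        # Without this strip, the last field of every line keeps its "\n",
        # so lookups by gene name silently fail.
        line = line.rstrip("\r\n")
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mapping[fields[0]] = fields[1]
    return mapping


example = ["NM_000001\tGENE_A\n", "NM_000002\tGENE_A\n", "NM_000003\tGENE_B\n"]
gene_map = parse_gene_map(example)
```

With the strip in place, gene_map["NM_000003"] is "GENE_B" rather than "GENE_B\n".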

3. analysis/ensembly.py:
- [BUG] now uses 3 characters ('ens') instead of 4 when testing for a data line of Ensembl annotations
- [FEATURE] get_seq() has been added to extract sequences for contig objects
- [FEATURE] all 1-in-1-out junction contigs with lengths between k and 2k-2 bp are now reported, including those that do not pass PopBubbles. Junction contigs now have an upper length limit of 2k-2 bp, and PopBubbles no longer needs to be used.
- [FEATURE] able to handle non-ABySS FASTA headers (i.e. just the contig id in the header) so that contigs from assemblers other than ABySS can be processed.
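
The two features above can be sketched as follows. This is an illustration only, not the code in ensembly.py: the function names are hypothetical, the length bounds are assumed to be inclusive, and ABySS-style headers are assumed to carry the contig length as the second whitespace-separated field.

```python
def parse_header(header):
    """Split a FASTA header into (contig_id, declared_length_or_None).

    Assumption: an ABySS-style header looks like ">id length coverage";
    a header from another assembler may contain only the contig id.
    """
    fields = header.lstrip(">").split()
    contig_id = fields[0]
    has_length = len(fields) > 1 and fields[1].isdigit()
    length = int(fields[1]) if has_length else None
    return contig_id, length


def is_junction_length(length, k):
    """True if length lies in the junction-contig range k..2k-2 (inclusive)."""
    return k <= length <= 2 * k - 2


abyss_style = parse_header(">1234 40 812")   # id and declared length
id_only = parse_header(">contig_7")          # id only, length unknown
```

When the header carries no length, the length would have to be taken from the sequence itself before applying the k to 2k-2 test.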

4. analysis/model_matcher.py:
- [FEATURE]
    a. outputs mapping results on the fly, reduced memory usage
    b. uses config file for location of annotation data; removed custom config file option, must use 'model_matcher.cfg' in 'configs' folder
    c. default model usage in config file is used unless specified by user; no individual model switches are provided, instead use -m for specifying individual models
    d. outputs coverage by default (removed -z option);
    e. coverage file outputs gene symbol in second column, header description added
    f. added second required argument 'genome' (this is for locating data files in the 'annotations' folder)
    g. '-R' becomes '-r' for specifying reference genome, and becomes an on/off switch (it will look for "genome.fa" inside annotations folder if '-r' specified)
    h. '-S' becomes '-f' for specifying the contigs file
    i. no need to specify 'path' and 'splice_file' in 'model_matcher.cfg'
    j. "coverage.txt" output will report "transcript_id" and "gene" on first and second columns respectively
- [BUG]
    a. some novel_utr cases that should have been reported are now reported; "<=" should have been used when checking the matching block index
    b. Blat alignment blocks that are actually contiguous but are interrupted by an insertion are now merged together. This should prevent some incorrect novel transcript events from being reported.
    c. novel_exon artefacts are fixed: novel exons <= 10 bp that are flanked by novel splice sites are now treated as artefacts
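
The block merging described in bug fix (b) can be sketched as below. This is a minimal illustration, not the merge_blocks() shipped in intspan.py: the function name, the [start, end] list representation, and the 1-based inclusive coordinates are all assumptions.

```python
def merge_adjacent_blocks(blocks):
    """Merge [start, end] blocks (inclusive) that abut or overlap.

    Two blocks abut when the second starts exactly one base after the
    first ends, which is what an insertion in the contig leaves behind
    on the reference side of an alignment.
    """
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1] + 1:
            # Contiguous with the previous block: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


target_blocks = [[100, 150], [151, 200], [500, 560]]
print(merge_adjacent_blocks(target_blocks))  # [[100, 200], [500, 560]]
```

Once the first two blocks are merged, the boundary at 150/151 no longer looks like an exon edge, so it cannot be mistaken for a novel splice event.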

5. annotations/xx/splice_motives.txt: must be provided, can be symbolic link to 'shared/splice_motives.txt'

6. utilities/align_parser.py:
- [BUG] removed the GSC-specific default for the splice motives file; the path to Biopython is now specified in the config instead of being hard-coded
- [BUG] remove output files if append mode is used

7. utilities/intspan.py:
- [FEATURE] added merge_blocks() for model_matcher.py

8. utilities/bam.py:
- [BUG] use config file to locate pysam.py

9. utilities/binaries.cfg:
- [BUG] path to pysam is now required

10. configs/projects.cfg:
- [FEATURE] updated the template for running "model_matcher.py"

11. utilities/reads_to_contigs.py:
- [FEATURE] allow read files to be specified as relative pathnames in the "in" file in ABySS assembly folder
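
One way relative pathnames could be handled is sketched below. These notes do not say which directory relative paths are resolved against; the sketch assumes the ABySS assembly folder that contains the "in" file, and the function name is hypothetical.

```python
import os


def resolve_read_paths(assembly_dir, entries):
    """Return normalized paths for the read files listed in an "in" file.

    Assumption: relative entries are relative to assembly_dir.
    """
    resolved = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank lines
        if not os.path.isabs(entry):
            entry = os.path.join(assembly_dir, entry)
        resolved.append(os.path.normpath(entry))
    return resolved


paths = resolve_read_paths(
    "/data/asm/k50",
    ["../reads/lib1_1.fq\n", "/abs/lib1_2.fq\n"],
)
```

Absolute entries pass through unchanged, so existing "in" files that already use full paths keep working.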

12. new files in utilities/:
    - run-abyss: example shell script for ABySS multi-k run (on a local desktop)
    - "qsub-l50-64" and "qsub-l50-k64": example qsub scripts for ABySS multi-k cluster run
    - facN: script that generates assembly statistics

13. Perl wrappers (in "wrappers" folder) updated:
    - setup.pl:
    [BUG] when the "-start" and "-num" options are used, libraries are now processed in the same order as they are listed in the input file
    [FEATURE] automatically generates assembly statistics for each filtered k-assembly as well as the merged assembly
    - align.pl: [FEATURE] updated docs
    - analyze.pl: [FEATURE] "run_coverage" updated to skip gene_transcript_map file

14. polyascripts are updated and sample data is included in "analysis/polyascripts" folder (eg_data.05.tar.gz)

15. utilities/check_complete_blat.pl & utilities/cluster_align.py:
- [FEATURE] updated to also compare input/*.fa files with the merge/LIB-contigs.fa file