Nyasha will present a high-level overview of Goby at the Bioinformatics Open Source Conference (BOSC) 2010 in Boston. The talk is scheduled for the afternoon of July 9th, at around 3 PM.

We released Goby 1.7, which includes many new features and performance enhancements. See the Goby ChangeLog for a complete list, but here are the most significant improvements:

  • Support for sorting and indexing Goby alignments.
  • Draft support for paired sequence runs in compact format, fasta-to-compact and compact-to-fasta modes.
  • Support for estimating the read weights described in Hansen KD et al., NAR 2010 (see http://campagnelab.org/software/goby/tutorials/estimate-heptamer-weights/). In contrast to the initial publication, Goby can use the weights to reweight annotation counts (gene/exon/other) and transcript counts.
  • Preliminary support for barcoded reads (barcodes in the sequence); see the new decode-barcodes mode (and the online tutorial at http://campagnelab.org/software/goby/tutorials/handling-barcoded-reads/).
  • Dramatically improved performance for differential expression (DE) tests with millions of differentially expressed elements (e.g., exon+gene+other). Previously, the code incorrectly grew internal arrays from zero to the number of new DE elements described in the annotation file.

We have extended IGV to load and display Goby alignments. This feature will be incorporated in the Integrative Genomics Viewer (from the Broad Institute) in a forthcoming 1.5.x version. This tutorial provides a preview. We expect to release Goby 1.7 and the version of IGV that loads Goby alignments before BOSC 2010.

We’ve just posted a programming tutorial that illustrates how to write programs with the Goby framework. This first tutorial describes how to parse reads and alignments in compact format. It also provides background on how Goby builds on Protocol Buffers to manage large data files.

Goby 1.7 will support sorting and indexing alignments. Sorting arranges alignment entries in order of genomic location. Indexing provides semi-random access to locations in a sorted file. Together, these features provide large performance improvements when software needs to access only a specific window of genomic locations in a very large alignment file (e.g., tens or hundreds of gigabytes).
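The idea behind semi-random access can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Goby's implementation or file format: entries are modeled as in-memory (target_index, position) tuples, and the function names, the sparse index layout, and the `stride` parameter are all hypothetical.

```python
import bisect

# Hypothetical in-memory model: each entry is a (target_index, position)
# tuple and the list is sorted by genomic location. This is not Goby's
# on-disk format, only a sketch of the principle.

def build_sparse_index(entries, stride):
    """Record the key and offset of every `stride`-th entry."""
    keys = []
    offsets = []
    for offset in range(0, len(entries), stride):
        keys.append(entries[offset])
        offsets.append(offset)
    return keys, offsets

def entries_in_window(entries, index, target, start, end):
    """Return entries on `target` with start <= position < end."""
    keys, offsets = index
    # Jump to the last indexed entry strictly before the window start,
    # so that no entry inside the window can be skipped.
    slot = bisect.bisect_left(keys, (target, start)) - 1
    offset = offsets[slot] if slot >= 0 else 0
    hits = []
    while offset < len(entries):
        entry = entries[offset]
        if entry >= (target, end):
            break  # sorted input: nothing further can be in the window
        if entry >= (target, start):
            hits.append(entry)
        offset += 1
    return hits
```

Only the entries between the chosen index slot and the end of the window are scanned, which is why a sorted and indexed file can answer a window query without reading the whole alignment.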

Many Goby modes will benefit from this feature transparently. For instance, modes that allow the user to restrict an analysis to a subset of reference sequences (i.e., modes with the -r/--include-reference-names option) use the index to load only the part of the alignment that aligns within the regions of interest. This results in very large speed-ups (>9x) for analyses that need to process one chromosome at a time in a large alignment file.

End users can sort alignments with the new sort mode (see this tutorial). Concatenating a set of sorted alignments will yield a globally sorted alignment. As usual, alignments are concatenated with the concatenate-alignments mode, which detects sorted inputs automatically. A sorted alignment is automatically indexed when written to disk.
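Producing a globally sorted result from several sorted inputs is a k-way merge. The sketch below shows that principle with Python's standard `heapq.merge`. It is an assumption about the general technique, not a description of how concatenate-alignments is implemented; the function name and the (target_index, position) tuple model are hypothetical.

```python
import heapq

def merge_sorted_alignments(*sorted_inputs):
    """Merge inputs that are each sorted by (target_index, position).

    heapq.merge consumes the inputs lazily and always yields the smallest
    pending entry, so the output is globally sorted.
    """
    return list(heapq.merge(*sorted_inputs))
```

For example, merging `[(0, 10), (1, 5)]` with `[(0, 20), (2, 1)]` gives `[(0, 10), (0, 20), (1, 5), (2, 1)]`.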

Developers should check out the new skipTo(targetIndex, position) method on edu.cornell.med.icb.goby.alignments.AlignmentReader, as well as the new version of edu.cornell.med.icb.goby.alignments.IterateAlignments, which leverages skipTo for indexed alignments (see the new IterateAlignments tutorial).

The next version of Goby will support reweighting reads with the method recently described by Hansen KD et al NAR 2010. Counts can be reweighted before producing wiggle and bedgraph plots, but Goby also supports reweighting reads when estimating gene expression. See our tutorial for a preview of this upcoming feature.
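As a rough illustration of what reweighting a count means, the sketch below replaces "one read, one count" with a sum of per-read weights. It assumes that each weight is looked up from the first seven bases (heptamer) of the read and that reads with no known weight count as 1.0. The function name, the dictionary layout, and the default weight are hypothetical and are not taken from Goby.

```python
def weighted_count(read_sequences, heptamer_weights, default_weight=1.0):
    """Sum per-read weights instead of counting each read as 1.

    Assumption: a read's weight is keyed by its first seven bases.
    """
    total = 0.0
    for sequence in read_sequences:
        heptamer = sequence[:7]
        total += heptamer_weights.get(heptamer, default_weight)
    return total
```

With every weight equal to 1.0 this reduces to the ordinary read count, so the unweighted case is a special case of the weighted one.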

The next version of Goby will support sequence variations. We have added a simple way to represent mutations, insertions and deletions in the compact alignment format. We have also developed parsers to import variations from SAM format and the MAF format used by Last. Finally, new modes will support converting sequence variations from compact format to summary statistics (e.g., frequency of variation along the read positions) or to tab delimited files for analysis with other software packages.
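One of the summary statistics mentioned above, the frequency of variation along read positions, can be sketched as follows. This is a minimal example under assumptions: each observed variation is reduced to the read position where it occurs, and the frequency is the number of variations at that position divided by the number of reads. The function name and input layout are hypothetical and do not reflect the compact format.

```python
from collections import Counter

def variation_frequency_by_read_position(variation_positions, read_count):
    """Map each read position to (variations seen there) / (number of reads)."""
    counts = Counter(variation_positions)
    return {pos: counts[pos] / read_count for pos in sorted(counts)}
```

A peak at particular read positions (for instance near the end of the reads) would suggest sequencing errors rather than true variants, which is one reason such a summary is useful before variant calling.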

We are interested in collaborating with groups who are developing statistical models for SNP and indel calling to develop efficient and accurate variant calling solutions. Please let us know if this would be of interest.

We are now writing the manuscript that describes the design principles and solutions implemented in Goby. This manuscript should include storage and performance benchmarks for the compact format.

We’re working on a new lab web site. Stay tuned.
