Goby 1.7 will support sorting and indexing alignments. Sorting arranges alignment entries in order of genomic location. Indexing provides semi-random access to locations in a sorted file. Together, these features provide large performance improvements when software needs to access only a specific window of genomic locations in a very large alignment file (e.g., tens or hundreds of gigabytes).
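To make the idea of semi-random access concrete, here is a minimal Java sketch of a sparse index over a sorted list of entries. It is a toy model only: the class SortedIndexSketch, its Entry type, the chunk size, and the packed (targetIndex, position) key are all assumptions made for illustration and do not describe Goby's on-disk format or index layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sorted, sparsely indexed alignment. Not Goby's file format. */
class SortedIndexSketch {
    /** Hypothetical minimal entry: reference sequence index plus start position. */
    static final class Entry {
        final int targetIndex;
        final int position;
        Entry(int targetIndex, int position) {
            this.targetIndex = targetIndex;
            this.position = position;
        }
    }

    private final List<Entry> entries;  // must be sorted by (targetIndex, position)
    // Sparse index: key of the first entry of each chunk -> offset of that entry.
    private final TreeMap<Long, Integer> chunkStarts = new TreeMap<>();
    int scanned = 0;  // how many entries window() had to look at

    /** Pack a non-negative (targetIndex, position) pair into one sortable key. */
    static long key(int targetIndex, int position) {
        return ((long) targetIndex << 32) | (position & 0xFFFFFFFFL);
    }

    SortedIndexSketch(List<Entry> sortedEntries, int chunkSize) {
        this.entries = sortedEntries;
        for (int i = 0; i < sortedEntries.size(); i += chunkSize) {
            Entry e = sortedEntries.get(i);
            // Keep the earliest offset if several chunks start with the same key.
            chunkStarts.putIfAbsent(key(e.targetIndex, e.position), i);
        }
    }

    /** Return entries on targetIndex whose position lies in [start, end). */
    List<Entry> window(int targetIndex, int start, int end) {
        long from = key(targetIndex, start);
        long to = key(targetIndex, end);
        // Jump to the last chunk that starts strictly before the window.
        Map.Entry<Long, Integer> chunk = chunkStarts.lowerEntry(from);
        int offset = (chunk == null) ? 0 : chunk.getValue();
        List<Entry> hits = new ArrayList<>();
        for (int i = offset; i < entries.size(); i++) {
            Entry e = entries.get(i);
            long k = key(e.targetIndex, e.position);
            if (k >= to) {
                break;  // sorted input: nothing further can fall in the window
            }
            scanned++;
            if (k >= from) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```

The access is "semi-random" in the sense that the index only locates the start of a chunk; a short linear scan inside the chunk is still needed to reach the first entry of the window. Both the jump and the early exit rely on the entries being sorted.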
Many Goby modes will benefit from this feature transparently. For instance, modes that allow the user to restrict an analysis to a subset of reference sequences (i.e., modes with the -r/--include-reference-names option) use the index to load only the part of the alignment that aligns within the regions of interest. This results in very large speed-ups (>9x) for analyses that need to process one chromosome at a time in a large alignment file.
End users can sort alignments with the new sort mode (see this tutorial). Concatenating a set of sorted alignments will yield a globally sorted alignment (as usual, concatenating alignments is done with mode concatenate-alignments, which detects sorted inputs automatically). A sorted alignment is automatically indexed when written to disk.
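One way that combining several sorted inputs can produce a globally sorted output is a k-way merge. The Java sketch below illustrates that general technique on in-memory lists of {targetIndex, position} pairs. Whether concatenate-alignments is implemented this way is an assumption; the class MergeSortedSketch and its methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative k-way merge of sorted inputs. Not Goby code. */
class MergeSortedSketch {
    /** Compare two {targetIndex, position} pairs in genomic order. */
    static int compareLocation(int[] a, int[] b) {
        return a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);
    }

    /** Merge inputs that are each sorted by (targetIndex, position). */
    static List<int[]> merge(List<List<int[]>> inputs) {
        // Each heap item is {inputNumber, offset}, ordered by the entry it points to.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> compareLocation(inputs.get(x[0]).get(x[1]),
                                      inputs.get(y[0]).get(y[1])));
        for (int i = 0; i < inputs.size(); i++) {
            if (!inputs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<int[]> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();              // smallest remaining entry
            List<int[]> source = inputs.get(head[0]);
            merged.add(source.get(head[1]));
            if (head[1] + 1 < source.size()) {     // advance within the same input
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

Only the head of each input is held in the heap at any time, so a merge of this kind can stream through inputs that are far larger than memory.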
Developers should check out the new skipTo(targetIndex, position) method on edu.cornell.med.icb.goby.alignments.AlignmentReader, as well as the new version of edu.cornell.med.icb.goby.alignments.IterateAlignments, which leverages skipTo for indexed alignments (see the new IterateAlignments tutorial).
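The calling pattern that skipTo enables can be sketched with a toy reader. Everything in the block below is a stand-in: ToyReader is not Goby's AlignmentReader, and the assumed behaviour (return the next entry at or after the requested location, or null when there is none) is a guess at the semantics that should be checked against the Goby javadoc before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of a skipTo-style access pattern. Not Goby's API. */
class SkipToSketch {
    /** Hypothetical minimal entry: reference sequence index plus start position. */
    static final class Entry {
        final int targetIndex;
        final int position;
        Entry(int targetIndex, int position) {
            this.targetIndex = targetIndex;
            this.position = position;
        }
    }

    /** Hypothetical in-memory reader over entries sorted by (targetIndex, position). */
    static final class ToyReader {
        private final List<Entry> sorted;
        private int cursor = 0;

        ToyReader(List<Entry> sorted) {
            this.sorted = sorted;
        }

        /** Assumed semantics: next entry at or after (targetIndex, position), else null. */
        Entry skipTo(int targetIndex, int position) {
            int lo = 0;
            int hi = sorted.size();
            while (lo < hi) {                       // binary search stands in for the index
                int mid = (lo + hi) >>> 1;
                if (compare(sorted.get(mid), targetIndex, position) < 0) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            cursor = lo;
            return next();
        }

        /** Return the entry under the cursor and advance, or null at the end. */
        Entry next() {
            return cursor < sorted.size() ? sorted.get(cursor++) : null;
        }
    }

    /** Order an entry relative to a (targetIndex, position) location. */
    static int compare(Entry e, int targetIndex, int position) {
        return e.targetIndex != targetIndex
            ? Integer.compare(e.targetIndex, targetIndex)
            : Integer.compare(e.position, position);
    }

    /** Collect the entries on one reference sequence with position in [start, end). */
    static List<Entry> collect(ToyReader reader, int targetIndex, int start, int end) {
        List<Entry> out = new ArrayList<>();
        for (Entry e = reader.skipTo(targetIndex, start);
             e != null && e.targetIndex == targetIndex && e.position < end;
             e = reader.next()) {
            out.add(e);
        }
        return out;
    }
}
```

The loop in collect reads one entry past the window before it stops, which is harmless here because the toy skipTo always searches the whole list.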