We are releasing a beta version of BWA with support for the Goby format. This version was created from BWA version 0.5.9 r16 and modified to read Goby compact-reads files and to write Goby compact-alignments.

Distribution 

  • goby_latest-cpp.zip – Goby’s latest C/C++ API (release version)
  • bwa-0.5.9-goby-2.3 (released April 4, 2013) – Our beta distribution of BWA, based on 0.5.9, with Goby support and autoconf. This version populates the max-occurence field.
  • input.compact-reads – A sample Goby compact-reads file (from a mouse dataset).

Building

It is assumed you are building Goby and BWA on 64-bit Linux. At this time, you will have problems building the Goby C/C++ API on Mac computers.

Instructions for building Goby’s C/C++ API are included in the README.txt file.

Once the Goby C/C++ API has been built and installed, building BWA with Goby support can be performed using the following commands:

chmod +x autogen.sh
./autogen.sh
./configure --with-goby
make

Running

Example alignment

  • bwa aln -f alignment.sai BWA_INDEXED_REFERENCE input.compact-reads
  • bwa samse -F goby -f alignment BWA_INDEXED_REFERENCE alignment.sai input.compact-reads

The first line (bwa aln) aligns the Goby compact-reads input file “input.compact-reads” against the pre-built BWA_INDEXED_REFERENCE reference and writes the output to the file “alignment.sai”.

The second line (bwa samse) will output a Goby compact-alignment, creating files that have the prefix “alignment” such as “alignment.entries”, “alignment.stats”, etc. These files can be analyzed using Goby.

Running With Paired Ends

Goby compact-reads can store paired-end data within a single file. If you have a Goby compact-reads file that contains both reads of each pair, you can align it with the following commands:

  • bwa aln -w 0 -f alignment_0.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
  • bwa aln -w 1 -f alignment_1.sai BWA_INDEXED_REFERENCE paired-input.compact-reads
  • bwa sampe -F goby -f alignment BWA_INDEXED_REFERENCE alignment_0.sai alignment_1.sai paired-input.compact-reads paired-input.compact-reads

Important: Please note that paired-end alignment is supported in this beta version, although the implementation may not be final. We are providing this example to discuss how paired-end alignment will be implemented with Goby formats and to request feedback on the command line options.

The normal “bwa sampe” convention is to provide two reads files, one for the queries and one for the pairs. In the example shown above, the input file we’ve provided would contain both the queries and the pairs. When “bwa sampe” tries to read the two input files, it will read the queries from the first file and attempt to read the pairs from the second file (which is, in fact, the same file).
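
The three paired-end commands above can be wrapped in a small shell function. This is only a sketch: the function name align_paired, its argument order, and the BWA override variable are our own illustrative choices and are not part of the distribution. Only the bwa options shown in the example above are used.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of BWA or Goby):
#   align_paired INDEX READS PREFIX
# Set BWA to override the executable, e.g. BWA=echo for a dry run.
align_paired() {
    index=$1
    reads=$2
    prefix=$3
    bwa=${BWA:-bwa}

    # Align the primary sequences (-w 0), then the pair sequences (-w 1).
    $bwa aln -w 0 -f "${prefix}_0.sai" "$index" "$reads" || return 1
    $bwa aln -w 1 -f "${prefix}_1.sai" "$index" "$reads" || return 1

    # The same compact-reads file is passed twice:
    # once for the queries and once for the pairs.
    $bwa sampe -F goby -f "$prefix" "$index" \
        "${prefix}_0.sai" "${prefix}_1.sai" "$reads" "$reads"
}
```

For example, after sourcing the function, "align_paired BWA_INDEXED_REFERENCE paired-input.compact-reads alignment" would run the same three steps as the example and stop at the first step that fails.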

Notes and Limitations

  • When more than one match exists for a read, the read will be annotated as having “Too Many Hits”. The read itself (with multiple matches) will not, otherwise, be stored.

  • Input in Goby compact-reads format, rather than fasta or fastq, is detected automatically.

Additional BWA Command Line Options

We add the following command line flags to “bwa aln”:

-w INT    If reading from Goby compact-reads, specifies whether the primary sequence (0) or the pair sequence (1) should be read. [0]
-x INT    If reading from Goby compact-reads, the start position within the input file, given as the number of bytes into the file at which to start reading. Reading will actually start at the first record on or after this value. [0, start of file]
-y INT    If reading from Goby compact-reads, the end position within the input file, given as the number of bytes into the file at which to stop reading. Reading will actually stop at the end of the record on or after this value. [0, end of file]

We add the following command line flags to “bwa samse”:

-F STR    Output format, ‘sam’ or ‘goby’. [sam]
-w INT    If reading from Goby compact-reads, specifies whether the primary sequence (0) or the pair sequence (1) should be read. [0]
-x INT    If reading from Goby compact-reads, the start position within the input file, given as the number of bytes into the file at which to start reading. Reading will actually start at the first record on or after this value. [0, start of file]
-y INT    If reading from Goby compact-reads, the end position within the input file, given as the number of bytes into the file at which to stop reading. Reading will actually stop at the end of the record on or after this value. [0, end of file]

We add the following command line flags to “bwa sampe”:

-F STR    Output format, ‘sam’ or ‘goby’. [sam]
-x INT    If reading from Goby compact-reads, the start position within the input file, given as the number of bytes into the file at which to start reading. Reading will actually start at the first record on or after this value. [0, start of file]
-y INT    If reading from Goby compact-reads, the end position within the input file, given as the number of bytes into the file at which to stop reading. Reading will actually stop at the end of the record on or after this value. [0, end of file]
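
Because -x and -y take byte offsets, one possible use is to process a large compact-reads file in slices. The sketch below only prints one “bwa aln” command per slice; it does not run anything. The function name slice_commands, the slice_N.sai output names, and the even split are illustrative assumptions. We also assume that adjacent slices sharing a boundary offset do not read the same record twice, which should be verified on your data. Combining the per-slice results is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of BWA or Goby):
#   slice_commands FILE_SIZE_BYTES N_SLICES READS INDEX
# Prints one "bwa aln" command per byte-range slice of READS.
slice_commands() {
    size=$1
    n=$2
    reads=$3
    index=$4

    # Slice width in bytes, rounded up so N_SLICES slices cover the file.
    step=$(( (size + n - 1) / n ))

    i=0
    start=0
    while [ "$start" -lt "$size" ]; do
        end=$(( start + step ))
        # Clamp the last slice to the end of the file.
        if [ "$end" -gt "$size" ]; then
            end=$size
        fi
        echo "bwa aln -x $start -y $end -f slice_$i.sai $index $reads"
        start=$end
        i=$(( i + 1 ))
    done
}
```

The file size can be obtained with, for example, “wc -c < input.compact-reads”. The offsets do not need to fall on record boundaries, since reading starts at the first record on or after the -x value.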