Convert a SAM/BAM file to Goby alignment format

Assuming you have generated a SAM file called input.sam by mapping 500 reads to a reference sequence, the following command converts input.sam to Goby format:

goby 1g sam-to-compact input.sam -o goby-output-basename [--sorted]

Options are as follows:

input.sam is the SAM input file. If the filename ends in .bam, Goby expects BAM format instead.

-o specifies the Goby basename to use when writing the converted alignment.

--sorted indicates that the SAM/BAM file is sorted (the Goby file will also be sorted).
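Before passing --sorted, it can help to confirm that the input really is sorted. The text above does not say which order --sorted expects; the sketch below assumes coordinate order (the default order produced by samtools sort): records grouped by reference in the order of the @SQ header lines, then by POS, with unplaced reads (RNAME "*") last. is_coordinate_sorted is a hypothetical helper that reads uncompressed SAM text; it is not part of Goby.

```python
def is_coordinate_sorted(sam_lines):
    """Return True if alignment records appear in coordinate order.

    Assumed order: by reference, in the order of the @SQ header lines,
    then by 1-based POS. Records with RNAME "*" (unplaced reads) are
    expected at the end of the file.
    """
    ref_rank = {}    # reference name -> position among @SQ lines
    previous = None  # sort key of the previous alignment record
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: only @SQ lines matter for the reference order.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_rank[field[3:]] = len(ref_rank)
            continue
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            key = (len(ref_rank), 0)  # sorts after every named reference
        elif rname in ref_rank:
            key = (ref_rank[rname], pos)
        else:
            raise ValueError(f"reference {rname!r} has no @SQ header line")
        if previous is not None and key < previous:
            return False
        previous = key
    return True
```

For a BAM file, the same check would need the records decoded first, for example by piping samtools view -h output into the function line by line.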

Note that the input SAM/BAM file must contain MD attributes. You can add missing MD tags with the samtools calmd tool.
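To find out whether calmd is needed, you can scan the alignment records for an MD:Z: optional field. The sketch below is a minimal check over uncompressed SAM text; records_missing_md is a hypothetical helper, not part of Goby, and it skips unmapped reads (FLAG bit 0x4) on the assumption that only mapped records need an MD tag.

```python
def records_missing_md(sam_lines):
    """Return the names of mapped records that lack an MD:Z: optional field."""
    missing = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped read: no alignment for MD to describe
        optional = fields[11:]  # optional TAG:TYPE:VALUE fields follow the 11 mandatory ones
        if not any(tag.startswith("MD:Z:") for tag in optional):
            missing.append(fields[0])
    return missing
```

If the returned list is not empty, a command such as samtools calmd -b input.bam genome.fa > input.md.bam writes a BAM copy with MD tags filled in, which can then be passed to sam-to-compact.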

Convert back to BAM (new in Goby 2.0)

You can export Goby alignments back to SAM/BAM format:

goby 1g compact-to-sam goby-basename -o output.bam (-g|--genome) <input-genome>

If you specify an output file with a .sam extension, Goby writes the output in SAM format instead of BAM. You must also specify the genome that the reads were aligned against. Goby accepts either a FASTA file indexed with samtools faidx or a genome in the Goby sequence cache format. For instance,

$ samtools faidx genome.fa

will yield genome.fa.fai, which you can use as follows:

$ goby 1g compact-to-sam goby-basename -o output.bam --genome genome.fa.fai
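The .fai file written by samtools faidx stores, for each sequence, its name, length, the byte offset of its first base, the number of bases per line and the number of bytes per line. Those columns let a program compute the byte position of any base and read a region without scanning the whole FASTA file. The sketch below rebuilds those columns in pure Python and uses them to fetch a region; build_fai and fetch are hypothetical helpers, not part of Goby or samtools, and they assume ASCII text, Unix newlines and equal-length sequence lines within each record (samtools faidx makes the same line-length assumption).

```python
def build_fai(fasta_text):
    """Compute faidx-style entries: name -> (length, offset, linebases, linewidth)."""
    index = {}
    pos = 0  # byte offset of the current line (ASCII text assumed)
    current = None
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            current = line[1:].split()[0]  # name is the first word of the header
            index[current] = [0, pos + len(line), 0, 0]
        elif current is not None:
            entry = index[current]
            bases = len(line.rstrip("\n"))
            if entry[2] == 0:
                entry[2] = bases      # bases per line
                entry[3] = len(line)  # bytes per line, newline included
            entry[0] += bases         # running sequence length
        pos += len(line)
    return {name: tuple(values) for name, values in index.items()}


def fetch(fasta_text, fai, name, start, end):
    """Return bases [start, end) (0-based) of sequence `name` by computing byte offsets."""
    length, offset, linebases, linewidth = fai[name]
    end = min(end, length)
    if start >= end:
        return ""

    def byte_of(i):
        # Full lines before base i, plus the column of i within its line.
        return offset + (i // linebases) * linewidth + i % linebases

    chunk = fasta_text[byte_of(start):byte_of(end - 1) + 1]
    return chunk.replace("\n", "")
```

A real tool would seek to byte_of(start) in the file rather than hold the whole text in memory; slicing a string keeps the sketch self-contained.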

Alternatively,

$ goby 3g build-sequence-cache genome.fa -b genome-basename

will yield genome-basename.*, which you can use as follows:

$ goby 1g compact-to-sam goby-basename -o output.bam --genome genome-basename

Extract reads and quality scores from a BAM file

Goby provides a utility to convert a BAM file directly to compact-reads format.

goby 1g sam-extract-reads input.bam -o output --quality-encoding BAM

This command scans input.bam, extracts the reads and their quality scores, and writes the file output.compact-reads. The optional --quality-encoding argument indicates whether quality scores are encoded as expected in a BAM file (where qualities are stored as Phred scores) or with another quality encoding scheme (Illumina, Solexa and Sanger are supported).
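The non-BAM encodings differ in the ASCII offset added to each score and, for Solexa, in how the score itself is defined. The sketch below decodes a quality string into integer Phred scores using the conventional offsets (Sanger: Phred + 33; Illumina 1.3 to 1.7: Phred + 64; Solexa: Solexa score + 64, where a Solexa score is -10*log10(p/(1-p)) rather than -10*log10(p)). These are general conventions assumed for illustration, not a description of Goby's internals; decode_to_phred and solexa_to_phred are hypothetical helpers.

```python
import math

# Conventional ASCII offsets (assumed; Goby's own decoding may differ in detail).
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def solexa_to_phred(q_solexa):
    """Convert a Solexa score, -10*log10(p/(1-p)), to a Phred score, -10*log10(p)."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def decode_to_phred(quality, encoding):
    """Decode an ASCII quality string into a list of integer Phred scores."""
    encoding = encoding.lower()
    if encoding not in OFFSETS:
        raise ValueError(f"unsupported encoding: {encoding}")
    scores = [ord(ch) - OFFSETS[encoding] for ch in quality]
    if encoding == "solexa":
        # Solexa scores use a different error model, so map them to Phred.
        scores = [solexa_to_phred(q) for q in scores]
    return scores
```

The Solexa conversion matters mostly at low quality: a Solexa score of 0 corresponds to an error probability of 0.5, which is a Phred score of about 3.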