The following steps demonstrate how Goby handles the simple data analysis task of aligning RNA-Seq reads to a reference and generating a wiggle track for that alignment.
For this brief example, we use the sample dataset included with the Goby distribution and chromosome chr1 from the UCSC MM9 assembly.
The Goby compact format makes it possible to parallelize the alignment process efficiently. However, most aligners also accept FASTQ input directly, so if you do not need to parallelize the alignment, you can run the aligner on the FASTA/FASTQ files as shown with BWA in step 1a:
1a. Aligning with FASTA/FASTQ to Goby alignment format:
bwa index data/reference/mm9/chr1.fa.gz
bwa aln -t 16 -f alignment.sai data/reference/mm9/chr1.fa.gz data/reads/goby-mouse-reads-sample.fasta.gz
bwa samse -F goby -f goby-sample data/reference/mm9/chr1.fa.gz alignment.sai data/reads/goby-mouse-reads-sample.fasta.gz
In the bwa samse step above, the -F argument instructs BWA to write the alignment in the Goby format. This produces the following files:
-rw-r--r-- 1 53 goby-sample.header
-rw-r--r-- 1 12K goby-sample.tmh
-rw-r--r-- 1 181 goby-sample.stats
-rw-r--r-- 1 791K goby-sample.entries
Together, these files constitute a Goby compact alignment. Goby compact alignments can be analyzed efficiently with many tools offered in the Goby toolbox. They can be visualized with the early access version of IGV 2.0.
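Before passing an alignment to other tools, it can be useful to confirm that all four files are present. The following is a minimal sketch: the function name check_goby_alignment is hypothetical and not part of Goby, and the list of extensions is taken only from the listing above.

```shell
# Hypothetical helper (not part of Goby): verify that the four files
# listed above exist for a given alignment basename.
check_goby_alignment() {
    prefix=$1
    missing=0
    for ext in header tmh stats entries; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing: $prefix.$ext" >&2
            missing=1
        fi
    done
    # Return 0 when every file was found, 1 otherwise.
    return "$missing"
}
```

For example, check_goby_alignment goby-sample && echo "alignment complete" prints the message only when every file was found.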
1b. Aligning with Goby compact reads to Goby alignment format:
FASTA/FASTQ input data files can be converted to the Goby compact format:
java -Xmx3g -jar goby.jar --mode fasta-to-compact data/reads/goby-mouse-reads-sample.fasta.gz
bwa index data/reference/mm9/chr1.fa.gz
bwa aln -t 16 -f alignment.sai data/reference/mm9/chr1.fa.gz data/reads/goby-mouse-reads-sample.compact-reads
bwa samse -F goby -f goby-sample data/reference/mm9/chr1.fa.gz alignment.sai data/reads/goby-mouse-reads-sample.compact-reads
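Since steps 1a and 1b differ only in the reads file, the three BWA commands can be wrapped in a small function. This is a sketch under stated assumptions: the names run and align_to_goby and the DRY_RUN variable are hypothetical, and the BWA arguments are copied unchanged from the commands above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three BWA steps shown above.
align_to_goby() {
    reference=$1   # e.g. data/reference/mm9/chr1.fa.gz
    reads=$2       # FASTA/FASTQ file or .compact-reads file
    output=$3      # basename of the Goby alignment, e.g. goby-sample
    # Each step runs only if the previous one succeeded.
    run bwa index "$reference" &&
    run bwa aln -t 16 -f alignment.sai "$reference" "$reads" &&
    run bwa samse -F goby -f "$output" "$reference" alignment.sai "$reads"
}
```

Setting DRY_RUN=1 before calling align_to_goby data/reference/mm9/chr1.fa.gz data/reads/goby-mouse-reads-sample.compact-reads goby-sample prints the three commands without running them, which is a cheap way to check the paths first.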
In the following we show how alignments can be used to derive wiggle tracks. More advanced analyses are described in specific tutorials.
Count information can now be produced from this alignment. Here, goby 3g is a shortcut for java -Xmx3g -jar goby.jar. The shortcut works if you have installed Goby and included the distribution directory in your path; see the configuration instructions for details.
goby 3g alignment-to-counts goby-sample
This command produces a highly compressed, base-resolution histogram of read coverage:
-rw-r--r-- 1 204K goby-sample.counts
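To make the goby 3g shortcut used above concrete, the sketch below prints the java command that a given shortcut invocation would stand for. It is an illustration only: goby_expand is a hypothetical name, the script shipped with Goby may differ, and passing the mode through --mode is an assumption based on the fasta-to-compact example.

```shell
# Hypothetical stand-in for the goby shortcut: prints the java command
# instead of running it.
# Assumptions: goby.jar is in the current directory, and the mode name is
# passed through --mode, as in the fasta-to-compact example.
goby_expand() {
    memory=$1   # heap size, e.g. 3g
    mode=$2     # Goby mode, e.g. alignment-to-counts
    shift 2     # the remaining arguments belong to the mode
    echo "java -Xmx$memory -jar goby.jar --mode $mode $*"
}

goby_expand 3g alignment-to-counts goby-sample
# prints: java -Xmx3g -jar goby.jar --mode alignment-to-counts goby-sample
```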
The counts can then be converted to a wiggle track:
goby 3g counts-to-wiggle goby-sample --resolution 1
-rw-r--r-- 1 3.4M goby-sample-all.wig.gz
Try uploading the resulting file (goby-sample-all.wig.gz) to the UCSC Genome Browser. If the file is too large to upload, you may have to lower the resolution by raising the --resolution argument, so that counts are binned in larger windows. For instance, with windows of 20 bp:
goby 3g counts-to-wiggle goby-sample --resolution 20
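To illustrate what binning counts in larger windows means, the sketch below groups per-base counts into fixed windows with awk. It is not Goby's implementation: the "position count" input format, the bin_counts name, and the choice of the mean as the per-window value are all assumptions made for the example.

```shell
# Illustration only (not Goby code): read "position count" pairs with
# 1-based positions and print "window_start mean_count" per window.
bin_counts() {
    awk -v w="$1" '
        BEGIN { max = 0 }
        {
            b = int(($1 - 1) / w)   # zero-based window index
            sum[b] += $2            # accumulate counts in that window
            if (b > max) max = b
        }
        END {
            if (NR == 0) exit       # no input, nothing to print
            for (b = 0; b <= max; b++)
                printf "%d %.2f\n", b * w + 1, sum[b] / w
        }'
}

printf '%s\n' "1 4" "2 6" "3 0" "4 2" | bin_counts 2
# prints:
# 1 5.00
# 3 1.00
```

Four per-base values become two window values here, which is why a coarser resolution yields a smaller file.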
The process illustrated above is the same with an entire genome sequence. While we showed how to produce Goby alignments with BWA, you can also align with GSNAP natively.
Interested in trying Goby? Here is where to go next:
- Download the software.
- Configure it on your computer.
- Try this quick demo with the software.
- Take a look at the project tutorials. They discuss how to use Goby for different next-gen data analysis applications, such as genotype calling.
- Familiarize yourself with the various Goby modes (small utilities in the Goby toolbox). Use java -jar goby.jar --help to display a list of modes. Help is context sensitive. Additional information can be found in the online reference manual.
- If you are a programmer interested in using Goby in your own projects, check our Java API pages. Recent versions of Goby APIs are available in Python, C and C++.