GobyWeb is a web site that supports next-gen sequencing data analysis. On the back end, it uses Goby, an open-source framework for next-gen data analysis, together with a cluster of servers that provides rapid alignment and the tools users need to analyze their next-gen sequencing data.

The following figure presents an overview of GobyWeb, with an emphasis on the flow of data within the application, from read upload to output.

Flow of data in and out of the GobyWeb application. Inputs are shown in yellow, outputs in blue, data types for which GobyWeb stores meta-data in green, and long-running processes in pink.

Sample File Formats

GobyWeb can directly import reads in the following file formats:

  • Goby compact reads format, extension .compact-reads (always compressed)
  • Fastq, extensions .fq.gz, .fastq.gz, .txt.gz, (.fastq, .fq)
  • Fasta, extensions .fa.gz, .fasta.gz, .txt.gz, (.fasta, .fa)
  • Colorspace fasta, extensions .csfasta.gz, (.csfasta)

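The list above maps file extensions to formats. One way to make that mapping concrete is the small extension check below. It is a hypothetical sketch, not GobyWeb's actual detection logic, which this page does not describe. The `ReadsFormat` enum, the `SampleFormats.detect` method and its `firstChar` parameter are illustrative names. Because .txt.gz appears under both Fastq and Fasta, the sketch assumes the first character of the file content is available to break the tie: FASTQ records begin with '@', FASTA records with '>'.

```java
import java.util.Locale;

/** Hypothetical labels for the formats listed above (not a GobyWeb type). */
enum ReadsFormat { COMPACT_READS, FASTQ, FASTA, CSFASTA, UNKNOWN }

/** Hypothetical helper: guesses a sample's format from its file name. */
final class SampleFormats {
    private SampleFormats() {}

    /**
     * @param fileName  uploaded file name, e.g. "sample1.fq.gz"
     * @param firstChar first character of the (decompressed) content; only used
     *                  to tell FASTQ from FASTA for the shared ".txt.gz" extension
     */
    static ReadsFormat detect(String fileName, char firstChar) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".compact-reads")) {
            return ReadsFormat.COMPACT_READS; // always compressed, no .gz suffix
        }
        // Check colorspace before plain fasta so the more specific suffix wins.
        if (endsWithAny(name, ".csfasta.gz", ".csfasta")) {
            return ReadsFormat.CSFASTA;
        }
        if (endsWithAny(name, ".fq.gz", ".fastq.gz", ".fastq", ".fq")) {
            return ReadsFormat.FASTQ;
        }
        if (endsWithAny(name, ".fa.gz", ".fasta.gz", ".fasta", ".fa")) {
            return ReadsFormat.FASTA;
        }
        if (name.endsWith(".txt.gz")) {
            // Ambiguous extension: fall back to the record marker.
            if (firstChar == '@') return ReadsFormat.FASTQ;
            if (firstChar == '>') return ReadsFormat.FASTA;
        }
        return ReadsFormat.UNKNOWN;
    }

    private static boolean endsWithAny(String s, String... suffixes) {
        for (String suffix : suffixes) {
            if (s.endsWith(suffix)) return true;
        }
        return false;
    }
}
```

For example, `SampleFormats.detect("run7.txt.gz", '@')` would classify the file as Fastq, while an unlisted extension such as .bam yields UNKNOWN and would need a conversion step before upload.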
GobyWeb calls a collection of reads a Sample. Typically, several samples are uploaded to the application, aligned, and then analyzed by comparing samples that belong to different biological or clinical groups. In this tutorial, we use the terms Sample and collection of reads interchangeably.

GobyWeb 1.1 and later automatically converts the supported sample formats to the Goby compact-reads format, which GobyWeb uses internally and which makes parallel alignment efficient. Reads in any other file format can be converted to compact-reads by writing a conversion program with the Goby API (see ReadsWriter). Files in the four formats listed above, however, need no programming and can be uploaded directly; the read upload training video shows how.
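A conversion program of the kind mentioned above has two halves: parsing the input records and writing them out with Goby's ReadsWriter. The sketch below shows only the parsing half, for simple four-line FASTQ records, as a hypothetical illustration. `FastqRecord`, `FastqReader.readAll` and `FastqReader.parse` are names invented here, and the ReadsWriter calls are deliberately omitted; consult the Goby API documentation for its actual methods rather than relying on this page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one FASTQ record: identifier, bases, qualities. */
final class FastqRecord {
    final String id;
    final String sequence;
    final String quality;

    FastqRecord(String id, String sequence, String quality) {
        this.id = id;
        this.sequence = sequence;
        this.quality = quality;
    }
}

/** Hypothetical reader for simple FASTQ (one sequence line per record). */
final class FastqReader {
    private FastqReader() {}

    static List<FastqRecord> readAll(BufferedReader in) throws IOException {
        List<FastqRecord> records = new ArrayList<>();
        String header;
        while ((header = in.readLine()) != null) {
            if (header.isEmpty()) continue; // tolerate blank lines between records
            String seq = in.readLine();
            String plus = in.readLine();
            String qual = in.readLine();
            if (!header.startsWith("@") || seq == null || plus == null
                    || !plus.startsWith("+") || qual == null) {
                throw new IOException("Malformed FASTQ record near: " + header);
            }
            if (qual.length() != seq.length()) {
                throw new IOException("Quality and sequence lengths differ for " + header);
            }
            // Drop the leading '@'; each record would then go to a writer such as ReadsWriter.
            records.add(new FastqRecord(header.substring(1), seq, qual));
        }
        return records;
    }

    /** Convenience wrapper for in-memory text; wraps I/O errors as unchecked. */
    static List<FastqRecord> parse(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a real .fastq.gz file, the same `readAll` method could be given a `BufferedReader` wrapped around a `java.util.zip.GZIPInputStream`. Keeping parsing separate from writing means the same reader could feed whichever output API the conversion targets.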

Once you have learned how to upload reads, we suggest you review how to align samples.