Setting up the environment

Cluster-side Installation

Before progressing, please make sure the hardware, software, and configuration requirements have been met.

We will call the collection of SGE compute nodes the “cluster”.

Extract the gobyweb binary distribution to the home directory of the gobyweb account on one of the cluster nodes. This will create the following files and directories:

~gobyweb/
     gobyweb-VERSION-webapp.tar.gz    [this will be moved to the webserver]
     goby/
          goby.jar
          log4j.properties
          goby.properties
          *.R
          *.groovy
          *.jsap
          lib/
               sqlite-jdbc-3.7.2.jar
          nextgen-tools/
               [GSNAP, BWA, SAMTOOLS, etc. will go here]
     index-creation/
          reference-db/
               [this is where indexed references will go]
          create-indexes.sh
          Biomart.groovy
     goby-2.1.1-cpp.tgz

Next, in the gobyweb home directory on the cluster make the following additional directories.

  • mkdir ~/GOBYWEB_SGE_JOBS
  • mkdir ~/GOBYWEB_FILES

Installing the Goby and SQLite JDBC Driver Jars for Groovy

The distribution contains the SQLite JDBC Driver sqlite-jdbc-3.7.2.jar. This should be installed for use with Groovy using the following commands:

  • mkdir -p ~/.groovy/lib
  • ln -s ~/goby/lib/sqlite-jdbc-3.7.2.jar ~/.groovy/lib
  • ln -s ~/goby/goby.jar ~/.groovy/lib

Building and installing the goby-2.1.1-cpp Library

The goby-2.1.1-cpp.zip file is included in the distribution and is required to build aligners that have native Goby file format support. To build this library, use the following steps:

Unpack the library archive:

  • cd ~
  • unzip goby-2.1.1-cpp.zip
  • cd goby_2.1.1/cpp

Follow the instructions found in ~/goby_2.1.1/cpp/README.txt

Please note that we may have released a more recent version of Goby than 2.1.1 at the time you are installing. We do recommend that you start by downloading goby_2.1.1 because this version has been well tested with GobyWeb 1.9.1. Once you have determined that the installation works well, you can try the latest release of Goby.

Configuring R

This installation guide assumes you already have R installed on the cluster nodes. GobyWeb requires some additional R packages and configuration. Add the following variables to ~gobyweb/.bash_profile (these assume you are running an R 2.12 version)

export R_HOME=`R RHOME | /bin/grep --invert-match WARNING`
export R_LIB1=${R_HOME}/lib
export R_LIB2=${HOME}/R/x86_64-unknown-linux-gnu-library/2.12/rJava/jri
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${R_LIB1}:${R_LIB2}

You will need to log out and log in again for these environment variables to take effect.

Next, you need to install several R packages. Execute the following commands (the first command is to start R, the subsequent commands are R commands). It should be noted that the first time you execute install.packages below, it will ask about creating a local R library directory. In our case this was in ${HOME}/R/x86_64-unknown-linux-gnu-library/2.12, as seen above in R_LIB2. If the directory it uses is different in your installation, you should change R_LIB2 above to reflect the actual directory.

  • cd ~
  • R
  • install.packages('Rserve',,'http://www.rforge.net/')
  • install.packages('ROCR', dependencies=TRUE)
  • source("http://bioconductor.org/biocLite.R")
  • biocLite()
  • biocLite("DESeq")
  • install.packages("Cairo", dependencies=TRUE)
  • install.packages("rJava", dependencies=TRUE)
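
Because R_LIB2 depends on where R created its local library, it can help to confirm that every directory on LD_LIBRARY_PATH actually exists once the packages are installed. The following is a minimal sketch, not part of the GobyWeb distribution; the function name missing_path_entries is made up for illustration.

```shell
# Sketch: print each entry of a colon-separated path list that is not an
# existing directory. missing_path_entries is a hypothetical helper name.
missing_path_entries() {
    rest="$1:"
    while [ -n "$rest" ]; do
        dir=${rest%%:*}     # text before the first ':'
        rest=${rest#*:}     # text after the first ':'
        # Empty entries (for example from a leading ':') are skipped.
        if [ -n "$dir" ] && [ ! -d "$dir" ]; then
            echo "$dir"
        fi
    done
}
```

Running missing_path_entries "$LD_LIBRARY_PATH" after logging in again should print nothing; if it prints the rJava/jri directory, R_LIB2 probably needs to be adjusted as described above.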

Installing samtools

The following commands should be executed to install samtools:

  • cd ~
  • wget \
    http://downloads.sourceforge.net/project/samtools/samtools/0.1.14/samtools-0.1.14.tar.bz2
  • tar jxvf samtools-0.1.14.tar.bz2
  • cd samtools-0.1.14
  • make
  • mkdir -p ${HOME}/goby/nextgen-tools/samtools/
  • cp samtools ${HOME}/goby/nextgen-tools/samtools/
  • cp bcftools/bcftools ${HOME}/goby/nextgen-tools/samtools/

Installing tabix

The following commands should be executed to install tabix:

  • cd ~
  • wget \
    http://downloads.sourceforge.net/project/samtools/tabix/tabix-0.2.3.tar.bz2
  • tar jxvf tabix-0.2.3.tar.bz2
  • cd tabix-0.2.3
  • make
  • mkdir -p ${HOME}/goby/nextgen-tools/tabix/
  • cp bgzip ${HOME}/goby/nextgen-tools/tabix/
  • cp tabix ${HOME}/goby/nextgen-tools/tabix/

Installing vcftools

The following commands should be executed to install vcftools:

  • cd ~
  • wget \
    http://downloads.sourceforge.net/project/vcftools/vcftools_v0.1.4a.tar.gz
  • tar zxvf vcftools_v0.1.4a.tar.gz
  • cd vcftools_0.1.4a
  • make
  • mkdir -p ${HOME}/goby/nextgen-tools/vcftools/
  • cp perl/* ${HOME}/goby/nextgen-tools/vcftools/
  • cp cpp/vcftools ${HOME}/goby/nextgen-tools/vcftools/

Edit your ~gobyweb/.bash_profile to include the following lines

# the following is required by VCFTOOLS perl scripts:
export PERL5LIB=${PERL5LIB}:${HOME}/goby/nextgen-tools/vcftools/
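
Before moving on to the aligners, you may want to confirm that the samtools, tabix, and vcftools executables ended up where the cp commands above put them. This is a rough sketch under that assumption; check_nextgen_tools is a hypothetical helper name, not something shipped with GobyWeb.

```shell
# Sketch: confirm the executables copied in the steps above are present and
# executable. check_nextgen_tools is a hypothetical helper name; the relative
# paths mirror the cp destinations used in this guide.
check_nextgen_tools() {
    tools_dir="${1:-$HOME/goby/nextgen-tools}"
    status=0
    for exe in samtools/samtools samtools/bcftools \
               tabix/bgzip tabix/tabix vcftools/vcftools; do
        if [ -x "$tools_dir/$exe" ]; then
            echo "ok: $exe"
        else
            echo "missing or not executable: $exe"
            status=1
        fi
    done
    return "$status"
}
```

Called with no argument, the function checks ${HOME}/goby/nextgen-tools and returns a non-zero status if anything is missing.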

Building the aligners

Building and installing the BWA aligner with Goby support

GobyWeb depends on a version of BWA that has been modified to directly support Goby file formats (bwa-icb). The following commands should be executed to install bwa-icb:

  • cd ~
  • wget \
    http://campagnelab.org/files/20111230-bwa-0.6.1-r104-icb.tgz
  • tar zxvf 20111230-bwa-0.6.1-r104-icb.tgz
  • cd bwa-icb
  • chmod +x autogen.sh
  • ./configure --with-goby
  • make
  • mkdir -p ${HOME}/goby/nextgen-tools/bwa/
  • ##
  • ## Note: when we copy the executable, we
  • ## rename it to bwa-icb
  • ##
  • cp bwa ${HOME}/goby/nextgen-tools/bwa/bwa-icb

If, during compilation, you receive an error similar to

Supported emulations: elf_x86_64 elf_i386 i386linux
collect2: ld returned 1 exit status

you should edit the Makefile, find the goby_LIBS line, change “-m” to “-mt”, and then run make again.

Building and installing the GSNAP aligner with Goby support

The current version of GMAP/GSNAP (version 2012-02-18 as of this writing) supports Goby file formats. The instructions for building and installing GSNAP are

  • ##
  • ## Obtain gmap-2012-02-18 or later from the author at
  • ## http://research-pub.gene.com/gmap/
  • ##
  • tar zxvf gmap-gsnap-2012-02-18.tar.gz
  • cd gmap-2012-02-18
  • ./configure --with-goby=${LOCAL_LIB}
  • make
  • mkdir -p ${HOME}/goby/nextgen-tools/gsnap/
  • ##
  • ## Note: when we copy the executable, we
  • ## rename it to gsnap-icb
  • ##
  • cp src/gsnap ${HOME}/goby/nextgen-tools/gsnap/gsnap-icb
  • ##
  • ## These additional files aren’t used during alignment
  • ## but are needed for index creation.
  • ##
  • cp src/cmetindex \
    src/gmap \
    src/gmapindex \
    src/iit_store \
    util/dbsnp_iit \
    util/fa_coords \
    util/gmap_build \
    util/gmap_process \
    util/gmap_setup \
    util/gmap_compress \
    util/gmap_reassemble \
    util/gmap_uncompress \
    util/md_coords \
    util/psl_genes \
    util/psl_introns \
    util/psl_splicesites ${HOME}/goby/nextgen-tools/gsnap/

Creating reference indexes

One of GobyWeb’s primary functions is to align fasta/fastq samples to a reference. In order to do this, it is necessary to index references with one or more aligners. To make alignments run faster, we recommend you store these indexed references on the filesystems that are local to each SGE compute node (as opposed to a network shared filesystem). We provide scripts to assist with index creation. These scripts are found in ~gobyweb/index-creation/.

First, obtain the fasta reference you want to align against. References often come from ftp://ftp.ensembl.org/pub. For demonstration purposes, this guide will index the NCBI Mouse reference version 37.64.

Download the single “toplevel” “dna” fa.gz reference file and associated “cdna” “all” fa.gz file.

  • cd ~gobyweb/index-creation/
  • wget \
    ftp://ftp.ensembl.org/pub/release-64/fasta/mus_musculus/dna/Mus_musculus.NCBIM37.64.dna.toplevel.fa.gz
  • wget \
    ftp://ftp.ensembl.org/pub/release-64/fasta/mus_musculus/cdna/Mus_musculus.NCBIM37.64.cdna.all.fa.gz

It is necessary to edit the create-indexes.sh file to reflect the details of the reference that is going to be indexed. Near the top of create-indexes.sh you will find a number of example configurations. For this index, the single uncommented configuration should be:

INPUT_FASTA=Mus_musculus.NCBIM37.64.dna.toplevel.fa.gz
INPUT_CDNA_FASTA=Mus_musculus.NCBIM37.64.cdna.all.fa.gz
ORGANISM=mus_musculus
VERSION=NCBIM37.64

Also in create-indexes.sh, locate the definitions of the variables GSNAPDIR, BWADIR, and SAMTOOLS_DIR. It should not be necessary to change these, but they should be verified.

Finally, below find the variables ALIGNERS and SPACES. These define which aligners will be used when creating indexes and if “basespace” and/or “colorspace” indexes should be created (not every aligner supports both).

OPTIONAL STEP: cURL can optionally be used by the script Biomart.groovy. By default, cURL isn’t used, but if you have a relatively recent version of cURL (7.22 or later, which can be checked with the “curl --version” command) you can enable downloading using cURL. The benefit of using cURL is that it gives a more visual indication of the status of the file transfers. To enable this, edit Biomart.groovy and change the downloadWithCurl option:

boolean downloadWithCurl = true
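
The version requirement can also be checked in a script rather than by eye. The sketch below assumes GNU sort (for its -V version-sort option) and assumes the first line of the curl --version output has the form "curl X.Y.Z ..."; both function names are hypothetical.

```shell
# Sketch: decide whether the installed cURL is at least 7.22.
# Assumptions: GNU sort with -V is available, and the first line of
# `curl --version` looks like "curl X.Y.Z ...". Helper names are hypothetical.
version_at_least() {
    # Succeeds when version $1 >= version $2: the smaller of the two
    # versions, as ordered by sort -V, must be $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

curl_is_recent_enough() {
    installed=$(curl --version 2>/dev/null | head -n 1 | awk '{print $2}')
    [ -n "$installed" ] && version_at_least "$installed" 7.22
}
```

If curl_is_recent_enough succeeds, it should be safe to set downloadWithCurl to true; otherwise leave the default.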

Once you have configured create-indexes.sh, execute it.

  • cd ~gobyweb/index-creation/
  • ./create-indexes.sh

Index creation can be a time-consuming process (easily taking hours if you are creating indexes for multiple aligners). During the process of creating indexes, a few annotation files will be fetched from BioMart. The process of downloading annotations can also take a considerable amount of time.

Once the indexes have been created and the annotations have been downloaded, if, for example, you aligned for basespace with the BWA aligner, you would have the following directory structure

~gobyweb/
   index-creation/
     reference-db/
       NCBIM37.64/
         mus_musculus/
           reference/    [original reference, annotation files, etc.]
           basespace/
              bwa/
                 index*    [the actual bwa index files]
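
The layout above follows directly from the settings in create-indexes.sh, so the expected location of an index can be computed and checked. This is an illustrative sketch only; index_dir and has_index are hypothetical helper names, and the path order is taken from the tree shown above.

```shell
# Sketch: build the expected index directory from the create-indexes.sh
# settings and check that it holds index files. Helper names are hypothetical.
index_dir() {
    # $1 = reference-db root, $2 = VERSION, $3 = ORGANISM,
    # $4 = basespace or colorspace, $5 = aligner name
    printf '%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

has_index() {
    dir=$(index_dir "$@")
    # ls fails when the glob matches nothing, so its exit status tells us
    # whether any index* file exists.
    ls "$dir"/index* >/dev/null 2>&1
}
```

For the example in this guide, has_index ~gobyweb/index-creation/reference-db NCBIM37.64 mus_musculus basespace bwa should succeed once create-indexes.sh has finished.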

Once you have built the references you need, you’ll want to install them to the compute nodes. GobyWeb’s Alignment plugins, which are discussed later, allow you to specify the location of the references for each aligner. By default, the alignment plugins will expect to find the indexed references on the SGE cluster nodes in the exact directory structure shown here:

/scratchLocal/
   gobyweb/
      input-data/
         reference-db/
            VERSION/ # NCBIM37.64 in this example
               ORGANISM/ # mus_musculus in this example
                  reference/
                     basespace|colorspace/
                        ALIGNER_NAME/
                           index*

You can ignore the default organization and modify the appropriate configuration in each plugin. The VERSION, ORGANISM, and ALIGNER_NAME parts of the path will be filled in by GobyWeb, and the corresponding directories need to exist on disk for each combination of these values defined in the configuration. See the index creation script for a list of values that these variables can take.

Assuming the default configuration above, copy, mirror, or rsync the files in ~gobyweb/index-creation/reference-db/ to /scratchLocal/gobyweb/input-data/reference-db/ on each of the cluster compute nodes.
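
One way to do the copy is a small rsync loop over the compute nodes. The sketch below is an assumption about how you might script it, not a GobyWeb tool: sync_reference_db and the DRY_RUN switch are hypothetical, the node names are whatever you pass in, and it assumes passwordless ssh from the gobyweb account to each node.

```shell
# Sketch: push the reference-db tree to each compute node with rsync.
# sync_reference_db and DRY_RUN are hypothetical; node names are arguments.
# With DRY_RUN=1 the commands are printed instead of executed.
sync_reference_db() {
    src="$HOME/index-creation/reference-db/"
    dest="/scratchLocal/gobyweb/input-data/reference-db/"
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rsync -a $src $node:$dest"
        else
            # -a (archive) recurses and preserves permissions and timestamps.
            rsync -a "$src" "$node:$dest"
        fi
    done
}
```

For example, DRY_RUN=1 followed by sync_reference_db node01 node02 prints the two rsync commands without copying anything. The parent directory /scratchLocal/gobyweb/input-data/ needs to exist on each node beforehand, since rsync creates only the final path component.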

It might be tempting to store these references on a single shared network resource. Resist the idea, as this configuration reduces alignment performance considerably: each job needs to wait for scarce network resources before starting alignment.

Setting up and configuring the GobyWeb web application

Extract the web application, make directories

Move the “gobyweb-VERSION-webapp.tar.gz” file from the distribution to the ~gobyweb/ directory on the web server, extract its contents, and make required webserver-side directories using the commands:

  • cd ~
  • tar zxvf gobyweb-VERSION-webapp.tar.gz
  • mkdir GOBYWEB_FILES
  • mkdir GOBYWEB_RESULTS
  • mkdir GOBYWEB_UPLOADS

The files in the gobyweb account should now be similar to

~gobyweb/
     GOBYWEB_FILES/
     GOBYWEB_RESULTS/
     GOBYWEB_UPLOADS/
     webapp/
          tomcat-cmd
          temp/
          conf/
              Config.groovy-SAMPLE
              DataSource.groovy-SAMPLE
              server.xml
              tomcat-users.xml
              web.xml
              plugins/
                 [GobyWeb provided/sample plugin directories and files here]
              schemas/
                    plugins.xsd
          webapps/
               ROOT/
                    index.html
               gobyweb.war
          logs/

Installing Tomcat

Download the latest version of Apache Tomcat 7.0.x from http://tomcat.apache.org/download-70.cgi to the gobyweb home directory on the web server then extract its contents:

  • cd ~
  • tar zxvf apache-tomcat-7.0.xx.tar.gz

To make upgrading to new versions of Apache Tomcat easier, GobyWeb does not use the “webapps” or “conf” directories within the ~gobyweb/apache-tomcat directory. Instead, GobyWeb uses ~gobyweb/webapp/webapps/ and ~gobyweb/webapp/conf/.

Configure GobyWeb

The GobyWeb configuration files are:

~gobyweb/webapp/conf/Config.groovy
~gobyweb/webapp/conf/DataSource.groovy

For both of these configuration files, we’ve provided “-SAMPLE” versions that contain documentation describing the configuration options. Both of these files are written in Groovy. Groovy is a JVM-based language that is very similar to Java. You can learn more about Groovy at

http://groovy.codehaus.org/
http://groovy.codehaus.org/Beginners+Tutorial

but you shouldn’t need any Groovy experience to edit these files if you follow the syntax.

Config.groovy configures application options, directories, etc. DataSource.groovy configures how to connect to your database.

If you intend to use a database server other than Oracle, you will need to install that database server’s JDBC .jar file. This jar file should be placed in

~gobyweb/webapp/lib/

Configure Tomcat

To configure the instance of Apache Tomcat, you will need to edit the files

~gobyweb/webapp/conf/server.xml
~gobyweb/webapp/conf/tomcat-users.xml

The server.xml that comes with the GobyWeb distribution specifies that GobyWeb will run on port 8106. Change this port to any port number suitable for your local network, such as the standard port 80 that is used by most web servers.
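
If you prefer to script the port change, something like the following could work. It is only a sketch: set_tomcat_port is a hypothetical helper, and it assumes the port appears in server.xml literally as port="8106", so inspect the file before relying on it.

```shell
# Sketch: replace the default GobyWeb port in server.xml, keeping a backup.
# Assumption: the port is written literally as port="8106" in the file.
# set_tomcat_port is a hypothetical helper name.
set_tomcat_port() {
    file=$1
    new_port=$2
    cp "$file" "$file.bak"
    sed "s/port=\"8106\"/port=\"$new_port\"/" "$file.bak" > "$file"
}
```

For example, set_tomcat_port ~gobyweb/webapp/conf/server.xml 8080 rewrites the file in place and leaves the original as server.xml.bak.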

As far as GobyWeb is concerned, there is very little that needs to be different from a “stock” Tomcat server.xml file, so you may prefer to use the server.xml that comes from the Tomcat distribution. One possible exception is the line

<Context path="" docBase="/home/gobyweb/webapp/webapps/ROOT"
reloadable="true" />

This line specifies the location of this instance’s ROOT directory, which, in the GobyWeb distribution, contains the single file index.html. This file ensures that if a user visits

http://your_server/

they will be redirected to the running GobyWeb application at

http://your_server/gobyweb

The last file you need to edit is ~/webapp/tomcat-cmd. Look for the line

CATALINA_HOME=${HOME}/apache-tomcat-7.0.32

and edit it so it points to the directory where you installed Tomcat.

GobyWeb Plugins

Starting with version 1.7, GobyWeb supports a plugin architecture. The main benefit of plugins is that they enable you to support additional aligners and analysis packages without having to edit the GobyWeb application. The plugin architecture also allows you to add or edit plugins without having to completely restart the GobyWeb application.

GobyWeb supports three kinds of plugins: aligners, analyses, and resources. Aligner plugins provide the ability to align a sample to a reference. Analysis plugins provide the ability to perform analysis on alignments or reads files. Resource plugins are files or executables that are used by the aligner or analysis plugins.

We’ve provided several sample plugins. These can be found in

~gobyweb/webapp/conf/plugins/
     aligners/
     analyses/
     resources/

Each plugin (or each version of each plugin) lives in its own directory. This directory will contain at least the file config.xml and possibly other files such as script.sh. If you intend to edit existing plugins or add new plugins, you should review the existing plugins. To assist with editing config.xml files, we provide the XML schema:

~gobyweb/webapp/conf/schemas/plugins.xsd

As delivered, the resource plugins are incomplete. We have not delivered executables such as GSNAP, BWA, Samtools, and Last. You will need to compile these for your own system (and should have already done so above). Also noteworthy is that the file adapters.txt is missing from the resource plugin named ILLUMINA_ADAPTERS. Plugin source code and updates can be obtained from GitHub. If you develop a new GobyWeb plugin, please consider contributing this plugin to the community (fork the GitHub repository, commit your plugin there and make a pull request to let us know you would like to contribute new code). The Git repository also contains executables pre-built for our system. Please note that we cannot guarantee that these executables will function correctly on different servers.

Obtaining the adapters.txt file

The adapters.txt file contains sequences for paired-end adapters used in Illumina sequencing protocols. You can obtain these sequences from support@illumina.com and put them in a text file, one sequence per line. If sequences contain Xs, expand each possibility to represent a fully specified sequence. For instance, expand AXT to AAT, ACT, ATT, AGT and put each possibility on its own line.
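
The X expansion can be tedious by hand when a sequence has several Xs, so here is a minimal sketch of one way to automate it. The function names expand_x and expand_adapters are hypothetical and not part of GobyWeb; the sketch handles uppercase X only and emits bases in the A, C, T, G order used in the example above.

```shell
# Sketch: expand every X in a sequence to A, C, T and G.
# expand_x and expand_adapters are hypothetical helper names.
# The function body runs in a subshell "( ... )" so that the variables of
# each recursive call stay separate without needing non-POSIX "local".
expand_x() (
    seq=$1
    case $seq in
        *X*)
            prefix=${seq%%X*}   # everything before the first X
            suffix=${seq#*X}    # everything after the first X
            for base in A C T G; do
                expand_x "${prefix}${base}${suffix}"
            done
            ;;
        *)
            printf '%s\n' "$seq"
            ;;
    esac
)

# Read sequences from standard input, one per line, and write the fully
# specified sequences to standard output. Blank lines are skipped.
expand_adapters() {
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -n "$line" ]; then
            expand_x "$line"
        fi
    done
}
```

Assuming your raw sequences are in a file such as raw-adapters.txt (a made-up name), expand_adapters < raw-adapters.txt > adapters.txt produces the file in the expected format. Note that a sequence with n Xs expands to 4^n lines.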

Once you have obtained this file, install it into the ILLUMINA_ADAPTERS plugin:

  • cp adapters.txt \
    ~gobyweb/webapp/conf/plugins/resources/ILLUMINA_ADAPTERS/

Installing the aligner executables

In an earlier step, you compiled and installed the bwa-icb and gsnap-icb aligners on the cluster. You will now need to copy these executables into the appropriate resource plugins.

${HOME}/webapp/conf/plugins/resources/BWA_GOBY_*
${HOME}/webapp/conf/plugins/resources/GSNAP_GOBY_*

Additionally, copy the samtools executable you previously compiled on the cluster to the samtools plugin.

${HOME}/webapp/conf/plugins/resources/SAMTOOLS_*

Starting GobyWeb

To start GobyWeb, execute the following

  • cd ~/webapp
  • ./tomcat-cmd start

You can monitor the status and activity of GobyWeb by following the Tomcat log:

  • tail -f ~gobyweb/webapp/logs/catalina.out

This log file will contain important error messages when the configuration of the plugins cannot be validated. In case of errors detected in the plugin configuration, the application will not start, making it easier to diagnose and solve the problem early.

Logging Into GobyWeb

The first time you run a new instance of GobyWeb, an “administrator” account will be created. The default login for this account is

Username: admin
Password: default_password

It is recommended that you change the default password immediately (you can do this from the Account tab in the deployed web application).

Congratulations on your installation. Take a look at our video tutorials to learn how to use the installed application.

Installing/changing plugins on the fly

After initial installation, you may find that you need to install a new plugin, or make a quick fix to an already installed plugin. Changing a plugin or installing a new plugin does not require an application restart. Simply adjust directories and files in the plugins directory, then navigate to the About page of the running application. Users with the administrator role will see a button labeled “Reload plugin definitions” below the list of currently loaded plugins. Pressing this button will reload the plugins from disk and validate them again. Validation errors will be shown in the catalina log file, and plugins with errors will be disabled (GobyWeb continues to run if a plugin has errors when modified at runtime). Check that the plugin was loaded successfully by looking for the plugin name and version number in the list shown on the About page.

Stopping GobyWeb

To stop GobyWeb, execute the following

  • cd ~gobyweb/webapp
  • ./tomcat-cmd stop