Next-gen sample alignment and analysis is computationally intensive, so GobyWeb is designed to use more than one machine to increase alignment and analysis throughput. Figure 1 shows the architecture GobyWeb is designed to use. A single front-end Java application server, such as Tomcat, runs the GobyWeb web application. The back end consists of a single database (we have tested with Oracle) and multiple servers (compute nodes) running in a Sun Grid Engine (SGE) cluster.
Hardware requirements, database server
- No specific hardware requirements.
Hardware requirements, web server and web application server(s)
- A Linux-based server running a Java application server. We use Apache and Tomcat: Apache receives all web traffic and uses mod_jk to forward requests to the appropriate Tomcat instance. Because Tomcat can also act as a stand-alone web server, Apache is optional. We recommend at least one application server with 4+ cores and at least 8GB of memory. The web app server should have an associated filesystem to store:
- Samples while they are being uploaded, before they are moved to the cluster’s shared filesystem.
- Results of alignments and alignment analyses.
Hardware requirements, SGE compute nodes
- One or more Linux-based SGE compute nodes and an associated SGE queue. Alignments and alignment analyses are compute- and memory-intensive tasks, so we recommend multiple SGE compute nodes, each with 4+ cores and at least 32GB of memory. The SGE compute nodes should have the following filesystems:
- A filesystem shared across all compute nodes (such as via NFS mount) to store uploaded samples.
- Filesystems, preferably local to each compute node (for speed), containing the references indexed with each supported aligner and various per-reference support files. Most per-job execution takes place on these node-local filesystems, which increases speed and reduces network traffic.
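One way to check a Linux host against the core and memory recommendations above is a small Python sketch like the one below. It is not part of GobyWeb: the function names are hypothetical, total memory is read from the Linux-specific /proc/meminfo, and the 5% slack for memory reserved by the kernel is an assumption of this sketch.

```python
import os

# Recommendations from the text above: cores and memory (GiB) per role.
RECOMMENDED = {
    "webapp": {"cores": 4, "mem_gib": 8},
    "compute": {"cores": 4, "mem_gib": 32},
}


def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB, parsed from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Typical line: "MemTotal:       32780384 kB" (the kernel's kB is KiB)
            return int(line.split()[1])
    raise ValueError("MemTotal not found in meminfo text")


def meets_recommendation(role, cores, mem_kib, slack=0.95):
    """True if `cores` and `mem_kib` satisfy the recommendation for `role`.

    MemTotal is a little below the installed RAM, so `slack` (an assumption
    of this sketch) accepts hosts within 5% of the recommended memory.
    """
    rec = RECOMMENDED[role]
    mem_gib = mem_kib / (1024 * 1024)
    return cores >= rec["cores"] and mem_gib >= rec["mem_gib"] * slack


def check_local_host(meminfo_path="/proc/meminfo"):
    """Report, for each role, whether this Linux host meets the recommendation."""
    with open(meminfo_path) as f:
        kib = mem_total_kib(f.read())
    cores = os.cpu_count() or 0
    return {role: meets_recommendation(role, cores, kib) for role in RECOMMENDED}
```

Running `print(check_local_host())` on a candidate machine prints one True/False flag per role.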
Software requirements, database server
- The database stores all metadata for GobyWeb objects (Samples, Alignment Jobs, Alignments, etc.). Because all GobyWeb metadata lives in the database, we strongly urge you to have a verified backup and restore scheme in place. We have tested GobyWeb with Oracle 10.2. We have not tested other database servers, such as MySQL, but they would probably also work.
Software requirements and accounts, web app server
- The bash shell installed to /bin/bash.
- A user account. We recommend using the name gobyweb.
- The default shell of this user (gobyweb) should be bash.
- The gobyweb home directory on the web server should be separate from (not shared with) the gobyweb home directory used on the SGE compute nodes.
- Tomcat (we are running 7.0.22).
- The normal complement of Linux/UNIX utilities (sed, cut, wget, etc.).
- Java 1.6, preferably a recent version.
  - The environment variable JAVA_HOME pointing to your Java installation.
  - $JAVA_HOME/bin in your PATH.
- Groovy 1.8+, preferably the most recent version.
  - The environment variable GROOVY_HOME pointing to your Groovy installation.
  - $GROOVY_HOME/bin in your PATH.
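Before installing GobyWeb, the shell, account, and environment-variable requirements above can be spot-checked with a sketch along these lines. The function names are illustrative assumptions, not part of GobyWeb; the sketch only checks that /bin/bash exists, looks up an account's login shell with Python's pwd module, and confirms that $JAVA_HOME/bin and $GROOVY_HOME/bin appear on PATH.

```python
import os
import pwd


def default_shell(user="gobyweb"):
    """Return the login shell recorded for `user`, or None if the account is missing."""
    try:
        return pwd.getpwnam(user).pw_shell
    except KeyError:
        return None


def check_environment(env=None):
    """Return a list of problems with the bash / JAVA_HOME / GROOVY_HOME setup.

    `env` defaults to os.environ; pass a dict to check another environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not os.path.isfile("/bin/bash"):
        problems.append("/bin/bash not found")
    # Normalize PATH entries so "/opt/jdk/bin/" and "/opt/jdk/bin" compare equal.
    path_dirs = {os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    for var in ("JAVA_HOME", "GROOVY_HOME"):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        bin_dir = os.path.normpath(os.path.join(home, "bin"))
        if bin_dir not in path_dirs:
            problems.append(f"${var}/bin ({bin_dir}) is not in PATH")
    return problems
```

Run as the gobyweb user, `check_environment()` should return an empty list and `default_shell()` should return "/bin/bash".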
Software requirements and accounts, SGE compute nodes
- With the exception of Tomcat, all of the above requirements for the web application server should also be met on each compute node.
- The home directory of the user (gobyweb) should be shared across all of the SGE compute nodes.
- Sun Grid Engine (SGE, we are running SGE 6.2u5).
- The “R” programming language (we are using 2.12.2).
- OPTIONAL: cURL, version 7.22 or later. It can be used during index creation to help download annotation files from MartService.
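A quick way to confirm that the compute-node tools are present is to look them up on PATH and, for the optional cURL dependency, compare its version against 7.22. This Python sketch assumes the usual command names (qsub for SGE, R for the R language) and that the first line of `curl --version` starts with "curl X.Y"; the tool list is a hypothetical starting point to adjust for your installation.

```python
import re
import shutil
import subprocess

# Usual command names for the requirements above (adjust for your site).
REQUIRED_TOOLS = ("bash", "java", "groovy", "qsub", "R", "sed", "cut", "wget")
MIN_CURL = (7, 22)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def parse_curl_version(first_line):
    """Extract (major, minor) from the first line of `curl --version` output."""
    m = re.match(r"curl (\d+)\.(\d+)", first_line)
    if m is None:
        raise ValueError(f"unrecognized curl version line: {first_line!r}")
    return int(m.group(1)), int(m.group(2))


def curl_ok():
    """True if curl is on PATH and at least MIN_CURL (curl itself is optional)."""
    exe = shutil.which("curl")
    if exe is None:
        return False
    out = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True).stdout
    return parse_curl_version(out.splitlines()[0]) >= MIN_CURL
```

Comparing (major, minor) tuples lets Python's ordering handle cases such as 7.9 versus 7.22 correctly, which a plain string comparison would get wrong.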
Firewall and SSH considerations
- The web application server needs the ability to communicate with the database server.
- The compute nodes need to be able to reach the web server over HTTP, using the same port and URLs that users use to access GobyWeb.
- The compute nodes and the web application server need to be able to communicate in both directions without a password. This includes, but may not be limited to, ssh, scp, and rsync over the standard SSH port (22).
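TCP reachability and passwordless SSH can be spot-checked from each machine with a sketch like the following. The host name and HTTP port are hypothetical placeholders, not GobyWeb settings. `ssh -o BatchMode=yes` makes ssh fail instead of prompting for a password, so a zero exit status suggests key-based login works; a successful TCP connection only shows that a port is open, not that the service behind it is configured correctly.

```python
import socket
import subprocess


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def passwordless_ssh(host, timeout=10):
    """Return True if `ssh host true` succeeds without any password prompt."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}", host, "true"]
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:  # ssh client not installed
        return False


# Hypothetical values: replace with your web server's host name and HTTP port.
WEB_SERVER = "gobyweb.example.org"
HTTP_PORT = 8080
```

From a compute node, `can_connect(WEB_SERVER, HTTP_PORT)`, `can_connect(WEB_SERVER, 22)` and `passwordless_ssh(WEB_SERVER)` should all return True; the SSH check should then be repeated from the web server toward each compute node.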
Disk Space Requirements
Next-gen alignment and analysis are compute-, memory-, and storage-intensive, so the filesystems outlined above need plenty of space. It is not uncommon for a single sample (reads file) to be larger than a gigabyte and for the resulting alignment to be over 150 megabytes. Moreover, an experiment often involves multiple samples, so the required disk space can grow rapidly. Should you need more storage than a single volume provides, GobyWeb can “fanout” the reads and results data across multiple volumes.
Because the database stores only the metadata about samples, alignments, etc., it needs much less space. A tablespace on the order of 500 megabytes should be more than sufficient for a large number of samples, alignments, and analyses.
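The sizes quoted in the disk space requirements above support a back-of-the-envelope storage estimate. The Python sketch below assumes roughly 1 GB of reads and 150 MB of alignment per sample (lower bounds taken from the text) and a configurable number of alignments per sample. The 80% maximum fill per volume used when counting volumes is an assumption of this sketch. It covers reads and results only, not the much smaller database.

```python
import math


def estimate_storage_gb(n_samples, reads_gb=1.0, alignment_gb=0.15, alignments_per_sample=1):
    """Rough disk usage (GB) for reads plus alignments across n_samples."""
    return n_samples * (reads_gb + alignment_gb * alignments_per_sample)


def volumes_needed(total_gb, volume_gb, fill_fraction=0.8):
    """Number of volumes needed to hold total_gb, filling each to at most fill_fraction."""
    return math.ceil(total_gb / (volume_gb * fill_fraction))


if __name__ == "__main__":
    total = estimate_storage_gb(100, alignments_per_sample=2)
    print(f"~{total:.0f} GB for 100 samples; {volumes_needed(total, 100)} x 100 GB volumes")
```

The estimate only counts capacity; it does not model how GobyWeb actually distributes data when it fans out across volumes.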