This page describes the GSNAP_GOBY plugin, as an example of a plugin that generates Goby alignments.

Config.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<alignerConfig>
    <name>GSNAP (Goby output)</name>
    <id>GSNAP_GOBY</id>
    <dbLegacyId>gsnap (Goby native)</dbLegacyId>
    <help>GSNAP writing to Goby output.</help>
    <supportsColorSpace>false</supportsColorSpace>
    <supportsBisulfiteConvertedReads>true</supportsBisulfiteConvertedReads>
    <supportsGobyReads>true</supportsGobyReads>
    <supportsGobyAlignments>true</supportsGobyAlignments>
    <supportsPairedEndAlignments>true</supportsPairedEndAlignments>
    <supportsFastqReads>false</supportsFastqReads>
    <supportsFastaReads>false</supportsFastaReads>
    <supportsBAMAlignments>false</supportsBAMAlignments>
    <version>1.1</version>
    <requires>
        <resource>
            <id>GSNAP_GOBY</id>
            <version-at-least>2011.10.16</version-at-least>
        </resource>
    </requires>
    <options>
       ...
    </options>
</alignerConfig>

The plugin identifier must match the name of the directory where the plugin is defined. Please note that case matters in this comparison.
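
The case-sensitive match between identifier and directory name can be checked before deploying a plugin. The following is a minimal sketch, not part of GobyWeb: check_plugin_id is a hypothetical helper, and it assumes the plugin identifier is the first <id> element in the configuration file (later <id> elements belong to <requires>).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GobyWeb): verify that the first <id>
# in a plugin configuration file equals the name of its directory.
check_plugin_id() {
    local config_file="$1"
    local dir_name plugin_id
    # Name of the directory that contains the configuration file.
    dir_name=$(basename "$(cd "$(dirname "${config_file}")" && pwd)")
    # Assumption: the first <id> element is the plugin identifier.
    plugin_id=$(sed -n 's:.*<id>\(.*\)</id>.*:\1:p' "${config_file}" | head -n 1)
    # String comparison with = is case-sensitive.
    if [ "${plugin_id}" = "${dir_name}" ]; then
        echo "OK: ${plugin_id}"
    else
        echo "MISMATCH: id '${plugin_id}' vs directory '${dir_name}'" >&2
        return 1
    fi
}
```

Called as check_plugin_id path/to/GSNAP_GOBY/config.xml, the function returns a non-zero status when the directory is named gsnap_goby, for example.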

This plugin indicates that it requires the GSNAP_GOBY resource with a version number at least 2011.10.16.
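
How GobyWeb compares resource versions is not described on this page. As an illustration only, the sketch below assumes that versions are dot-separated numeric fields compared from left to right; version_at_least is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: version_at_least AVAILABLE REQUIRED
# Returns 0 when AVAILABLE >= REQUIRED, comparing dot-separated numeric
# fields from left to right (an assumed comparison rule).
version_at_least() {
    local -a have want
    IFS=. read -r -a have <<< "$1"
    IFS=. read -r -a want <<< "$2"
    local i n=${#have[@]}
    if [ "${#want[@]}" -gt "$n" ]; then
        n=${#want[@]}
    fi
    for ((i = 0; i < n; i++)); do
        # Missing fields count as 0; 10# forces base 10 so "08" is not octal.
        local h=$((10#${have[i]:-0})) w=$((10#${want[i]:-0}))
        if ((h > w)); then return 0; fi
        if ((h < w)); then return 1; fi
    done
    return 0
}

# Usage: does an installed 2011.11.01 satisfy version-at-least 2011.10.16?
if version_at_least 2011.11.01 2011.10.16; then
    echo "requirement satisfied"
fi
```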

The following elements define a grid-parallel plugin that reads Goby read files and writes Goby alignments:

    <supportsGobyReads>true</supportsGobyReads>
    <supportsGobyAlignments>true</supportsGobyAlignments>
    <supportsBAMAlignments>false</supportsBAMAlignments>

When these elements are set in this way, GobyWeb generates an Oracle Grid Engine array job. Each part of the input sample is aligned independently on the grid. The results obtained from the plugin are sorted (in parallel) and then, after all the parts of the array job have completed, combined with the goby concatenate-alignment mode.

Consequently, the script.sh file only has to provide the functionality to align one part of the input file.

script.sh

The plugin defines the usual aligner function:

function plugin_align {
...
}

This function first detects whether the sample was treated with sodium bisulfite and, if so, aggressively trims the reads to remove Illumina adapters. Only Illumina adapters are considered: since GSNAP does not support color-space, and since RRBS or methyl-seq still require millions of reads, there is no need to trim adapters from other platforms in this case.

     BISULFITE_OPTION=""
     if [ "${BISULFITE_SAMPLE}" == "true" ]; then
         goby reformat-compact-reads  --start-position=${START_POSITION} --end-position=${END_POSITION}  ${READS_FILE} -o small-reads.compact-reads
         dieUponError "reformat reads failed, sub-task ${CURRENT_PART} of ${NUMBER_OF_PARTS}, failed"
         # GSNAP version 2011-03-11 and newer, for older use -C
         BISULFITE_OPTION=" --mode cmet -m 1 -i 100 --terminal-threshold=100    "
         # set the number of threads to the number of processors listed in /proc/cpuinfo:
         NUM_THREADS=$(grep -c "physical id" /proc/cpuinfo)
         ALIGNER_OPTIONS="${ALIGNER_OPTIONS} -t ${NUM_THREADS}"
         # Trim the reads if they are bisulfite.
         goby trim  -i small-reads.compact-reads -o small-reads-trimmed.compact-reads --complement -a  ${RESOURCES_ILLUMINA_ADAPTERS_FILE_PATH}  --min-left-length 4
         dieUponError "trim reads failed, sub-task ${CURRENT_PART} of ${NUMBER_OF_PARTS}, failed"
         WINDOW_OPTIONS=" "
         READ_FILE_SMALL=small-reads-trimmed.compact-reads
     else
         WINDOW_OPTIONS=" --creads-window-start=${START_POSITION} --creads-window-end=${END_POSITION}  "
         READ_FILE_SMALL=" ${READS_FILE} "
     fi

The important variables used in the previous section are:

  • ${START_POSITION}: byte offset where to start reading into the Goby compact-reads sample file.
  • ${END_POSITION}: byte offset where to stop reading into the Goby compact-reads sample file.
  • ${READS_FILE}: the complete Goby compact-reads sample file.
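
How GobyWeb chooses the offsets for each part of the array job is not described here. The sketch below only illustrates the idea with an even split of the file size; part_window is a hypothetical helper, and both the 1-based part numbering and the treatment of the end offset are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: part_window FILE_SIZE NUMBER_OF_PARTS CURRENT_PART
# Prints "START END" byte offsets for the given part, assuming an even
# split and 1-based part numbers (both are assumptions).
part_window() {
    local size=$1 parts=$2 part=$3
    local start=$(( size * (part - 1) / parts ))
    local end=$(( size * part / parts ))
    echo "${start} ${end}"
}

# Usage: derive the two variables for part 2 of 4 of a 1000-byte file.
read -r START_POSITION END_POSITION <<< "$(part_window 1000 4 2)"
echo "START_POSITION=${START_POSITION} END_POSITION=${END_POSITION}"
```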

The script starts by extracting just the reads between the two file offsets and writes them to a temporary file called small-reads.compact-reads in the current directory. This file is then used to trim adapters with the goby trim mode. Finally, the variable READ_FILE_SMALL is set to the trimmed output and WINDOW_OPTIONS is cleared to force GSNAP to read everything from the trimmed file. If the sample is not bisulfite-treated, READ_FILE_SMALL is set to the entire file and the window options are propagated so that GSNAP directly reads the relevant slice of the original input file.
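
The bisulfite branch calls dieUponError after each goby command. Its definition is not shown on this page. To try the snippet outside GobyWeb, a hypothetical stand-in along the following lines could be used, assuming the helper inspects the exit status of the previous command and aborts with the given message:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for GobyWeb's dieUponError (NOT the real helper).
# Assumption: it checks the exit status of the command that ran just
# before it and terminates the script when that command failed.
dieUponError() {
    local status=$?   # exit status of the previous command
    if [ "${status}" -ne 0 ]; then
        echo "ERROR: $1 (exit status ${status})" >&2
        exit "${status}"
    fi
}
```

Because the function reads $? on entry, it has to be called immediately after the command it guards, as the script above does.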

The last section of the script.sh file aligns the relevant part of the input file against the reference genome, with appropriate options for single-end or paired-end samples:

     if [ "${PAIRED_END_ALIGNMENT}" == "true" ]; then
         # PAIRED END alignment, native aligner
         nice ${RESOURCES_GSNAP_GOBY_EXEC_PATH} ${WINDOW_OPTIONS} -B 4 ${BISULFITE_OPTION} ${ALIGNER_OPTIONS} -n ${AMBIGUITY_THRESHOLD} -A goby --goby-output="${OUTPUT}" -D ${INDEX_DIRECTORY} -d ${INDEX_PREFIX} -o ${PAIRED_END_DIRECTIONS} ${READ_FILE_SMALL}
     else
         # Single end alignment, native aligner
         nice ${RESOURCES_GSNAP_GOBY_EXEC_PATH}  ${WINDOW_OPTIONS} -B 4 ${BISULFITE_OPTION} ${ALIGNER_OPTIONS} -n ${AMBIGUITY_THRESHOLD}  -A goby --goby-output="${OUTPUT}" -D ${INDEX_DIRECTORY} -d ${INDEX_PREFIX} ${READ_FILE_SMALL}
     fi

The variables defined by GobyWeb and used in this section are:

  • ${INDEX_DIRECTORY}: the directory where the GSNAP indexed reference has been stored on the grid node that executes the plugin.
  • ${INDEX_PREFIX}: the name of the database to search.
  • ${ALIGNER_OPTIONS}: options set by GobyWeb, such as the option to support spliced alignments (these options will eventually be integrated into the plugin system).
  • ${PAIRED_END_DIRECTIONS}: the paired read directions, as specified on the user interface for the sample.
  • ${AMBIGUITY_THRESHOLD}: the value passed to GSNAP with -n; this option will be obtained from the plugin option mechanism.
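
The two branches of the alignment step differ only in the -o ${PAIRED_END_DIRECTIONS} option. As a sketch of an alternative layout, the command can be assembled once in a bash array and the paired-end option appended when needed. build_gsnap_command and GSNAP_CMD are hypothetical names, all flags are copied from the script above, and the values in the usage part are placeholders for a dry run that prints the command instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the GSNAP command line in the array GSNAP_CMD.
build_gsnap_command() {
    # Unquoted expansions are intentional: these variables may hold several
    # space-separated options that must be split into separate words.
    GSNAP_CMD=(nice "${RESOURCES_GSNAP_GOBY_EXEC_PATH}" ${WINDOW_OPTIONS} -B 4
               ${BISULFITE_OPTION} ${ALIGNER_OPTIONS} -n "${AMBIGUITY_THRESHOLD}"
               -A goby --goby-output="${OUTPUT}"
               -D "${INDEX_DIRECTORY}" -d "${INDEX_PREFIX}")
    if [ "${PAIRED_END_ALIGNMENT}" == "true" ]; then
        # Only paired-end samples carry the read directions.
        GSNAP_CMD+=(-o "${PAIRED_END_DIRECTIONS}")
    fi
    GSNAP_CMD+=(${READ_FILE_SMALL})
}

# Placeholder values for a dry run; in a real job GobyWeb defines these.
RESOURCES_GSNAP_GOBY_EXEC_PATH=gsnap
WINDOW_OPTIONS=" --creads-window-start=0 --creads-window-end=250 "
BISULFITE_OPTION=""
ALIGNER_OPTIONS=""
AMBIGUITY_THRESHOLD=1
OUTPUT=out
INDEX_DIRECTORY=/tmp/index
INDEX_PREFIX=genome
PAIRED_END_ALIGNMENT=true
PAIRED_END_DIRECTIONS=FR
READ_FILE_SMALL=" reads.compact-reads "

build_gsnap_command
# Print the command instead of running it.
printf '%s ' "${GSNAP_CMD[@]}"; echo
```

In the plugin itself the assembled command would be run with "${GSNAP_CMD[@]}".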