The registration returns the tags assigned to the FileSet instances. These tags can be used to run any of the plugin aligners available in the plugins-SDK branch of the Plugin Repository (or your own plugin, if you develop a new one). In this tutorial, we execute the STAR 2.3 aligner with support for artifact installation (id=STAR22_GOBY).

Plugins Command

To run an aligner plugin, use the command plugins-submit-job. Each aligner has its own set of options, automatically built from the plugin’s configuration file (config.xml). To discover which options are available for a specific aligner, we ask the SDK to print them. For instance, to see the options for the STAR22_GOBY aligner, we execute the command with the following options:

plugins-submit-job \
--plugins-dir PLUGINS_ROOT_LOCATION \
--job STAR22_GOBY \
--help

This prints out a list of options along with their descriptions:

  • Firstly, there is a set of options common to all plugins (described here).
  • A set of arbitrary options that can be passed to the plugin and will be available in the plugin runtime environment. These options are useful for debugging or for overriding the values coming from the reads metadata:
      [--option] <option>
            Additional option(s) to pass to the job in the format KEY=VALUE. The option will be available as
            environment variable in the job execution environment.
  • The reference genome that is used for the alignment:
      --GENOME_REFERENCE_ID <GENOME_REFERENCE_ID>
            The reference genome.
  • To scale and improve performance, the input reads file is split into chunks that are aligned independently by sub-tasks executed in parallel. The CHUNK_SIZE option determines the parallelism of the alignment process (the larger the chunk size, the fewer sub-tasks run in parallel):
      --CHUNK_SIZE <CHUNK_SIZE>
            The number of bytes of compressed reads file to give to a single align
            part. (default: 50000000)
  • A set of options coming from the plugin’s configuration. Each option has a type that restricts the values it accepts. Options of type CATEGORY accept only the values enumerated in the option description:
      --AMBIGUITY_THRESHOLD <AMBIGUITY_THRESHOLD>
            The maximum number of reference sequence locations that can be matched
            for a read to be considered
            non-ambiguous. Please note that STAR currently discards/does not output
            alignments found to strictly
            match more than the specified ambiguity threshold. Option Type: INTEGER.
            (default: 10)
      [--ALIGNER_OPTIONS <ALIGNER_OPTIONS>]
            Provide any additional STAR option here following the syntax expected by
            STAR. Option Type: STRING. (default: )
  • A final option that explains the FileSet instances accepted by the aligner and their cardinality:
      slots1 slots2 ... slotsN
            List of input slots for the job. STAR22_GOBY accepts the following input
            slots:
            - INPUT_READS (instance of COMPACT_READS): minOccurs 1, maxOccurs 1
            The list must be provided in the format:
            INPUT_READS: TAG

    All the aligners available in the Plugin Repository accept exactly one instance of the COMPACT_READS FileSet.
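
To get a feel for how CHUNK_SIZE affects parallelism, the following is a minimal sketch that estimates the number of align sub-tasks for a given compressed reads file size. It assumes one sub-task per CHUNK_SIZE bytes, rounded up; the actual splitting logic of the SDK may differ, and estimate_chunks is a hypothetical helper, not part of the SDK.

```shell
# Hypothetical helper (not part of the SDK): estimate the number of align
# sub-tasks, assuming one sub-task per CHUNK_SIZE bytes of compressed reads.
# $1 = size of the compressed reads file in bytes
# $2 = CHUNK_SIZE in bytes (defaults to 50000000, as in the help output)
estimate_chunks() {
    # Integer ceiling division: (size + chunk - 1) / chunk
    echo $(( ($1 + ${2:-50000000} - 1) / ${2:-50000000} ))
}

estimate_chunks 120000000            # prints 3 (default CHUNK_SIZE)
estimate_chunks 120000000 20000000   # prints 6 (smaller chunks, more sub-tasks)
```

Under this assumption, a 120 MB reads file yields three sub-tasks with the default value and six when CHUNK_SIZE is lowered to 20000000.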

Submit the Aligner

For the sake of simplicity, we leave the default values (indicated by the help output) for the optional parameters.
We align an uploaded reads file by submitting the aligner as a job, passing the tag assigned to the reads file as follows:

plugins-submit-job \
--job-area USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS \
--fileset-area /zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA \
--plugins-dir PLUGINS_ROOT_LOCATION \
--queue rascals.q \
--GENOME_REFERENCE_ID WBcel215.69 \
--artifact-server LOCAL_USER@LOCAL_HOSTNAME \
--repository /scratchLocal/gobyweb/ARTIFACT_REPOSITORY-PLUGINS-SDK \
--job STAR22_GOBY \
--owner instructor \
INPUT_READS: DUSESWO

If everything goes well, the submission ends by printing out the location in which the aligner will be executed:

2013-06-28 10:12:18,311 INFO - Loading available plugins...
2013-06-28 10:12:20,232 INFO - Analyzing the submission request...
2013-06-28 10:12:22,484 INFO - Collecting files from dependencies...
2013-06-28 10:12:23,463 INFO - Generating the job environment...
2013-06-28 10:12:23,675 INFO - Submitting files for execution...
2013-06-28 10:12:25,825 INFO - Requesting job execution...
2013-06-28 10:12:25,825 INFO - Output from the submission process:
//some output...
2013-06-28 10:12:30,972 INFO - The job will be executed in the Job Area at USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/RGDHGGV
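
The last log line carries everything needed to find the job later: the login, the job execution folder and the job tag. As a rough sketch, the line can be split with sed and shell parameter expansion. It assumes the final line always has the wording shown in the example output; job_area_location is a hypothetical helper, not an SDK command.

```shell
# Hypothetical helper (not part of the SDK): extract the Job Area location
# from submission output read on stdin. Assumes the wording of the final
# log line matches the example output shown in this tutorial.
job_area_location() {
    sed -n 's/^.*The job will be executed in the Job Area at \(.*\)$/\1/p'
}

# Sample line copied from the example output.
sample='2013-06-28 10:12:30,972 INFO - The job will be executed in the Job Area at USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/RGDHGGV'
location=$(printf '%s\n' "$sample" | job_area_location)

login=${location%%:*}     # text before the first colon: USER@SERVER
job_dir=${location#*:}    # text after the first colon: the job execution folder
job_tag=${job_dir##*/}    # last path component: RGDHGGV

echo "login:   $login"
echo "job dir: $job_dir"
echo "job tag: $job_tag"
```

In practice the submission output would be piped into the helper instead of the sample line. Whether the log is written to standard output or standard error is not stated here, so redirecting both is the safer choice.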

If the command ends with an error, check the Troubleshooting page, which describes common issues and suggested solutions.

How to Check Results

Aligners are executed asynchronously and their results are persisted in the FileSet area as FileSet instances.
The main outcome of an aligner is a Goby or BAM alignment (depending on the plugin’s own logic). Some aligners also produce statistics on the alignment, bedgraph files and counts.

To check which FileSets are published by a submitted aligner, we need to log on to the OGE node hosting the Job Area and look at the log files in the job execution folder.

# log on to the OGE node
ssh USER@SERVER
# change to the job execution folder in the Job Area
cd /zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/RGDHGGV

In the folder, we find three types of log files:

  1. RGDHGGV.submit.* : a single file reporting the initial submission activity;
  2. RGDHGGV.align.* : the alignment is split into sub-tasks (according to the CHUNK_SIZE option’s value), each of them executed independently. The activity of each sub-task is reported in a separate file of this type;
  3. RGDHGGV.post.* : a single file reporting the activity of the aligner after all the sub-tasks are completed. If this file is not in the folder, the job is not completed or it ended with an error.
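
Since the presence of the post log is the signal that the job reached its final phase, a quick check can be scripted. The sketch below only tests whether a file matching TAG.post.* exists in a folder. has_post_log is a hypothetical helper, and a present post log does not by itself prove that the post phase succeeded.

```shell
# Hypothetical helper (not part of the SDK): succeed if at least one
# TAG.post.* log file exists in the given job execution folder.
# $1 = job execution folder, $2 = job tag
has_post_log() {
    for f in "$1/$2".post.*; do
        # If nothing matches, the pattern stays unexpanded and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Run from inside the job execution folder.
if has_post_log . RGDHGGV; then
    echo "post log found: sub-tasks completed and the post phase has started"
else
    echo "no post log: the job is not completed or it ended with an error"
fi
```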

Aligner results are published in the post phase, so we need to look inside the RGDHGGV.post.* file to find them. To do so, execute the following command:

grep "has been successfully registered" RGDHGGV.post.*

This prints out something like:

+ echo 'The following GOBY_ALIGNMENT instance has been successfully registered: VGYPVXD'
The following GOBY_ALIGNMENT instance has been successfully registered: VGYPVXD
+ echo 'The following TSV instance has been successfully registered: CRVCXFA'
The following TSV instance has been successfully registered: CRVCXFA
+ echo 'The following COUNTS instance has been successfully registered: MMYVNWT'
The following COUNTS instance has been successfully registered: MMYVNWT
+ echo 'The following GZ instance has been successfully registered: '
The following GZ instance has been successfully registered:
+ echo 'The following STATS instance has been successfully registered: EYFMIBY'
The following STATS instance has been successfully registered: EYFMIBY
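
Each registration appears twice in the log (once as the traced echo command, once as its output), and a FileSet type with no instance produces a line with an empty tag. As a minimal sketch, assuming the log lines keep exactly the wording shown above, the type and tag pairs can be extracted with sed. extract_tags is a hypothetical helper, not an SDK command.

```shell
# Hypothetical helper (not part of the SDK): print "TYPE TAG" for every
# registered instance. Reads the given files, or stdin if none are given.
# Traced "+ echo" lines and lines with an empty tag are skipped.
extract_tags() {
    sed -n 's/^The following \([A-Z_]*\) instance has been successfully registered: *\([A-Z0-9][A-Z0-9]*\)$/\1 \2/p' "$@"
}

# Demo on a few lines copied from the example output.
extract_tags <<'EOF'
+ echo 'The following GOBY_ALIGNMENT instance has been successfully registered: VGYPVXD'
The following GOBY_ALIGNMENT instance has been successfully registered: VGYPVXD
The following GZ instance has been successfully registered:
The following STATS instance has been successfully registered: EYFMIBY
EOF
# prints:
# GOBY_ALIGNMENT VGYPVXD
# STATS EYFMIBY
```

On the OGE node, the same helper could be applied directly to the log files, for example extract_tags RGDHGGV.post.*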

We can then go to the FileSet area and inspect each instance’s content using its tag. For example, to check the Goby Alignment:

cd /zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA
cd instructor
cd VGYPVXD
ls -lrt
total 5152
-rw-r--r-- 1 gobyweb icb     375 Jun 28 11:11 demopaper-combined-NA19143.index
-rw-r--r-- 1 gobyweb icb     179 Jun 28 11:11 demopaper-combined-NA19143.header
-rw-r--r-- 1 gobyweb icb 5143581 Jun 28 11:11 demopaper-combined-NA19143.entries
-rw-r--r-- 1 gobyweb icb    1714 Jun 28 11:11 demopaper-combined-NA19143.alignment-stats.txt
-rw-r--r-- 1 gobyweb icb      22 Jun 28 11:11 demopaper-combined-NA19143.tmh

These tags can then be used as values of the input slots of Alignment Analysis or Task jobs that take them as input.

The next public release of the SDK will offer facilities to query/browse the FileSets produced by a job.