Alignment Analysis plugins compare groups of Goby or BAM Alignment instances produced by Aligner jobs. To provide alignments as input, we use the tags assigned to them at publication time (see the How to check results section for Aligner jobs).
In this tutorial, we execute a differential expression Analysis with EdgeR using a plugin (id=DIFF_EXP_EDGE_R_ARTIFACT) available in the plugins-SDK branch of the Plugin Repository.
To run an analysis plugin, use the command plugins-submit-job. Each analysis has its own set of options, automatically built from the plugin’s configuration file (config.xml). To discover which options are available for a specific analysis, we first ask the SDK to print them. For instance, to see the options for the DIFF_EXP_EDGE_R_ARTIFACT analysis, we execute the command with the following options:
plugins-submit-job \
    --plugins-dir PLUGINS_ROOT_LOCATION \
    --job DIFF_EXP_EDGE_R_ARTIFACT \
    --help
This prints a list of options along with their descriptions:
- Firstly, there is a set of options common to all plugins (described here).
- A set of arbitrary options that can be passed to the plugin and will be available in the plugin runtime environment. These options are useful for debugging or for overriding the values coming from the reads metadata:
[--option] <option> Additional option(s) to pass to the job in the format KEY=VALUE. The option will be available as environment variable in the job execution environment.
- Analyses compare two or more groups of alignments. Each GROUP_DEFINITION value defines a group (with its name) and the list of tags identifying the alignments belonging to the group:
--GROUP_DEFINITION <GROUP_DEFINITION> The group definition list. Each definition must be in the format: Group_Name=TAG,TAG342,TAG231,etc. TAGs must match the ones declared in the SLOTS
- Once groups are defined, we need to specify how they are compared. Each COMPARISON_PAIR value defines a pair of groups to compare:
--COMPARISON_PAIR <COMPARISON_PAIR> The comparison pair list. Each pair must be in the format: Group_Name1/Group_Name2. Group names must match the ones declared in the GROUP_DEFINITION
- There might be options requesting a value (of the type indicated in the help description). For example:
--FILTERING <FILTERING> Indicate whether low count tags should be filtered. This prevents reporting spurious DE tags in the final result. See edgeR documentation for more details. Option type: BOOLEAN. (default: TRUE)
- There might be options of type CATEGORY, which means that only an enumeration of values is accepted. For example:
[--WEIGHT_ADJUSTMENT <WEIGHT_ADJUSTMENT>] Type of count adjustment. Option Type: CATEGORY. Allowed values [NONE, GC_CONTENT, HEPTAMERS]. (default: NONE)
- A final option that explains the FileSet instances accepted by the analysis and their cardinality:
slots1 slots2 ... slotsN
    List of input slots for the job. DIFF_EXP_EDGE_R_ARTIFACT accepts the following input slots:
    - INPUT_ALIGNMENTS (instance of GOBY_ALIGNMENT): minOccurs 1, maxOccurs unbounded
    The list must be provided in the format: INPUT_ALIGNMENTS: TAG1 ... TAGN
In this case, the analysis plugin accepts Goby Alignments.
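The help text requires that the tags in each GROUP_DEFINITION match the ones declared in the input slots. The sketch below shows one way to check this locally before submitting. The function check_groups is a hypothetical helper, not part of the Plugins SDK, and TAG1 to TAG4 are placeholder tags.

```shell
#!/bin/sh
# Hypothetical pre-flight check (NOT part of the Plugins SDK):
# verify that every tag used in a group definition also appears in the slot list.

# Usage: check_groups "TAG1 TAG2 ..." "Group_A=TAG1,TAG2" "Group_B=TAG3" ...
check_groups() {
    slot_tags=" $1 "            # pad with spaces so whole-tag matching is simple
    shift
    status=0
    for definition in "$@"; do
        tags=${definition#*=}   # text after the first '='
        for tag in $(printf '%s' "$tags" | tr ',' ' '); do
            case "$slot_tags" in
                *" $tag "*) ;;  # tag is declared in the slots
                *)
                    echo "tag $tag (in $definition) is not among the input slots" >&2
                    status=1
                    ;;
            esac
        done
    done
    return "$status"
}

# Placeholder tags, for illustration only
check_groups "TAG1 TAG2 TAG3 TAG4" \
    "Group_1=TAG1,TAG2" "Group_2=TAG3,TAG4" && echo "groups OK"
```

The function returns a non-zero status as soon as one tag is missing from the slot list, so it can guard the real plugins-submit-job call in a script.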
Submit the Analysis
For the sake of simplicity, we leave the default values (indicated by the help output) for the optional parameters.
In the example below, we define and compare two groups of alignments by submitting the analysis job as follows:
plugins-submit-job \
    --plugins-dir PLUGINS_ROOT_LOCATION \
    --job DIFF_EXP_EDGE_R_ARTIFACT \
    --job-area USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS \
    --fileset-area /zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA \
    --owner instructor \
    --queue rascals.q \
    --COMPARISON_PAIR Group_1/Group_2 \
    --GROUP_DEFINITION Group_1=ZDFTZZE,PVOVHCB \
    --GROUP_DEFINITION Group_2=KAKIMJE,QNLVWEK \
    --artifact-server LOCAL_USER@LOCAL_HOSTNAME \
    --ESTIMATE_COUNTS_EXON true \
    --ESTIMATE_COUNTS_OTHER false \
    --repository /scratchLocal/gobyweb/ARTIFACT_REPOSITORY-PLUGINS-SDK \
    INPUT_ALIGNMENTS: KAKIMJE ZDFTZZE PVOVHCB QNLVWEK
If everything goes well, the submission ends by printing out the location in which the analysis will be executed:
2013-06-28 16:20:44,744 INFO - Loading available plugins...
2013-06-28 16:20:46,607 INFO - Analyzing the submission request...
2013-06-28 16:20:52,161 INFO - Collecting files from dependencies...
2013-06-28 16:20:53,349 INFO - Generating the job environment...
2013-06-28 16:20:53,915 INFO - Submitting files for execution...
2013-06-28 16:20:56,186 INFO - Requesting job execution...
2013-06-28 16:20:56,186 INFO - Output from the submission process:
//some output...
2013-06-28 16:21:01,438 INFO - The job will be executed in the Job Area at USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/ILBXKDP
If the command ends with an error, check the Troubleshooting page, which describes common issues and suggested solutions.
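The last line of a successful submission reports where the job will run. As a minimal sketch, the helper below pulls that location out of captured submission output, so that later steps can reuse it. The function job_location is a hypothetical name, and the sketch assumes the log lines have been saved or piped to standard input. Which stream the SDK logs to is not stated here, so redirect accordingly.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of the Plugins SDK): read submission output on
# stdin and print the job execution location announced in it, if any.
job_location() {
    sed -n 's/^.*The job will be executed in the Job Area at \(.*\)$/\1/p'
}

# Example using the final log line shown in this tutorial
location=$(printf '%s\n' \
    '2013-06-28 16:21:01,438 INFO - The job will be executed in the Job Area at USER@SERVER:/zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/ILBXKDP' \
    | job_location)

job_tag=${location##*/}    # last path component, i.e. the job tag
echo "location: $location"
echo "job tag:  $job_tag"
```

The job tag recovered this way is the prefix of the log files in the job execution folder.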
How to check Results
Alignment analyses are executed asynchronously and their results are persisted in the FileSet area as FileSet instances. The outcomes of an analysis job vary depending on the plugin’s own logic and they are defined in the OutputSchema declared in its config.xml. The Plugins SDK publishes a FileSet instance for each OutputFile declared there.
To check which FileSets are published by a submitted analysis, we need to log on to the OGE node hosting the Job Area and look at the log files in the job execution folder.
# log on to the OGE node
ssh USER@SERVER
# change to the job execution folder in the Job Area
cd /zenodotus/dat01/campagne_lab_scratch/gobyweb/GOBYWEB_TRIAL/SGE_JOBS/instructor/ILBXKDP
In the folder, we find three types of log files:
- ILBXKDP.submit.* : a single file reporting the initial submission activity;
- ILBXKDP.aap.* : analyses are split into sub-tasks (according to the number of groups and/or input alignments provided), each executed independently. The activity of each sub-task is reported in a separate file of this type;
- ILBXKDP.post.* : a single file reporting the activity of the analysis after all the sub-tasks are completed. If this file is not in the folder, the job has not completed or it ended with an error.
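Since a missing post log means the job has not completed or ended with an error, a script can test for that file before looking for results. The following is a small sketch. The function has_post_log is a hypothetical helper, and it only checks that a post log exists, not that the post phase succeeded.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of the Plugins SDK).
# Usage: has_post_log JOB_DIR JOB_TAG
# Succeeds if at least one JOB_TAG.post.* file exists in JOB_DIR.
has_post_log() {
    for f in "$1/$2".post.*; do
        # When nothing matches, the pattern stays literal and -e is false
        [ -e "$f" ] && return 0
    done
    return 1
}

# Run from inside the job execution folder
if has_post_log . ILBXKDP; then
    echo "post log found"
else
    echo "no post log: job not completed or ended with an error"
fi
```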
Analysis results are published in the post phase, so we need to look inside the ILBXKDP.post.* file to find them. To do so, execute the following command:
grep "has been successfully registered with tag" ILBXKDP.post.*
This prints out something like:
+ echo 'stats.lucene.index has been successfully registered with tag BVIHDTV'
stats.lucene.index has been successfully registered with tag BVIHDTV
+ echo 'stats.tsv has been successfully registered with tag VHGTKPC'
stats.tsv has been successfully registered with tag VHGTKPC
In this example, the execution of the DIFF_EXP_EDGE_R_ARTIFACT analysis produced two FileSet instances: a Lucene index and a tab-separated values (TSV) file with some statistics.
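The grep output contains each registration twice: once as a shell trace line starting with "+ echo" and once as the actual message. The sketch below reduces it to one "file tag" pair per published FileSet. The function published_filesets is a hypothetical helper, and it assumes that file names contain no spaces and that the tag is the last word on the line, as in the output shown in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of the Plugins SDK): read post-phase log lines
# on stdin and print "FILE TAG" for each registered FileSet.
published_filesets() {
    # Skip "+ ..." trace lines, keep the real messages,
    # then print the first word (file) and the last word (tag).
    awk '!/^\+ / && /has been successfully registered with tag/ { print $1, $NF }'
}

# Example with the log lines shown in this tutorial
published_filesets <<'EOF'
+ echo 'stats.lucene.index has been successfully registered with tag BVIHDTV'
stats.lucene.index has been successfully registered with tag BVIHDTV
+ echo 'stats.tsv has been successfully registered with tag VHGTKPC'
stats.tsv has been successfully registered with tag VHGTKPC
EOF
# prints:
# stats.lucene.index BVIHDTV
# stats.tsv VHGTKPC
```

On the OGE node, the same function can be fed with the real log, for example with cat ILBXKDP.post.* piped into published_filesets.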
We can then go to the FileSet area and inspect each instance’s content using its tag. For example, to check the Lucene index:
cd /zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA
cd instructor
cd BVIHDTV
ls -lrt
total 32
drwxr-xr-x 2 gobyweb icb 32768 Jun 28 16:28 ILBXKDP-stats.lucene.index
ls -lrt ILBXKDP-stats.lucene.index/
total 896
-rw-r--r-- 1 gobyweb icb 14924 Jun 28 16:28 _0.fdx
-rw-r--r-- 1 gobyweb icb 147467 Jun 28 16:28 _0.fdt
-rw-r--r-- 1 gobyweb icb 247 Jun 28 16:28 segments_1
-rw-r--r-- 1 gobyweb icb 20 Jun 28 16:28 segments.gen
-rw-r--r-- 1 gobyweb icb 936 Jun 28 16:28 index.metadata.txt
-rw-r--r-- 1 gobyweb icb 247620 Jun 28 16:28 _0.tis
-rw-r--r-- 1 gobyweb icb 3239 Jun 28 16:28 _0.tii
-rw-r--r-- 1 gobyweb icb 3730 Jun 28 16:28 _0.prx
-rw-r--r-- 1 gobyweb icb 4 Jun 28 16:28 _0.nrm
-rw-r--r-- 1 gobyweb icb 229183 Jun 28 16:28 _0.frq
-rw-r--r-- 1 gobyweb icb 109 Jun 28 16:28 _0.fnm
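The cd sequence above suggests that an instance lives under the FileSet area in a folder named after the owner and then the tag. Assuming that layout holds in general (it is inferred from this example only), a small hypothetical helper can build the folder path for any tag:

```shell
#!/bin/sh
# Hypothetical helper (NOT part of the Plugins SDK).
# Assumes the layout FILESET_AREA/OWNER/TAG seen in this tutorial's example.
# Usage: fileset_dir FILESET_AREA OWNER TAG
fileset_dir() {
    printf '%s/%s/%s\n' "$1" "$2" "$3"
}

AREA=/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA
fileset_dir "$AREA" instructor BVIHDTV
fileset_dir "$AREA" instructor VHGTKPC
```

On the node hosting the FileSet area, the printed path can be passed directly to ls -lrt.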
The next public release of the SDK will offer facilities to query/browse the FileSets produced by a job.