Locate a FileSet Area

Before registering compact-reads files, we need to locate a storage area on a compute node (OGE node) where the files will be placed. This area is called FILESET_AREA_LOCATION, and Jobs also use it to persist their results. In this set of tutorials we will use an area at:

USER@SERVER:/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA

where SERVER is the host name of the compute node and USER is an authorized user on the node.

To initialize the area, we just need to create the directory and have password-less SSH access to the node.
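The initialization step can be sketched as follows. This is only an illustration, not a command provided by the SDK: the variable names area_login and area_path are our own, and USER and SERVER remain placeholders. The command is printed rather than executed so that it can be reviewed first.

```shell
# Placeholder location from this tutorial; replace USER and SERVER with real values.
FILESET_AREA_LOCATION="USER@SERVER:/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA"

# Split the location into the login part (before the first ':') and the remote path (after it).
area_login="${FILESET_AREA_LOCATION%%:*}"
area_path="${FILESET_AREA_LOCATION#*:}"

# Print the command; remove the leading 'echo' to run it for real.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a successful run also confirms that password-less access works.
echo ssh -o BatchMode=yes "$area_login" "mkdir -p '$area_path'"
```

If the real command fails with a permission or authentication error, password-less SSH access is most likely not configured yet for that user.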

Plugins Command

To register a compact-reads file, we use the command plugins-register-fileset. To see the options of the command, just type:

plugins-register-fileset --help

This prints out a list of options accepted by the command along with a description for each of them.

Register a Reads file

Suppose we have a compact-reads file named demopaper-combined-NA19143.compact-reads available in a local folder called my-reads. We want to register it in the FileSet Area so that other plugins can access it later.

The first step is to have a FileSet plugin that models compact-reads. In this tutorial, the plugins root location is a checkout of the plugins-SDK branch from the Plugin Repository. This branch offers a pre-defined plugin for compact-reads, so we are already set for this step.

Next, we need to prepare the “annotations” to attach as metadata to the registered file. This is the list of annotations typically used by aligner plugins:

  • “PAIRED_END_ALIGNMENT”
  • “BISULFITE_SAMPLE”
  • “COLOR_SPACE”
  • “ORGANISM”
  • “READS_PLATFORM”
  • “PAIRED_END_DIRECTIONS”
  • “LIB_PROTOCOL_PRESERVE_STRAND”
  • “READS_LABEL”
  • “BASENAME”
  • “INPUT_READ_LENGTH”

Annotations are attached as attributes passed to the registration command.
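As a small illustration of how annotations map onto command-line attributes, the sketch below turns a list of KEY=VALUE pairs into the corresponding -a arguments. The variable names are our own, and the sketch assumes that values contain no whitespace.

```shell
# Annotation values taken from the registration example in this tutorial.
annotations="ORGANISM=homo_sapiens READS_PLATFORM=Illumina INPUT_READ_LENGTH=35"

# Turn each KEY=VALUE pair into a '-a KEY=VALUE' argument.
args=""
for pair in $annotations; do
  args="$args -a $pair"
done

# Drop the leading space left by the first iteration.
args="${args# }"
echo "$args"
```

The resulting string can be pasted into the registration command in place of the individual -a lines.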

The final step is to run the plugins-register-fileset command with the appropriate parameters as follows:

plugins-register-fileset \
--plugins-dir PLUGINS_ROOT_LOCATION \
--fileset-area USER@SERVER:/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA \
--owner instructor \
-a PAIRED_END_ALIGNMENT=false \
-a IS_BISULFITE_SAMPLE=false \
-a COLOR_SPACE=false \
-a ORGANISM=homo_sapiens \
-a READS_PLATFORM=Illumina \
-a IS_PAIRED_SAMPLE=false \
-a LIB_PROTOCOL_PRESERVE_STRAND=true \
-a READS_LABEL=demopaper-combined-NA19143 \
-a INPUT_READ_LENGTH=35 \
guess: my-reads/demopaper-combined-NA19143.compact-reads

In this example, we used the ‘guess’ keyword. This leaves it to the SDK to find the proper FileSet plugin to use.

If everything works, the reads file is wrapped in what we call a “FileSet instance” (an instance of the COMPACT_READS fileset plugin, in this case) and uploaded to the FileSet area. The execution ends by printing the tag assigned to the instance.

2013-06-27 15:30:38,409 INFO - 1 fileset instance(s) has been successfully registered with the following tag(s):
2013-06-27 15:30:38,410 INFO - [DUSESWO]

This tag can then be used as input for Aligner, Alignment Analysis and Task jobs.
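When the tag is needed in a script, it can be extracted from the log output. The sketch below is our own and assumes the output keeps the format of the sample shown above, with the tag enclosed in square brackets on its own INFO line. Here the sample output is stored in a variable; in practice it would be captured from the command itself.

```shell
# Sample output copied from the run above.
output='2013-06-27 15:30:38,409 INFO - 1 fileset instance(s) has been successfully registered with the following tag(s):
2013-06-27 15:30:38,410 INFO - [DUSESWO]'

# Print only what is inside the brackets on lines of the form "... INFO - [TAG]".
tag=$(printf '%s\n' "$output" | sed -n 's/.*INFO - \[\(.*\)\]$/\1/p')
echo "$tag"
```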

Where is my file now?

The file is now persisted in the FileSet area. To see where it is physically located, we log on to the OGE node and browse the FileSet area as follows:

# log on to the OGE node
ssh USER@SERVER
# change to the root folder of the fileset area
cd /zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA/
# there is now a new folder for the owner
cd instructor
# inside it, a sub-folder named after the tag assigned to the uploaded fileset contains our source reads file
ls -lt DUSESWO
total 621184
-rw-r--r-- 1 gobyweb icb 636070575 Jun 27 15:34 demopaper-combined-NA19143.compact-reads
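The layout seen above (area root, then owner, then tag) can be used to compute the folder of an instance directly. This is a sketch based only on what the listing shows; the variable names are our own.

```shell
# Values from this tutorial.
area_path="/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA"
owner="instructor"
tag="DUSESWO"

# The instance folder is AREA/OWNER/TAG, as observed in the listing above.
instance_dir="$area_path/$owner/$tag"
echo "$instance_dir"

# To list it remotely, something like the following could be used (USER and SERVER are placeholders):
# ssh USER@SERVER "ls -lt '$instance_dir'"
```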

Edit a registered Reads file

Suppose we now want to change the annotations of the registered FileSet. For this, we use the plugins-edit-fileset command. In the example below, we change the value of the ORGANISM annotation:

plugins-edit-fileset \
--plugins-dir PLUGINS_ROOT_LOCATION \
--fileset-area USER@SERVER:/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA \
--owner instructor \
--tag DUSESWO \
-a ORGANISM=Caenorhabditis_elegans

If everything works, the command will print out something like:

2013-07-01 09:44:54,939 INFO - Fileset attributes have been successfully updated for instance DUSESWO

With the same command, it is possible to add new annotations to the Fileset instance.
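For instance, adding an annotation might look like the sketch below. It reuses the options of the edit example above; the choice of BASENAME (taken from the annotation list earlier in this tutorial) and its value are assumptions made for illustration only. The command is printed rather than executed.

```shell
# Tag of the instance registered earlier in this tutorial.
tag="DUSESWO"

# Hypothetical new annotation: the key comes from the annotation list, the value is assumed.
new_annotation="BASENAME=demopaper-combined-NA19143"

# Assemble the edit command with the same options as the example above.
cmd="plugins-edit-fileset --plugins-dir PLUGINS_ROOT_LOCATION --fileset-area USER@SERVER:/zenodotus/dat01/campagne_lab_store/gobyweb_dat/GOBYWEB_TRIAL/FILESETS_AREA --owner instructor --tag $tag -a $new_annotation"

# Print the command for review; run it by hand once the values are confirmed.
echo "$cmd"
```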