Goby 1.9.1 will support annotating compact-reads files with meta-data. This feature was suggested by a member of the audience at the SEQC meeting on Dec 6th, who pointed out that next-gen read formats lack headers to record information about the samples. We have now extended the Goby compact-reads format so that arbitrary meta-data can be attached to a collection of reads. The change is fully compatible with previous Goby compact-reads files.

Meta-data can be defined when converting FASTA/FASTQ to compact format.

For instance, the following command will record the date the reads were processed and the sequencing instrument that generated the data. It is a good idea to record at least these two pieces of information about reads.

goby 1g fasta-to-compact  -k platform -v "Illumina HiSeq 2000" -k sequencing-run-start-date -v "01/12/2011" input.fastq -o output.compact-reads

The previous command shows how to use the --key (-k) and --value (-v) options to define multiple key/value pairs of meta-data. It is also possible to define meta-data in a Java properties file (one key/value pair per line, in the format key=value). Such a meta-data file can be specified as follows:

-- file meta-data.props contains:
platform=Illumina HiSeq 2000
sequencing-run-start-date=01/12/2011
-- file ends on previous line
goby 1g fasta-to-compact  --key-value-pairs meta-data.props  input.fastq -o output.compact-reads
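
As a minimal sketch, the properties file can be written from a script with a here-document and checked before it is handed to Goby. The check below is an illustration only: it assumes the simple key=value form described above and does not cover other features of Java properties files (comments, continuation lines, alternative separators). The variable names are hypothetical.

```shell
#!/bin/sh
# Write the two key/value pairs from the example into a properties file.
props_file="meta-data.props"
cat > "$props_file" <<'EOF'
platform=Illumina HiSeq 2000
sequencing-run-start-date=01/12/2011
EOF

# Count lines that do NOT look like key=value (a non-empty key before the first '=').
# grep exits non-zero when it selects no lines, which is the good case here.
bad_lines=$(grep -c -v '^[^=][^=]*=' "$props_file")
echo "malformed lines: $bad_lines"

# The file can then be passed to Goby as in the command shown earlier:
# goby 1g fasta-to-compact --key-value-pairs "$props_file" input.fastq -o output.compact-reads
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the file body, so values are written exactly as typed.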

The file output.compact-reads can now be inspected for meta-data. The mode compact-file-stats will display key-value pairs on the standard output:

goby 1g compact-file-stats output.compact-reads
INFO  GobyDriver           - edu.cornell.med.icb.goby.modes.GobyDriver Implementation-Version: development (20110112111454)
Compact reads filename = output.compact-reads
meta-data key=platform value=Illumina HiSeq 2000
meta-data key=sequencing-run-start-date value=01/12/2011

The last two lines of the output, which begin with "meta-data", display the key-value pairs recorded in the compact-reads file.
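
Because each pair is printed on its own line, the meta-data can be pulled back out of the compact-file-stats output with standard text tools. The following is a rough sketch that works on the sample output shown above and turns it back into key=value lines. It assumes that keys contain no spaces and that the line layout is exactly as in the example; the variable names are hypothetical.

```shell
#!/bin/sh
# Sample output copied from the compact-file-stats example.
stats_output='Compact reads filename = output.compact-reads
meta-data key=platform value=Illumina HiSeq 2000
meta-data key=sequencing-run-start-date value=01/12/2011'

# Keep only the meta-data lines and rewrite each one as key=value.
# Assumption: keys contain no spaces; values may contain spaces.
pairs=$(printf '%s\n' "$stats_output" |
  sed -n 's/^meta-data key=\([^ ]*\) value=\(.*\)$/\1=\2/p')

printf '%s\n' "$pairs"

# With Goby installed, the same filter could presumably be fed from the tool directly:
# goby 1g compact-file-stats output.compact-reads | sed -n 's/^meta-data key=\([^ ]*\) value=\(.*\)$/\1=\2/p'
```

The result has the same form as the properties file used earlier, so it could be saved and reused when converting another set of reads from the same run.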