Goby 1.9.1 will support annotating compact-reads files with meta-data. This feature was suggested by a member of the audience at the SEQC meeting on Dec 6th, who pointed out that next-gen read formats lack headers for recording information about the samples. We have now extended the Goby compact-reads format so that arbitrary meta-data can be attached to a collection of reads. The change is fully compatible with previous Goby compact-reads files.
Meta-data can be defined when converting FASTA/FASTQ to compact format.
For instance, the following command will record the date the reads were processed and the sequencing instrument that generated the data. It is a good idea to record at least these two pieces of information about reads.
goby 1g fasta-to-compact -k platform -v "Illumina HiSeq 2000" -k sequencing-run-start-date -v "01/12/2011" input.fastq -o output.compact-reads
The previous command shows how to use the --key (-k) and --value (-v) options to define multiple key/value pairs of meta-data. It is also possible to define meta-data in a Java properties file (one key/value pair per line, in the format key=value). Such a meta-data file can be specified as follows:
-- file meta-data.props contains:
platform=Illumina HiSeq 2000
sequencing-run-start-date=01/12/2011
-- file ends on previous line
goby 1g fasta-to-compact --key-value-pairs meta-data.props input.fastq -o output.compact-reads
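When the meta-data comes from a script or a sample sheet, the properties file can be generated rather than typed by hand. The following Python sketch is one way to do that. The helper name format_props is hypothetical and not part of Goby, and the sketch assumes simple keys and single-line values: Java properties files give special meaning to whitespace, '=', ':' and backslashes, so such input is rejected here instead of being escaped.

```python
def format_props(metadata):
    """Render a dict as key=value lines, one pair per line.

    Hypothetical helper, not part of Goby. It refuses keys and values
    that would need escaping in a Java properties file.
    """
    lines = []
    for key, value in metadata.items():
        # Keys: no whitespace, '=', ':' or backslash, and no leading
        # comment marker ('#' or '!').
        if not key or key[0] in "#!" or any(c in key for c in " \t\r\n=:\\"):
            raise ValueError(f"unsupported key: {key!r}")
        # Values: a single line without backslashes.
        if any(c in value for c in "\r\n\\"):
            raise ValueError(f"unsupported value for key {key!r}")
        lines.append(f"{key}={value}")
    return "".join(f"{line}\n" for line in lines)


props_text = format_props({
    "platform": "Illumina HiSeq 2000",
    "sequencing-run-start-date": "01/12/2011",
})
print(props_text, end="")
```

Saving the returned text as meta-data.props gives a file with the same two lines as the hand-written example, which can then be passed to fasta-to-compact with --key-value-pairs.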
The file output.compact-reads can now be inspected for meta-data. The mode compact-file-stats will display key-value pairs on the standard output:
goby 1g compact-file-stats output.compact-reads
INFO GobyDriver - edu.cornell.med.icb.goby.modes.GobyDriver Implementation-Version: development (20110112111454)
Compact reads filename = output.compact-reads
meta-data key=platform value=Illumina HiSeq 2000
meta-data key=sequencing-run-start-date value=01/12/2011
The meta-data entries at the end of the output are the key-value pairs recorded in the compact-reads file.
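If the meta-data needs to be consumed by a script, the output of compact-file-stats can be parsed. The Python sketch below is a minimal example and makes two assumptions drawn only from the sample output above: each entry is printed on its own line, and each has the shape "meta-data key=<key> value=<value>" with no whitespace inside the key. The function name parse_metadata is hypothetical and not part of Goby.

```python
import re

# Assumed line shape, based on the sample output:
#   meta-data key=<key> value=<value>
META_LINE = re.compile(r"^meta-data key=(\S+) value=(.*)$")


def parse_metadata(stats_output):
    """Collect meta-data key/value pairs from compact-file-stats text."""
    metadata = {}
    for line in stats_output.splitlines():
        match = META_LINE.match(line.strip())
        if match:
            metadata[match.group(1)] = match.group(2)
    return metadata


sample = """Compact reads filename = output.compact-reads
meta-data key=platform value=Illumina HiSeq 2000
meta-data key=sequencing-run-start-date value=01/12/2011
"""
print(parse_metadata(sample))
# prints {'platform': 'Illumina HiSeq 2000', 'sequencing-run-start-date': '01/12/2011'}
```

Lines that do not match the assumed pattern, such as the INFO line and the filename line, are ignored.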