At this time, compiling Goby support into applications (such as GSNAP and BWA) on Mac computers will likely not work correctly.

This page describes the C and C++ APIs for reading and writing binary data files created with the Goby next-gen data management framework. You may have downloaded this API as part of the Goby source distribution or separately. Either way, you still need to configure, compile, and install the API on your machine before you can compile tools that use it.

Installation


The following steps are taken from goby-cpp/README.txt:

1. On UNIX/Linux (and possibly Mac) systems (not necessary for Cygwin), assuming you are
    using the BASH shell, edit the .bash_profile file so that pkg-config
    will find libraries and headers installed "locally":
      export LOCAL_LIB=${HOME}/local-lib
      export PKG_CONFIG_PATH=/usr/lib/pkgconfig:${LOCAL_LIB}/lib/pkgconfig
      export PATH=${LOCAL_LIB}/bin:${PATH}
      export LD_LIBRARY_PATH=${LOCAL_LIB}/lib:${LD_LIBRARY_PATH}
    ************************************************************************
    ** Log out and back in so these environment variables are set in your **
    ** environment.                                                       **
    ************************************************************************
    Make the "local-lib" directories to store local libraries and binaries.
      mkdir -p ${LOCAL_LIB}/lib/pkgconfig/
      mkdir -p ${LOCAL_LIB}/bin/
2. Check your version of autoconf with the command "autoconf --version".
   If you aren't running at least version 2.61, update autoconf with the
   following commands:
      wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.68.tar.gz
      tar zxvf autoconf-2.68.tar.gz
      cd autoconf-2.68
      ./configure --prefix=${LOCAL_LIB}
      make
      make install
3. Install Protobuf 2.4.1.
      wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
      tar zxvf protobuf-2.4.1.tar.gz
      cd protobuf-2.4.1
      #
      # for root or cygwin, don't use the --prefix option
      #
      ./configure --prefix=${LOCAL_LIB}
      make
      make install
4. Download, build, and install the PCRE (Perl Compatible Regular
   Expressions) library (8.21 or later) from http://pcre.org
      wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.21.tar.gz
      tar zxvf pcre-8.21.tar.gz
      cd pcre-8.21
      #
      # for root or cygwin, don't use the --prefix option
      #
      ./configure --prefix=${LOCAL_LIB}
      make
      make install
5. Build the Goby C++ API library (this requires the Goby source distribution).
   The following steps build and install the library:
      wget http://chagall.med.cornell.edu/goby/releases/goby_latest-cpp.zip
      unzip goby_latest-cpp.zip
      cd goby_VERSION/cpp/
      chmod +x autogen.sh
      ./autogen.sh
      #
      # For root or cygwin, don't use the --prefix option.
      #
      ./configure --prefix=${LOCAL_LIB}
      make
      make install

Once you have completed the previous steps (which should take less than 30 minutes), you are ready to install an aligner with native Goby support (e.g., BWA or GSNAP). The rest of this page describes the API from a developer's point of view. Read on if you want to write C/C++ programs that use the Goby API to parse Goby reads or alignment files. If you want to use Goby for data analysis, now is a good time to try the tutorials with these tools.

Description of the API

The Goby C++ API provides a subset of the operations in the Java implementation, but the most common operations are supported. The Goby C API provides a subset of the C++ API and is being developed to help extend other programs so that they can work natively with Goby formats. The C API supports reading Goby compact-reads, writing Goby compact-alignments, and includes helper functions for converting SAM-oriented data into Goby compact-alignments.

Details about how the Goby framework uses Protocol Buffers can be found in the section "Developing with Goby".


Collection of reads using C++


As with the Java implementation, the Goby C++ API provides ReadsReader and ReadEntryIterator classes that iterate over ReadEntry objects. This lets a program use a simple for loop to visit every entry in a complete compact-reads file. The iterator decodes the chunked structure of the file as it traverses it and exposes each ReadEntry message in turn. This hides the chunks from client programs, as the following code snippet illustrates:

 1  #include <iostream>
 2  #include <string>
 3  #include <goby/Reads.h>
 4  using namespace std;
 5
 6  int main(int argc, char** argv) {
 7    string filename = "input.compact-reads";
 8    goby::ReadsReader reads_reader(filename);
 9    for (goby::ReadEntryIterator it = reads_reader.begin(); it != reads_reader.end(); it++) {
10        const goby::ReadEntry& entry = *it;  // refer to the current entry without copying it
11        cout << "read-index: " << entry.read_index()
12             << " read-id: " << entry.read_identifier()
13             << " sequence: " << entry.sequence() << endl;
14    }
15  }

As with the Java example, line 7 sets the name of the compact reads file to input.compact-reads. Line 8 creates a ReadsReader instance. The loop on line 9 uses its begin() and end() iterators to visit each ReadEntry exposed by the ReadEntryIterator. The ReadEntryIterator is implemented as a standard C++ InputIterator. The loop runs for as long as there are more ReadEntry objects to read, and the chunk structure of the underlying file stays completely hidden from client code. Finally, lines 11-13 print the read index, read identifier and sequence of each entry.
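To make "hiding the chunk structure" concrete, here is a minimal sketch of the idea. It is not Goby's implementation: it is an input iterator that walks a list of already-decoded chunks one entry at a time. The Chunk alias, the FlatEntryIterator class, and the entriesBegin()/entriesEnd() helpers are hypothetical. Entries are plain strings rather than ReadEntry messages. A real file-backed reader would decode each chunk lazily instead of holding them all in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for one decoded chunk: a batch of entries (plain
// strings here). The real Goby on-disk chunk format is not modeled.
using Chunk = std::vector<std::string>;

// Minimal input iterator that visits entries across chunk boundaries, so the
// caller never sees where one chunk ends and the next begins.
class FlatEntryIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    FlatEntryIterator(const std::vector<Chunk>* chunks, std::size_t chunk)
        : chunks_(chunks), chunk_(chunk), entry_(0) {
        skipExhaustedChunks();
    }

    reference operator*() const { return (*chunks_)[chunk_][entry_]; }
    pointer operator->() const { return &**this; }

    FlatEntryIterator& operator++() {
        ++entry_;
        skipExhaustedChunks();
        return *this;
    }
    FlatEntryIterator operator++(int) {
        FlatEntryIterator previous = *this;
        ++*this;
        return previous;
    }

    bool operator==(const FlatEntryIterator& other) const {
        return chunk_ == other.chunk_ && entry_ == other.entry_;
    }
    bool operator!=(const FlatEntryIterator& other) const { return !(*this == other); }

private:
    // Move on to the next chunk whenever the current one has no more entries
    // (this also skips empty chunks).
    void skipExhaustedChunks() {
        while (chunk_ < chunks_->size() && entry_ >= (*chunks_)[chunk_].size()) {
            ++chunk_;
            entry_ = 0;
        }
    }

    const std::vector<Chunk>* chunks_;
    std::size_t chunk_;
    std::size_t entry_;
};

// begin()/end() helpers, similar in spirit to ReadsReader::begin()/end().
FlatEntryIterator entriesBegin(const std::vector<Chunk>& chunks) {
    return FlatEntryIterator(&chunks, 0);
}
FlatEntryIterator entriesEnd(const std::vector<Chunk>& chunks) {
    return FlatEntryIterator(&chunks, chunks.size());
}
```

The class declares the standard iterator member types, so it can be passed to anything that accepts an input iterator. For example, `std::vector<std::string> all(entriesBegin(chunks), entriesEnd(chunks));` collects every entry, and empty chunks are skipped transparently.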

Collection of reads using C

1  #include <stdio.h>
2  #include <goby/C_Reads.h>
3
4  int main(int argc, char** argv) {
5      if (argc < 2) {
6          fprintf(stderr, "Specify a compact-reads file to open.\n");
7          return 1;
8      }
9      char *input = argv[1];
10     CReadsHelper *targetReadsHelper;
11     gobyReads_openReadsReaderSingleWindowed(input, 0, 0, &targetReadsHelper);
12     while (gobyReads_hasNext(targetReadsHelper)) {
13         char *readIdentifier;
14         char *description; char *sequence; int sequenceLength;
15         char *quality; int qualityLength;
16         unsigned long readIndex = gobyReads_nextSequence(
17             targetReadsHelper,
18             &readIdentifier, &description,
19             &sequence, &sequenceLength,
20             &quality, &qualityLength);
21         printf("read-index: %lu read-id: %s sequence: %s\n",
22             readIndex, readIdentifier, sequence);
23     }
24     gobyReads_finished(targetReadsHelper);
25     goby_shutdownProtobuf();
26 }

Unlike the previous example, this code opens the Goby compact-reads file given as the first command-line argument. On line 11, a reads reader is opened and a handle to it is stored in the targetReadsHelper variable. Every subsequent Goby C API call takes targetReadsHelper as its first parameter. The loop on line 12 iterates through the entire compact-reads file. Lines 13-15 declare the variables that the gobyReads_nextSequence() call on lines 16-20 populates. Note that the Goby C API manages the memory behind these variables internally. You do not need to free them, but if you want to keep any of their values past the next call to gobyReads_nextSequence(), you must copy them in your own code. Call gobyReads_finished() (line 24) when you are done reading the compact-reads file. Any program that links to the Goby C API should call goby_shutdownProtobuf() (line 25) before exiting, even if it never called any other Goby C API function.
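The copying rule can be sketched from the C++ side; a C program would use malloc() and memcpy(), or strdup(), to the same effect. In this sketch, mockNextSequence() is a hypothetical stand-in that only imitates the ownership behavior described above: it returns a pointer into one internal buffer that every call overwrites. It is not a Goby function. The assumption that the reported length excludes any terminating NUL is ours, not documented Goby behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical mock (NOT part of Goby): like the C API, it hands back a
// pointer into storage it owns and overwrites on every call.
static void mockNextSequence(int index, char** sequence, int* sequenceLength) {
    static char buffer[32];
    int written = std::snprintf(buffer, sizeof buffer, "READ_%d", index);
    *sequence = buffer;
    *sequenceLength = written;
}

// Keep each value safely by copying the borrowed buffer into a std::string
// before the next call can overwrite it.
std::vector<std::string> collectSequences(int count) {
    std::vector<std::string> kept;
    for (int i = 0; i < count; ++i) {
        char* sequence = nullptr;
        int sequenceLength = 0;
        mockNextSequence(i, &sequence, &sequenceLength);
        // Copy exactly sequenceLength bytes (assumes, as in this mock, that the
        // length excludes any terminating NUL).
        kept.emplace_back(sequence, static_cast<std::size_t>(sequenceLength));
    }
    return kept;
}
```

Storing the raw `char*` pointers instead would leave every element pointing at the same buffer, which holds only the most recent sequence.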

Collection of alignment entries

Similar to what we described for reads, the Goby C++ API provides an AlignmentReader and an AlignmentEntryIterator. The following code snippet illustrates how to iterate through a Goby compact alignment file:

#include <iostream>
#include <string>
#include <goby/Alignments.h>
using namespace std;

int main(int argc, char** argv) {
  // Basename of the compact alignment (files such as input.entries and input.header).
  string basename = "input";
  goby::AlignmentReader alignment_reader(basename);
  for (goby::AlignmentEntryIterator it = alignment_reader.begin(); it != alignment_reader.end(); it++) {
    const goby::AlignmentEntry& entry = *it;  // refer to the current entry without copying it
    cout << "query-index: " << entry.query_index()
         << " target-index: " << entry.target_index()
         << " score: " << entry.score() << endl;
  }
}

Examples/Modes

Although the Goby C++ API does not provide the full set of operations found in the Java implementation, a few of the Java modes have been reimplemented with the C++ API. These programs are included in the Goby distribution, and reviewing them is a good way to learn how to use Goby in your own projects.

GobyAlignmentStats.cc
Scans a Goby compact alignment file and prints statistics about the alignment. It provides the same information as the Java mode “compact-file-stats”. Like the corresponding Java mode, the program takes the basename of a compact alignment as input (the files basename.entries and basename.header must exist).

GobyAlignmentStats <basename>
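These modes take a basename rather than a file name, so a quick pre-flight check can give a clearer error message when a file is missing. The helper below is hypothetical (not part of Goby). It only checks that basename.entries and basename.header can be opened for reading and does not validate their contents.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns true if the file at `path` can be opened for reading.
static bool isReadable(const std::string& path) {
    std::FILE* file = std::fopen(path.c_str(), "rb");
    if (file == nullptr) return false;
    std::fclose(file);
    return true;
}

// Hypothetical pre-flight check: both files named after the basename must be
// readable before a compact alignment is handed to a reader.
bool compactAlignmentFilesPresent(const std::string& basename) {
    return isReadable(basename + ".entries") && isReadable(basename + ".header");
}
```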
GobyAlignmentToText.cc
Converts a Goby compact alignment to plain text. It provides the same information as the Java mode “alignment-to-text”. Like the corresponding Java mode, the program takes the basename of a compact alignment as input.

GobyAlignmentToText <basename>
GobyFastaToCompact.cc
Converts a FASTA or FASTQ file to the Goby “compact” format. It is similar to the Java mode “fasta-to-compact”.

GobyFastaToCompact input1 input2 ... inputN
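The parsing half of such a conversion can be sketched as follows. This minimal FASTA reader assumes the common convention that the first whitespace-delimited token of a header line is the identifier and the rest is the description. These fields line up with the read identifier, description and sequence that the reads API exposes. FastaRecord and readFasta() are hypothetical names. FASTQ input is not handled, and writing the compact-reads output is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTA record: header split into identifier and description, plus the
// sequence with line breaks removed.
struct FastaRecord {
    std::string identifier;
    std::string description;
    std::string sequence;
};

// Minimal FASTA reader: a line starting with '>' opens a new record, and the
// following lines (until the next '>') are concatenated into its sequence.
// Lines before the first header are ignored.
std::vector<FastaRecord> readFasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            FastaRecord record;
            const std::string header = line.substr(1);
            const std::size_t space = header.find_first_of(" \t");
            record.identifier = header.substr(0, space);  // whole header if no whitespace
            if (space != std::string::npos) record.description = header.substr(space + 1);
            records.push_back(record);
        } else if (!records.empty()) {
            records.back().sequence += line;
        }
    }
    return records;
}
```

Because the function reads from any `std::istream`, the same code works on a `std::ifstream` for a file on disk or a `std::istringstream` in tests.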
GobyReadsStats.cc
Scans a Goby compact reads file and prints statistics about its entries. It provides the same information as the Java mode “compact-file-stats”. Like the corresponding Java mode, the program takes the name of a compact reads file as input.

GobyReadsStats <filename>
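As an illustration of the single-pass scan that such a stats mode performs, the sketch below summarizes read lengths over any iterator range of sequences. It is hypothetical: ReadLengthStats and summarizeLengths() are not Goby names, and the statistics chosen here are ours, not necessarily the ones GobyReadsStats prints. To use it with the C++ API, the loop body would call entry.sequence() on each ReadEntry instead of reading strings from a container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of read lengths.
struct ReadLengthStats {
    std::size_t count = 0;
    std::size_t minLength = 0;
    std::size_t maxLength = 0;
    double meanLength = 0.0;
};

// One pass over a range of sequences (e.g. std::string elements).
template <typename InputIt>
ReadLengthStats summarizeLengths(InputIt first, InputIt last) {
    ReadLengthStats stats;
    std::size_t totalLength = 0;
    for (; first != last; ++first) {
        const std::size_t length = first->size();
        if (stats.count == 0) {
            stats.minLength = stats.maxLength = length;  // first entry seeds min/max
        } else {
            stats.minLength = std::min(stats.minLength, length);
            stats.maxLength = std::max(stats.maxLength, length);
        }
        totalLength += length;
        ++stats.count;
    }
    if (stats.count > 0) {
        stats.meanLength = static_cast<double>(totalLength) / stats.count;
    }
    return stats;
}
```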