Hansen KD et al have described  a method to reweight base level position counts to remove systematic hexamer priming bias. This tutorial describes how to estimate these weights associated with individual reads and to correct counts with the weights before performing differential expression analyses. This option will be available in Goby 1.7+.


We assume that you have a set of reads file in the compact reads format and that you have aligned these reads to a reference genome to produce a set of Goby alignments. Refer to the Getting Started page to learn how to convert fastq to compact-reads or run alignments.

Collect heptamer frequencies and heptamer weights

The first step is to scan a set of reads to estimate heptamer frequencies and weights. This can be done with the Goby heptamer-weights mode.

java -Xmx1g -jar goby.jar *.compact-reads -m heptamer-weights -o heptamer-info.bin

This command will scan the reads file provided as input and will write heptamer information to the output file named ‘heptamer-info.bin’

If you are curious and would like to look at heptamer frequencies and weights, you can instruct the mode to write tab delimited files with these data. See the –heptamer-counts and –weights options

[(-c|--heptamer-counts) <heptamer-counts>]
          The filename where heptamer count statistics will be written (in tab delimited format).
[(-w|--weights) <weights>]
          The filename where weights will be written for individual heptamers (in tab delimited format).

Note that the tab delimited files are not required to proceed. The binary file written with the -o option is much smaller and sufficient.

Associate individual reads to a weight

The next step is to associate individual reads to a weight. The method described by Hansen et al consists in giving each read the weight of the heptamer that the read contains at position 1. This can be done with the Goby reads-to-weights mode.

java -Xmx1g -jar goby.jar *.compact-reads -m reads-to-weights -e heptamer-info.bin

The previous command scans each *.compact-reads file provided as input and produces a corresponding *.weight file. Each .weight file encodes the mapping between read index and weight.

Rename weights files

To use read weights with modes that process alignments it is necessary to rename the weight file to match the basename of the alignment produced with the reads. There is no automatic way to do this, but the process is quite simple.

Suppose you started with two read files (setA and setB) and have estimated weight in the first step above. You should have the following files:

setA.compact-reads setA.weights
setB.compact-reads setB.weights

You have aligned these two sets of reads against a reference, to produce the alignments:

setA-align.entries, setA-align.header, setA-align.tmh
setB-align.entries, setB-align.header, setB-align.tmh

This step consists in renaming setA.weights to setA-align.weights and setB.weights to setB-align.weights

Estimate counts and perform differential expression analysis

To use the Goby modes alignment-to-annotation-counts, alignment-to-transcript-counts, or alignment-to-counts with reweighted counts, simply provide the –use-weights option. These modes will then automatically load weights associated with each alignment they process and reweight the counts to try and correct for hexamer priming bias.