This mode discovers sequence variants, calls genotypes, and estimates allele frequencies or methylation rates (for RRBS or methyl-seq datasets). It is implemented by edu.cornell.med.icb.goby.modes.DiscoverSequenceVariants.java.
Since Goby 1.8, this mode requires sorted and indexed alignments as input.
See this tutorial for more information about the various output formats and types of analyses supported.
The following options are available in this mode:
|Option||Required||Description|
|output||no||The name of the output file. Default value: -|
|input||yes||The basenames of the input alignments|
|include-reference-names||no||When provided, process only the reference identifiers listed in this comma-separated list. To process only chromosomes 1 and 19, when sequences are identified as 1 and 19, use: --include-reference-names 1,19|
|groups||no||Define groups for multi-group comparisons. This option is required for some output formats. The syntax of the groups argument is id-1=basename1,basename2/id-2=basename4,basename5, where id-1 is the id of the first group, defined to consist of samples basename1 and basename2. Each sample name, such as basename1, must refer to a basename provided as input on the command line (see input). Multiple groups are separated by forward slashes (/). If the option is not provided, one group per sample is assumed and group1… groupN are defined corresponding to the N input basenames, in the order these appear on the command line.|
|group-file||no||Define groups for multi-group comparisons. This parameter names a file in the Java properties format that describes the mapping between samples and groups. Each line must have the format sample-id=group-id or be a comment. Please refer to the --groups option for a description of sample/group mapping.|
|compare||no||Compare sequence variations across groups of samples. This option is required for some output formats. When provided, the compare flag must be followed by group ids separated by slashes. For instance, if groups group-A and group-B have been defined (see --groups option), --compare group-A/group-B will evaluate statistical tests between sequence variation in groups A and B.|
|eval||no||List of optional analysis steps. This option is currently ignored. Previous versions used this option to control output. Output formats are now controlled by the --format flag. Default value: samples|
|start-position||no||The start position within the file, which should be number of bytes into the file to start reading from, or a string in the format ref-id,ref-position (since Goby 1.9). Only entries that have a position after the start position are considered.|
|end-position||no||The end position within the file, which should be number of bytes into the file to end reading from, or a string in the format ref-id,ref-position (since Goby 1.9). Only entries that occur before the specified position are analyzed.|
|start-flap-size||no||Size of the flap to consider before start-position (in bp). Reads that start between start-flap and start-position are used to accumulate base counts. Base counts are used to emit statistics between start-position and end-position, but not between start-flap and start-position. This strategy makes it possible to concatenate results from distinct windows without reporting redundant results. Default value: 1000|
|variation-stats||no||Path to a variation statistics file produced with --mode sequence-variation-stats2 over the same set of alignments. This file provides variation counts for specific read indices, which are used in calculating the within-group P-values for variation discovery.|
|minimum-variation-support||no||The minimum number of times a variation must be seen across all alignments to be considered for statistical testing. Default value: 10|
|threshold-distinct-read-indices||no||The minimum number of distinct read indices that must support a variation for it to be considered for statistical testing. Default value: 3|
|parallel||no||Run some computations in parallel. You can tune the number of threads used by setting the property pj.nt. For instance, -Dpj.nt=5 will use 5 parallel threads. When --parallel is specified, one thread per processing core of the machine is used unless pj.nt specifies otherwise. Default value: FALSE|
|format||no||The name of the output format. Possible choices are genotypes, allele_frequencies, compare_groups, methylation. Default value: variant_discovery|
|genome||yes||A genome basename. The genome must have been processed by build-sequence-cache mode to produce the compressed, random access files this mode needs.|
|processor||no||The name of the alignment processor. An alignment processor can be configured to scan alignment entries before calling variants. By default, no processor is used. If you specify realign_near_indels, a processor that realigns reads in the proximity of indels will be used. Please note that this feature is experimental (Since Goby 1.9.7). Default value: NONE|
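
The syntax of the --groups and --compare values described in the table can be illustrated with a short Python sketch. This is not Goby code: the function names parse_groups and parse_compare are hypothetical, and the exact string match between sample names and input basenames is an assumption made for illustration. It only shows how a value such as id-1=basename1,basename2/id-2=basename4,basename5 breaks down into groups, and how the default group1… groupN assignment works when --groups is omitted.

```python
def parse_groups(groups_arg, input_basenames):
    """Return {group id: [sample basenames]} for a --groups style value.

    Hypothetical helper, not part of Goby. Assumes sample names must match
    the input basenames exactly.
    """
    if not groups_arg:
        # Default described for --groups: one group per sample, named
        # group1..groupN in the order the basenames appear on the command line.
        return {f"group{i}": [name]
                for i, name in enumerate(input_basenames, start=1)}

    groups = {}
    # Groups are separated by forward slashes.
    for definition in groups_arg.split("/"):
        group_id, sep, members = definition.partition("=")
        if not sep or not group_id or not members:
            raise ValueError(f"malformed group definition: {definition!r}")
        # Samples within a group are separated by commas.
        samples = members.split(",")
        unknown = [s for s in samples if s not in input_basenames]
        if unknown:
            raise ValueError(f"samples not among the inputs: {unknown}")
        groups[group_id] = samples
    return groups


def parse_compare(compare_arg, groups):
    """Return the group ids named by a --compare style value (ids split on '/')."""
    group_ids = compare_arg.split("/")
    missing = [g for g in group_ids if g not in groups]
    if missing:
        raise ValueError(f"undefined groups: {missing}")
    return group_ids


if __name__ == "__main__":
    inputs = ["basename1", "basename2", "basename4", "basename5"]
    groups = parse_groups(
        "id-1=basename1,basename2/id-2=basename4,basename5", inputs)
    print(groups)
    print(parse_compare("id-1/id-2", groups))
    # Without --groups, each sample gets its own group.
    print(parse_groups(None, ["sampleA", "sampleB"]))
```

Checking the group definition before launching a long run can catch a misspelled basename or a --compare value that names an undefined group.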