This mode reformats compact-reads files. It can drop identifiers or descriptions, split a file into several pieces, trim sequences, or introduce mutations. When a compact-reads file is split, the reads in each split are renumbered (their read index changes), starting at zero for the first sequence of each split. This ensures that indices are handled correctly when the splits are concatenated back together. The mode is implemented by edu.cornell.med.icb.goby.modes.ReformatCompactReadsMode.java.
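
The renumbering rule can be illustrated with a small, self-contained sketch. It is not Goby's implementation: the Read class, the split method, and the sequencesPerOutput parameter are hypothetical stand-ins, and the sketch assumes a split is closed as soon as it holds the maximum number of sequences.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the names here are hypothetical, not Goby classes. */
class SplitRenumberSketch {

    /** Minimal stand-in for a read entry: a read index and a sequence. */
    static final class Read {
        final int readIndex;
        final String sequence;

        Read(int readIndex, String sequence) {
            this.readIndex = readIndex;
            this.sequence = sequence;
        }
    }

    /**
     * Splits reads into groups of at most sequencesPerOutput entries.
     * Within each group the read index restarts at zero.
     */
    static List<List<Read>> split(List<Read> reads, int sequencesPerOutput) {
        if (sequencesPerOutput <= 0) {
            throw new IllegalArgumentException("sequencesPerOutput must be positive");
        }
        List<List<Read>> outputs = new ArrayList<>();
        List<Read> current = null;
        for (Read read : reads) {
            // Start a new split when there is none yet or the current one is full.
            if (current == null || current.size() == sequencesPerOutput) {
                current = new ArrayList<>();
                outputs.add(current);
            }
            // The new index is the position inside the current split.
            current.add(new Read(current.size(), read.sequence));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Read> reads = new ArrayList<>();
        String[] sequences = {"AAAA", "CCCC", "GGGG", "TTTT", "ACGT"};
        for (int i = 0; i < sequences.length; i++) {
            reads.add(new Read(i, sequences[i]));
        }
        List<List<Read>> outputs = split(reads, 2);
        for (int s = 0; s < outputs.size(); s++) {
            StringBuilder line = new StringBuilder("split " + s + ":");
            for (Read read : outputs.get(s)) {
                line.append(' ').append(read.readIndex).append('=').append(read.sequence);
            }
            System.out.println(line);
        }
    }
}
```

In this sketch the original read index is discarded, so a mapping back to the source file would have to be kept separately if it is needed.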

Mode Parameters

The following options are available in this mode:

| Flag | Arguments | Required | Description |
|---|---|---|---|
| -d, --include-descriptions | n/a | no | When this switch is provided, include description lines in the compact output. By default, description lines are ignored. Default value: FALSE |
| -y, --include-identifiers | n/a | no | When this switch is provided, include identifiers in the compact output. By default, identifiers are ignored. Identifiers are parsed out of description lines as the token before the first space or tab character. Default value: FALSE |
| --exclude-sequences | n/a | no | When this switch is provided, sequences are not written to the compact file. This can be useful to keep only an association between sequence index and identifier. Default value: FALSE |
| -o, --output | output | no | If there is only one read file, this forces the output file to this specific filename. Please note that the --output argument is required when a single input file is provided on the command line. If there is more than one input file, the output filename will always be the input filename appended with some string and a .compact-reads suffix. You should generally use an extension of .compact-reads when writing a compact reads file. |
| -n, --sequence-per-chunk | sequence-per-chunk | no | The number of sequences that will be written in each compressed chunk. The default is suitable for very many short sequences. Reduce to a few sequences per chunk if each sequence is very large. Default value: 10000 |
| --minimum-read-length | minimum-read-length | no | Sequences below this length are omitted. Default value: 0 |
| --maximum-read-length | maximum-read-length | no | Sequences above this length are omitted. Default value: 2147483647 |
| --mutate-sequences | n/a | no | When this switch is provided, each sequence is mutated according to the mutation parameters. This option is only useful to introduce mismatches in sequences to create controls. It is generally not meant to be used in a production pipeline. Default value: FALSE |
| --mismatch-number | mismatch-number | no | When the --mutate-sequences switch is activated, indicates how many mismatches should be introduced in each input sequence (at random positions). Default value: 0 |
| -p, --sequence-per-output | sequence-per-output | no | The maximum number of sequences that will be written to each output file. Output files are split if the file would contain more sequences than indicated by this parameter. |
| n/a | input | yes | The compact reads files provided as input to reformat. |
| -s, --start-position | start-position | no | The start position within the file, given as the number of bytes into the file at which to start reading. The read will actually start at the first record on or after start-position. |
| -e, --end-position | end-position | no | The end position within the file, given as the number of bytes into the file at which to stop reading. The read will actually end at the end of the record on or after end-position. |
| --trim-read-length | trim-read-length | no | When this option is present, reads are trimmed to a maximum length of this value. Unlike maximum-read-length, all reads are kept but may be trimmed. Default value: 2147483647 |
| -t, --trim-read-start | trim-read-start | no | When this option is present, reads are trimmed at the start by the specified number of characters. Default value: -1 |
| -f, --read-index-filter | read-index-filter | no | The name of a read index filter. When provided, reads are only written to the output if their index is contained in the filter. |
| -x | dynamic-options | no | Set a dynamic option, in the format classname:key=value. Classname is the name of the class that exposes the option (short class name without package), key identifies the option to change, and value is the new value for the option. |
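
Several of the option descriptions above state small rules that are easier to see in code: how an identifier is taken from a description line, how length filtering differs from trimming, and how a dynamic option string is structured. The following is a minimal sketch of those rules as described in the table. It is not Goby's code: every class and method name is hypothetical, and details the table does not specify (such as how malformed dynamic options are reported) are assumptions.

```java
/** Illustrative sketch only; all names are hypothetical, not part of Goby. */
class OptionSemanticsSketch {

    /** Identifier: the token before the first space or tab of a description line. */
    static String identifierOf(String description) {
        for (int i = 0; i < description.length(); i++) {
            char c = description.charAt(i);
            if (c == ' ' || c == '\t') {
                return description.substring(0, i);
            }
        }
        // Assumption: with no space or tab, the whole line is the identifier.
        return description;
    }

    /** Minimum/maximum read length: reads shorter than min or longer than max are omitted. */
    static boolean passesLengthFilter(int length, int min, int max) {
        return length >= min && length <= max;
    }

    /** Trim read length: the read is kept, but cut to at most trimLength characters. */
    static String trimToLength(String sequence, int trimLength) {
        return sequence.length() > trimLength ? sequence.substring(0, trimLength) : sequence;
    }

    /** Splits a dynamic option of the form classname:key=value into its three parts. */
    static String[] parseDynamicOption(String option) {
        int colon = option.indexOf(':');
        int equals = option.indexOf('=', colon + 1);
        // Assumption: an empty class name or key, or a missing separator, is an error.
        if (colon <= 0 || equals <= colon + 1) {
            throw new IllegalArgumentException("expected classname:key=value, got: " + option);
        }
        return new String[] {
            option.substring(0, colon),
            option.substring(colon + 1, equals),
            option.substring(equals + 1)
        };
    }

    public static void main(String[] args) {
        System.out.println(identifierOf("read_1 length=35"));
        System.out.println(passesLengthFilter(20, 25, 100));
        System.out.println(trimToLength("ACGTACGT", 4));
        // "SomeClass" and "some-key" are placeholders, not real Goby option names.
        System.out.println(String.join(" / ", parseDynamicOption("SomeClass:some-key=42")));
    }
}
```

The filter and the trim are kept as separate methods on purpose: the table does not say in which order the mode applies them, so the sketch does not assume one.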