This mode extracts specific samples from a large VCF file. It is a high-performance replacement for the VCF-tools program ‘vcf-subset’. Our rationale for rewriting vcf-subset in Goby is given in this blog post. This mode is implemented by edu.cornell.med.icb.goby.modes.VCFSubsetMode.java. Available since Goby 1.9.8.2.

Mode Parameters

The following options are available in this mode:

| Flag | Arguments | Required | Description |
|---|---|---|---|
| -s, --suffix | output | no | The output suffix used to construct an output filename for each input file. The output filename is the input filename minus its extensions, plus the suffix, plus vcf.gz. |
| n/a | input | yes | The filenames of the input VCF files to subset. This mode supports input files with the .gz extension when these files have been compressed with BGZip. Note that gzip compression is not fully compatible with the Java implementation of BGZip. You can obtain BGZip with the tabix distribution. Running this mode on gzip-compressed files is not supported and will result in exceptions. |
| -c, --column | column | no | Name of a column/sample to extract from the input and write to the output. |
| -r, --required-info-flags | required-info-flags | no | Names of INFO flags that must appear in a record for the record to be written to the output. This argument acts as a filter on the input. |
| --parallel | n/a | no | Process input files in parallel. By default, uses as many threads as are available on the server. Adjust the number of threads with -Dpj.nt=x, where x is the desired number of threads. Default value: FALSE |
| --constant-format | n/a | no | Optimize for constant FORMAT fields. When the FORMAT fields are constant throughout each record of a file and the INFO field always contains the same number of fields, providing this flag will skip some time-consuming steps. Since INFO is often variable, use with care, as variations in the INFO column will shift FORMAT fields from one sample to another. Default value: FALSE |
| --exclude-ref | n/a | no | Remove positions that are strictly homozygous matching the reference in all subset samples. Default value: FALSE |
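
Because this mode raises exceptions on inputs compressed with plain gzip, it can help to check a .gz file before passing it in. The following is a minimal sketch in Python and is not part of Goby. It relies on the BGZF block layout, which is a gzip member whose extra field carries a 'BC' subfield. The function name is_bgzf and the constant BGZF_EOF are illustrative names chosen for this example.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """Return True if `header` (the first bytes of a file) starts a BGZF block."""
    # A gzip member starts with magic 1f 8b and CM=8 (deflate), then a flags byte.
    if len(header) < 12 or header[0:3] != b"\x1f\x8b\x08":
        return False
    # BGZF sets the FEXTRA flag (0x04) so that it can record the block size.
    if not header[3] & 0x04:
        return False
    # XLEN (little-endian uint16) follows MTIME (4 bytes), XFL (1) and OS (1).
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    # Walk the extra subfields looking for SI1='B', SI2='C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False


# The empty block that BGZF writers append as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)

# Plain gzip output, as produced by Python's gzip module, has no 'BC' subfield.
plain_gzip = gzip.compress(b"##fileformat=VCFv4.1\n")

print(is_bgzf(BGZF_EOF))    # BGZF block
print(is_bgzf(plain_gzip))  # ordinary gzip member
```

To check a file on disk, read its first few dozen bytes in binary mode and pass them to is_bgzf. Files that fail the check can be decompressed and recompressed with BGZip before subsetting.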