The following table describes the global parameters that can be set when running BDVal. Not every task requires all of these parameters, but some are always required, as indicated in the Required column.

| Flag | Argument | Required | Description |
| --- | --- | --- | --- |
| -i, --input | input | yes | Input filename. This file contains the measurement data used to discover markers and train models. See supported file formats. |
| -p, --platform-filenames | platform-filenames | yes | Comma-separated list of platform filenames. See supported file formats. |
| -c, --conditions | conditions | yes | File that maps each condition name to a column identifier (tab delimited, one mapping per line). See details about the cids file format. |
| -t, --task-list | task-list | yes | Name of the file that describes the classification tasks. This file is tab delimited, with one line per task. The first column is the input filename, the second the name of the first condition, the third the name of the second condition, the fourth the number of samples in the first condition, and the fifth the number of samples in the second condition. See details about the tasks file format. |
| -o, --output | output | no | Name of the output file. Output is printed to the console when this flag is absent or when the value “-” is given. |
| --properties | properties | no | Name of the properties file: a Java properties file with BDVal-specific configuration properties. See the User manual for a description of the Java properties recognized by BDVal. |
| --overwrite-output | overwrite-output | no | When true and -o is specified, the output file will be overwritten. (default: false) |
| --model-id | model-id | no | The model id, created in ExecuteSplitsMode as a hash of the options. (default: no_model_id) |
| -g, --gene-lists | gene-lists | no | Name of the file that describes the gene lists. This file is tab delimited, with one line per gene list. The first column is the name of the gene list. The second column (optional) is the name of the file which describes the gene list (see the gene list file format). If the file has only one column, the name of the list is random, the second field indicates how many random probesets must be selected, and a third field indicates the random seed to use for probeset selection. |
| --gene-list | gene-list | no | Argument of the form gene-list-name\|filename. The filename points to a single gene list file. See the gene list file format. |
| --seed | seed | no | Seed used to initialize the random generator. |
| --pathways | pathways | no | Filename of the pathway description information, a file with one line per pathway. Each line has two tab-delimited fields. The first field provides a pathway identifier. The second field is space delimited; each of its tokens is the (Ensembl) gene id of a gene that belongs to the pathway. When this option is provided, features are aggregated by pathway and computations are performed in the aggregated feature space. Some aggregation algorithms may generate several aggregated features per pathway. When this option is active, the option --gene-to-probes must also be provided on the command line. See the feature aggregation page for details about configuring these options. |
| --pathway-aggregation-method | pathway-aggregation-method | no | Indicates which method should be used to aggregate features for pathway runs. Two methods are available: PCA or average. PCA performs a principal component analysis on the probesets of each pathway. Average uses a single feature for each pathway, calculated as the average of the probeset signal in that pathway. (default: PCA) See the feature aggregation page for details about configuring these options. |
| --gene-to-probes | gene2probes | no | Filename of the gene to probe description information, a file with one line per gene-probe pair. Each line is tab delimited. The first field is an Ensembl gene id. The second field is a probe id which measures expression of a transcript of the gene. Several lines may share the same gene id, indicating that multiple probe ids exist for the gene. See the feature aggregation page for details about configuring these options. |
| --floor | floor | no | Specifies a floor value for the signal. If a signal is lower than the floor, it is set to the floor. If no floor is provided, values are unchanged. |
| --two-channel-array | two-channel-array | no | Indicates that the data is for a two channel array. This flag affects how the floor value is interpreted. For two channel arrays, values on the array are set to 1.0 if (Math.abs(oldValue-1.0)+1)<=floorValue, whereas for one channel arrays the condition becomes oldValue<=floorValue. |
| --logged-array | logged-array | no | Indicates that the data on this array has been logged. This option affects flooring for two color arrays. When the option is specified, the floor is applied around a center value of zero. When the option is not specified, two color arrays are floored around a value of 1 (no change). |
| --scale-features | scale-features | no | Indicates whether the features should be scaled to the range [-1, 1]. If false, no scaling occurs. If true, features are scaled. (default: true) |
| --percentile-scaling | percentile-scaling | no | Indicates whether feature scaling is done with percentiles and median, or with full range and average. When percentiles are used, the range of each feature is determined as the 20th to 80th percentile range of the data, and the median is used instead of the mean. (default: false) |
| --scaler-class-name | scaler-class-name | no | The class name of the scaler implementation. Overrides --percentile-scaling if provided. |
| --normalize-features | normalize-features | no | Indicates whether the feature vectors should be normalized. If false, no normalizing occurs. If true, features are normalized. (default: false) |
| -l, --classifier | classifier | no | Fully qualified class name of the classifier implementation. (default: edu.cornell.med.icb.learning.libsvm.LibSvmClassifier) |
| -a, --classifier-parameters | classifier-parameters | no | Comma-separated list of parameters that will be passed to the classifier. Parameters vary from one classifier to the next. Check the documentation and the source code of the classifier to see which parameters can be set. |
| --gene-features-dir | gene-features-dir | no | The directory from which gene features files will be read (when specified in a -gene-lists.txt file). (default: ./) |
| --dataset-name | dataset-name | no | The name of the dataset being run. (default: dataset-name) |
| --dataset-root | dataset-root | no | The root directory where the dataset files exist. (default: ds-root) |
| --output-stats-from-gene-list | n/a | no | |
| --rserve-port | rserve-port | no | The Rserve port to use. (default: -1) |
| --process-split-id | process-split-id | no | Restricts execution to a split id. A split execution plan must be provided as well. The split id is used together with the split plan to determine which samples should be processed. Typical usage would be "--process-split-id 2 --split-plan theplan.txt --split-type training", which would result in the training samples that match split #2 in theplan.txt being used. |
| --split-plan | split-plan | no | Filename for the split plan definition. See process-split-id. |
| --split-type | split-type | no | Split type (e.g., training, test, feature-selection); must match a type listed in the split plan. See process-split-id. |
| --cache-dir | cache-dir | no | Cache directory. Specify a directory where intermediate processed tables will be saved for faster access. (default: cache) |
| --enable-cache | n/a | no | Enables caching for faster access to processed tables. |
| --pathway-components-dir | pathway-components-dir | no | Directory where pathway components will be stored. (default: pathway-components) |
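
The two flooring conditions quoted in the --floor and --two-channel-array descriptions can be easier to follow as code. The following is a minimal sketch that transcribes only those two conditions; the class and method names are hypothetical and are not taken from the BDVal source, and the --logged-array case (flooring around zero) is deliberately left out because the table does not give its exact condition.

```java
// Hypothetical illustration only; not the BDVal implementation.
class FloorSketch {
    /**
     * One channel array: a value at or below the floor is set to the floor
     * (condition from the table: oldValue <= floorValue).
     */
    static double floorOneChannel(double oldValue, double floorValue) {
        return oldValue <= floorValue ? floorValue : oldValue;
    }

    /**
     * Two channel array, data not logged: a value is set to 1.0 when
     * (Math.abs(oldValue - 1.0) + 1) <= floorValue, otherwise it is unchanged.
     */
    static double floorTwoChannel(double oldValue, double floorValue) {
        return (Math.abs(oldValue - 1.0) + 1) <= floorValue ? 1.0 : oldValue;
    }
}
```

With a floor of 1.5, for example, a two channel value of 1.2 is reset to 1.0 because it lies close to the center, while 0.4 and 2.0 are left unchanged because they lie far enough from 1.0 in either direction.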
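
The five-column layout described for the --task-list file can be read with a few lines of Java. This is a sketch under stated assumptions rather than BDVal's own parser: the class and field names are hypothetical, blank lines are assumed to be ignorable, and any columns beyond the five listed in the table are simply not interpreted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the task-list layout described in the table.
class TaskListSketch {
    /** One classification task (one line of the task-list file). */
    static final class Task {
        final String inputFilename;
        final String firstCondition;
        final String secondCondition;
        final int firstConditionSamples;
        final int secondConditionSamples;

        Task(String[] fields) {
            inputFilename = fields[0];
            firstCondition = fields[1];
            secondCondition = fields[2];
            firstConditionSamples = Integer.parseInt(fields[3].trim());
            secondConditionSamples = Integer.parseInt(fields[4].trim());
        }
    }

    /** Parses tab-delimited task lines, one task per non-blank line. */
    static List<Task> parse(List<String> lines) {
        List<Task> tasks = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no task
            }
            String[] fields = line.split("\t");
            if (fields.length < 5) {
                throw new IllegalArgumentException("Expected 5 tab-delimited columns: " + line);
            }
            tasks.add(new Task(fields));
        }
        return tasks;
    }
}
```

A line such as `data.tmm<TAB>cancer<TAB>normal<TAB>12<TAB>8` (made-up names and counts) would yield one task with 12 samples in the first condition and 8 in the second.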
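
To make the relationship between --pathways, --gene-to-probes and the "average" aggregation method concrete, the sketch below reads the two file layouts described in the table and computes one averaged feature per pathway for a single sample. It is an illustration only: the class and method names are hypothetical, and the handling of edge cases (probes without a signal are skipped, a probe shared by several genes of a pathway is counted once, pathways with no measured probe produce no feature) is an assumption, not documented BDVal behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "average" pathway aggregation.
class PathwayAverageSketch {
    /** Reads gene-to-probe lines: geneId TAB probeId, several lines per gene allowed. */
    static Map<String, Set<String>> parseGeneToProbes(List<String> lines) {
        Map<String, Set<String>> geneToProbes = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // assumption: malformed lines are skipped
            }
            geneToProbes.computeIfAbsent(fields[0], k -> new LinkedHashSet<>()).add(fields[1]);
        }
        return geneToProbes;
    }

    /**
     * Reads pathway lines (pathwayId TAB space-delimited gene ids) and returns,
     * for each pathway, the mean signal of its probes in one sample.
     */
    static Map<String, Double> averageByPathway(List<String> pathwayLines,
                                                Map<String, Set<String>> geneToProbes,
                                                Map<String, Double> probeSignal) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (String line : pathwayLines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue;
            }
            // Collect the distinct probes that measure any gene of this pathway.
            Set<String> probes = new LinkedHashSet<>();
            for (String gene : fields[1].trim().split("\\s+")) {
                probes.addAll(geneToProbes.getOrDefault(gene, Collections.emptySet()));
            }
            double sum = 0.0;
            int count = 0;
            for (String probe : probes) {
                Double value = probeSignal.get(probe);
                if (value != null) {
                    sum += value;
                    count++;
                }
            }
            if (count > 0) {
                features.put(fields[0], sum / count);
            }
        }
        return features;
    }
}
```

The PCA method mentioned in the table would replace the averaging loop with a principal component analysis per pathway and may emit several features per pathway; it is not sketched here.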