The following table describes the global parameters that can be set when running BDVal. Not all tasks require every parameter, but some parameters are always required, as indicated in the table.

Flag Arguments Required Description
(-i|--input) input yes Input filename. This file contains the measurement data used to discover markers and train models. See supported file formats.
(-p|--platform-filenames) platform-filenames yes Comma-separated list of platform filenames. See supported file formats.
(-c|--conditions) conditions yes Name of the file that maps each condition name to a column identifier (tab delimited, one mapping per line, condition name first). See details about the cids file format.
(-t|--task-list) task-list yes Name of the file that describes the classification tasks. This file is tab delimited, with one line per task. The first column is the input filename. The second column is the name of the first condition. The third column is the name of the second condition. The fourth column is the number of samples in the first condition. The fifth column is the number of samples in the second condition. See details about the tasks file format.
(-o|--output) output no Name of the output file. Output is printed to the console when this flag is absent or when the value “-” is given.
--properties properties no Name of the properties file: a Java properties file with BDVal-specific configuration properties. See the User manual for a description of the Java properties recognized by BDVal.
--overwrite-output overwrite-output no When true and -o is specified, the output file will be overwritten. (default: false)
--model-id model-id no The model id, created in ExecuteSplitsMode as a hash of the options. (default: no_model_id)
(-g|--gene-lists) gene-lists no Name of the file that describes the gene lists. This file is tab delimited, with one line per gene list. The first column is the name of the gene list. The second column (optional) is the name of the file that describes the gene list (see the format of individual gene list files). If the file has only one column, the name of the list is random, the second field indicates how many random probesets must be selected, and a third field indicates the random seed to use for probeset selection.
--gene-list gene-list no Argument of the form gene-list-name|filename. The filename points to a single gene list file. See the gene list file format.
--seed seed no Seed used to initialize the random number generator.
--pathways pathways no Filename of the pathway description information. The pathway description is a file with one line per pathway. Each line has two tab-delimited fields. The first field provides a pathway identifier. The second field is space delimited; each of its tokens is the (Ensembl) gene id of a gene that belongs to the pathway. When this option is provided, features are aggregated by pathway and computations are performed in the aggregated feature space. Some aggregation algorithms may generate several aggregated features per pathway. When this option is active, the gene2probes option (--gene-to-probes) must also be provided on the command line. See the feature aggregation page for details about configuring these options.
--pathway-aggregation-method pathway-aggregation-method no Indicate which method should be used to aggregate features for pathway runs. Two methods are available: PCA or average. PCA performs a principal component analysis on the probesets of each pathway. Average uses a single feature for each pathway, calculated as the average of the probeset signals in that pathway. (default: PCA) See the feature aggregation page for details about configuring these options.
--gene-to-probes gene2probes no Filename of the gene-to-probe description information. The gene-to-probe description is a file with one line per gene/probe pair. Each line is tab delimited. The first field is an Ensembl gene id. The second field is a probe id which measures expression of a transcript of the gene. Several lines may share the same gene id, indicating that multiple probe ids exist for the gene. See the feature aggregation page for details about configuring these options.
--floor floor no Specify a floor value for the signal. If a signal is lower than the floor, it is set to the floor. If no floor is provided, values are unchanged.
--two-channel-array two-channel-array no Indicate that the data is for a two-channel array. This flag affects how the floor value is interpreted. For two-channel arrays, values on the array are set to 1.0 if (Math.abs(oldValue-1.0)+1)<=floorValue, whereas for one-channel arrays the condition becomes: oldValue<=floorValue.
--logged-array logged-array no Indicate that the data on this array has been logged. This option affects flooring for two-color arrays. When the option is specified, the floor is applied around a center value of zero. When the option is not specified, two-color arrays are floored around a value of 1 (no change).
--scale-features scale-features no Indicate whether the features should be scaled to the range [-1, 1]. If false, no scaling occurs. If true, features are scaled. (default: true)
--percentile-scaling percentile-scaling no Indicate whether feature scaling is done with percentile and median or full range and average. When percentiles are used, the range of each feature is determined as the range of the 20-80 percentile of the data and median is used instead of the mean. (default: false)
--scaler-class-name scaler-class-name no The class name of the scaler implementation. Overrides --percentile-scaling if provided.
--normalize-features normalize-features no Indicate whether the feature vectors should be normalized. If false, no normalizing occurs. If true, features are normalized. (default: false)
(-l|--classifier) classifier no Fully qualified class name of the classifier implementation. (default: edu.cornell.med.icb.learning.libsvm.LibSvmClassifier)
(-a|--classifier-parameters) classifier-parameters no Comma separated list of parameters that will be passed to the classifier. Parameters vary from one classifier to the next. Check the documentation of the classifier and the source code to see which parameters can be set.
--gene-features-dir gene-features-dir no The directory where gene features files will be read from (when specified in a -gene-lists.txt file). (default: ./)
--dataset-name dataset-name no The name of the dataset being run. (default: dataset-name)
--dataset-root dataset-root no The root directory where the dataset files exist. (default: ds-root)
--output-stats-from-gene-list n/a no
--rserve-port rserve-port no The Rserve port to use. (default: -1)
--process-split-id process-split-id no Restricts execution to a split id. A split execution plan must be provided as well. The split id is used together with the split plan to determine which samples should be processed. Typical usage would be "--process-split-id 2 --split-plan theplan.txt --split-type training", which uses the training samples that match split #2 in theplan.txt.
--split-plan split-plan no Filename for the split plan definition. See process-split-id.
--split-type split-type no Split type (e.g., training, test, feature-selection; must match a type listed in the split plan). See process-split-id.
--cache-dir cache-dir no Cache directory. Specify a directory where intermediate processed tables will be saved for faster access. (default: cache)
--enable-cache n/a no Enables caching for faster access to processed tables.
--pathway-components-dir pathway-components-dir no Directory where pathway components will be stored. (default: pathway-components)
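
To make the task-list file layout described for --task-list concrete, the following Python sketch parses one tab-delimited task line into its five columns. This is only an illustration: BDVal is a Java program and its own parser may behave differently (for example, in how it treats blank lines). The `Task` record, its field names, and `parse_task_line` are hypothetical names chosen for this example, and the sample filename and condition names are made up.

```python
from typing import NamedTuple


class Task(NamedTuple):
    """One classification task (hypothetical record, not a BDVal class)."""
    input_filename: str
    first_condition: str
    second_condition: str
    n_first: int   # number of samples in the first condition
    n_second: int  # number of samples in the second condition


def parse_task_line(line: str) -> Task:
    """Split one tab-delimited task line into the five documented columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-delimited columns, got {len(fields)}")
    filename, first, second, n_first, n_second = fields
    return Task(filename, first, second, int(n_first), int(n_second))


# Made-up example line: filename, two condition names, two sample counts.
task = parse_task_line("dataset.txt\tcancer\tnormal\t20\t15\n")
print(task)
```

A task-list file with several tasks would simply contain one such line per task.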
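
The flooring rules given for --floor, --two-channel-array, and --logged-array can be read as the following Python sketch. The one-channel condition and the unlogged two-channel condition are taken directly from the table. The logged two-channel branch is an assumption: the table only says that the floor is applied around a center value of zero, so the sketch reuses the two-channel condition with the center moved from 1 to 0. The function name `apply_floor` is hypothetical, and BDVal's Java implementation may differ in detail.

```python
def apply_floor(value, floor=None, two_channel=False, logged=False):
    """Return the floored signal value (illustrative sketch only)."""
    if floor is None:
        # No floor provided: values are unchanged.
        return value
    if not two_channel:
        # One-channel array: oldValue <= floorValue -> set to the floor.
        return floor if value <= floor else value
    # Two-channel array: floor around a center value.
    # Unlogged data is centered on 1 (from the table);
    # centering logged data on 0 this way is an assumption.
    center = 0.0 if logged else 1.0
    if abs(value - center) + center <= floor:
        return center
    return value


print(apply_floor(2.0, floor=5.0))                     # one-channel
print(apply_floor(1.2, floor=1.5, two_channel=True))   # two-channel, unlogged
```

Under these rules a two-channel ratio close to 1 (no change) is snapped to exactly 1, while larger deviations pass through unchanged.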
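
The file layouts described for --pathways and --gene-to-probes, together with the "average" aggregation method, can be sketched in Python as follows. The sketch parses both formats from in-memory text and computes one feature per pathway for a single sample, as the mean signal of the probes mapped to that pathway's genes. All function names, pathway ids, gene ids, probe ids, and signal values are invented for illustration. How BDVal handles probes without a signal, or probes shared by several genes, is not stated in the table, so the choices below are assumptions. PCA aggregation is not shown.

```python
from collections import defaultdict
from statistics import mean


def parse_pathways(text):
    """Map pathway id -> list of gene ids (tab, then space-delimited ids)."""
    pathways = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        pathway_id, genes = line.split("\t", 1)
        pathways[pathway_id] = genes.split()
    return pathways


def parse_gene_to_probes(text):
    """Map gene id -> list of probe ids (one gene/probe pair per line)."""
    gene_to_probes = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        gene_id, probe_id = line.split("\t")
        gene_to_probes[gene_id].append(probe_id)
    return gene_to_probes


def average_by_pathway(pathways, gene_to_probes, signal):
    """One feature per pathway: mean signal over the pathway's probes."""
    features = {}
    for pathway_id, genes in pathways.items():
        # Assumption: probes without a measured signal are skipped.
        values = [signal[probe]
                  for gene in genes
                  for probe in gene_to_probes.get(gene, [])
                  if probe in signal]
        if values:
            features[pathway_id] = mean(values)
    return features


PATHWAYS = "P1\tG1 G2\nP2\tG3\n"
GENE_TO_PROBES = "G1\tprobe_a\nG1\tprobe_b\nG2\tprobe_c\nG3\tprobe_d\n"
SIGNAL = {"probe_a": 1.0, "probe_b": 2.0, "probe_c": 3.0, "probe_d": 5.0}

features = average_by_pathway(parse_pathways(PATHWAYS),
                              parse_gene_to_probes(GENE_TO_PROBES),
                              SIGNAL)
print(features)
```

Here gene G1 appears on two lines of the gene-to-probes text, which is how the format expresses that several probe ids exist for one gene.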