Every model generated by BDVal is assigned a unique identifier. The identifier is a 6-letter code derived from a subset of the command line arguments used to produce the model. It is used to keep track of the model at each stage of model development and evaluation.
The split-plan file indicates how samples in the input file are assigned to cross-validation folds for each random repeat of cross-validation. The split-plan is generated by Define Splits Mode and is saved to a .split-plan file so that different feature selection strategies can be tested with exactly the same split partitions. The format of this file is described under Define Splits Mode.
Model Conditions file
model-id=YVIKN input=data/bdval/GSE8402/norm-data/GSE8402_family.soft.gz overwrite-output=false task-list=data/bdval/GSE8402/tasks/GSE8402-FusionYesNo-TrainingSplit.tasks platform-filenames=data/bdval/GSE8402/platforms/GPL5474_family.soft.gz conditions=data/bdval/GSE8402/cids/GSE8402-FusionYesNo-TrainingSplit.cids pathway-aggregation-method=PCA scale-features=true percentile-scaling=false normalize-features=false classifier=edu.cornell.med.icb.learning.libsvm.LibSvmClassifier gene-features-dir=./ dataset-name=dataset-name dataset-root=ds-root rserve-port=-1 cache-dir=cache pathway-components-dir=pathway-components num-features=10 splits=data/bdval/GSE8402/splits/fusion-cv-5-fs=false.split sequence-file=data/sequences/baseline.sequence evaluate-statistics=true
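The example line above is a series of whitespace-separated key=value tokens. As a minimal sketch (not part of BDVal), such a line could be read into a dictionary as follows. The function name parse_model_conditions_line is hypothetical, and the sketch assumes that values never contain whitespace. Each token is split on its first "=" only, because a value such as the splits path can itself contain "=".

```python
def parse_model_conditions_line(line: str) -> dict:
    """Split one line of whitespace-separated key=value tokens into a dict.

    Hypothetical helper for illustration; assumes values contain no whitespace.
    """
    conditions = {}
    for token in line.split():
        # Split on the first '=' only, since a value may itself contain '='
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"token without '=': {token!r}")
        conditions[key] = value
    return conditions


# A shortened version of the example line, using values taken from it
example = (
    "model-id=YVIKN num-features=10 "
    "splits=data/bdval/GSE8402/splits/fusion-cv-5-fs=false.split "
    "evaluate-statistics=true"
)
parsed = parse_model_conditions_line(example)
print(parsed["model-id"])
print(parsed["splits"])
```

All values are kept as strings; converting entries such as num-features to integers is left to the caller.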
The output of feature selection is a feature list that describes the most informative features for a particular model. The size of the feature list is determined by the parameter settings of each feature selection method. The feature list is saved in the format <dataset-name>-<classifier>-<model-id>-features-.txt and is a list of informative feature probeset IDs. The feature list file format is the same as the gene list format. If feature pre-filtering is disabled, the feature list will contain only probeset IDs. An example is shown below:
|Ensembl Gene ID||EMBL ID||Refseq ID||Probeset ID|
Zipped Model File
All output files associated with a particular model share the same filename prefix, a string that combines the methods used to build that model. For example, the model prefix
indicates that this model was built using the LibSVM classifier on the GEO series dataset GSE8402, that the endpoint under consideration was FusionYesNo, that the model was built from training data using the global SVM weights feature selection strategy, and that the unique model identifier is AGCKW.
The unzipped model file contains several component files which are used to reconstitute the model by different BDVal parameters.
The <modelFilenamePrefix>.properties file contains information about how the model was built. An example is shown below:
Predict mode outputs a prediction table for each model built. This table contains the following fields: