Q: What are the characteristic features of BDVal, in a nutshell?

See our Bioinformatics application note (submitted for publication) for a high-level overview of BDVal concepts, abstractions and features.

Q: Does BDVal allow integration of custom user code?

Yes. BDVal is open source and can be extended in various ways. See our Developers Page for examples of customizations and extensions.

Q: What are the benefits and limitations of BDVal compared to other libraries such as Weka, Rapid-Miner, and related bioinformatics libraries? Why should a practitioner use BDVal?

Weka and Rapid-Miner are both general-purpose machine learning toolkits that provide user interfaces for data exploration. In contrast, the BDVal program and associated scripts automate the development of predictive models from high-throughput data in the pre-clinical or clinical setting. These application domains have characteristics that are not always easy to accommodate with general-purpose machine learning packages. For instance, it is often useful to select a small number of features from which to develop models. This matters because there is a cost associated with measuring each feature included in a diagnostic test, so a model with a few tens of features is much preferred to a model that requires the measurement of 50,000 features. To make it easier to develop models with a manageable number of features, BDVal supports feature pre-filtering and fully supports honest (complete) cross-validation, where feature selection is embedded in the cross-validation loop.
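
To illustrate what embedding feature selection in the cross-validation loop means, here is a minimal sketch in plain Python. It is not BDVal code and does not use the BDVal API: the function names (make_data, top_features, honest_cv_accuracy), the simple mean-difference feature score, and the nearest-centroid classifier are all assumptions chosen for brevity. The point it shows is that features are re-selected from the training samples of each fold only, so the held-out samples never influence which features the model uses.

```python
import random
from statistics import mean, pstdev


def make_data(n_samples=40, n_features=50, n_informative=3, seed=1):
    """Hypothetical synthetic data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def top_features(X, y, k):
    """Rank features by class mean difference scaled by spread; return the top k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-9
        scores.append((abs(mean(pos) - mean(neg)) / spread, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def honest_cv_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Cross-validation in which feature selection is repeated inside every fold."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Feature selection sees training samples only: this is the "honest" part.
        features = top_features(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in held_out)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_data()
    print("honest CV accuracy:", honest_cv_accuracy(X, y))
```

Selecting features once on the full dataset and then cross-validating only the classifier would let the held-out samples leak into the model and typically yields optimistic performance estimates. Repeating the selection inside each fold, as sketched here, avoids that leak.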

In contrast to general-purpose machine learning packages, BDVal is tailored to developing predictive models. It automates common biomarker model development tasks that would otherwise require scripting or user interface interaction to set up an analysis pipeline in a general-purpose package such as Weka or Rapid-Miner. BDVal provides fully automated protocols that have been tested on pre-clinical and clinical datasets.

Q: Are there limitations or requirements to the use of BDVal?

Limitations include the usual need for suitable datasets (see next question). See our general configuration page for hardware and software requirements.

Q: Which types of datasets work best with BDVal?

We have used BDVal with datasets of up to tens of thousands of features, generated with:

  • microarrays,
  • proteomics, or
  • DNA methylation HELP assays.

Some of the datasets we have studied had several hundred samples. We recommend working with datasets that have at least 100 samples for training and a similar number for independent validation. However, we recognize that it can be difficult to estimate sample size in the absence of a pilot study. Please consult a local biostatistician before designing a biomarker study.

Q: Where can I get help if I run into problems?

We have created a BDVal user forum on Google Groups. Please consider posting your questions to this forum so that others can see the answers.