Pre-filtering features involves eliminating features based on predetermined criteria. Typically, pre-filtering is carried out using gene lists, which is a way of building models from biologically specific features.

The gene list method, as implemented in BDVal, leverages published gene lists to focus feature selection on genes that are likely to be predictive. The rationale is that similar phenotypes are likely to be mechanistically related, so a gene list published for a related phenotype may also be informative here. When gene lists are selected independently of the dataset, the potential for over-fitting should be reduced. The method requires probe-to-gene mapping information and a set of potentially relevant gene lists.
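The idea can be illustrated with a minimal Python sketch. This is not BDVal's implementation; the function name prefilter_probes and the probe and gene identifiers are hypothetical, and the sketch assumes the probe-to-gene information is already available as a dictionary.

```python
def prefilter_probes(probe_to_gene, gene_list):
    """Keep only the probes whose gene appears in the gene list.

    probe_to_gene: dict mapping a probe identifier to a gene identifier
                   (assumed to be loaded from the probe-to-gene information).
    gene_list:     iterable of gene identifiers taken from a published list.
    """
    wanted = set(gene_list)
    return [probe for probe, gene in probe_to_gene.items() if gene in wanted]


# Hypothetical identifiers, for illustration only.
mapping = {"probe_1": "GENE_A", "probe_2": "GENE_B", "probe_3": "GENE_A"}
kept = prefilter_probes(mapping, ["GENE_A"])
print(kept)
```

Because the gene list is chosen without looking at the dataset, the filter itself does not use any class labels or expression values.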

Gene List Files

Gene list files are tab-delimited text files with one or more columns. Each file contains one line per feature.

 PrimaryID [tab] GenBankID [tab] RefSeqID [tab] ProbesetID

Lines beginning with the character ‘#’ are ignored. The fourth field is the probe set identifier, which matches the identifiers used on the chip.
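As a rough illustration of the format, the following Python sketch reads a gene list and collects the probe set identifiers from the fourth column. It is not BDVal's own parser; the function names and the sample entries are hypothetical. Skipping blank lines and lines with fewer than four columns is an assumption made here, not documented behaviour.

```python
import io


def read_gene_list(handle):
    """Yield the tab-separated fields of each feature line."""
    for line in handle:
        line = line.rstrip("\r\n")
        # Comment lines start with '#'. Skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        yield line.split("\t")


def probeset_ids(handle):
    """Collect the fourth field (probe set identifier) where it is present."""
    ids = []
    for fields in read_gene_list(handle):
        # A file may have fewer than four columns; such lines are skipped here.
        if len(fields) >= 4 and fields[3]:
            ids.append(fields[3])
    return ids


# Hypothetical entries, for illustration only.
sample = io.StringIO(
    "# PrimaryID\tGenBankID\tRefSeqID\tProbesetID\n"
    "GENE_A\tGB0001\tRS0001\tprobe_1\n"
    "GENE_B\tGB0002\tRS0002\tprobe_2\n"
)
print(probeset_ids(sample))
```

To read a real file, open it in text mode and pass the file object to probeset_ids in place of the in-memory sample.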