Motivation: The massive amounts of genome-wide gene expression data available for many organisms have motivated the development of computational approaches that leverage this information to predict gene function. Among the successful approaches, maximum-margin machine learning methods such as Support Vector Machines have shown superior classification accuracy. Biologists, however, often prefer simpler methods like coexpression networks or clustering, because they are based on biologically meaningful concepts that scientists can understand and interpret.
Results: In this work we present Discriminative Local Subspaces (DLS), a novel supervised machine learning method designed to analyze gene expression data and to predict discriminative functional networks for a biological process of interest. DLS uses the knowledge available in the Gene Ontology (GO) project to generate informative training sets that guide the discovery of expression signatures: discriminative expression patterns exhibited by genes associated with a biological process of interest in a particular subset of experimental conditions. These signatures provide key information to predict new functional connections. Furthermore, to deal with the still incomplete list of gene annotations in GO, DLS incorporates a novel scheme to discover false negatives. Our results using an Arabidopsis thaliana dataset show that DLS outperforms previous methods based on Coexpression Networks or Support Vector Machines. Beyond its power for function prediction, we argue that DLS stands out among similar functional prediction methods because it provides valuable insight that helps biologists understand the predictions and guide future experiments.
Availability and Implementation:
The MATLAB code to run DLS is freely available here (for academic use only).
The complete data set to run DLS is available here.
You can download a small data set to test the code here.