Bioinformatics

Researchers in the post-genomic era often find themselves comparing long lists of genes, transcripts, proteins or other biological molecules. The first logical step toward understanding the molecular mechanisms that link the genes in such a list (e.g. genes regulated in a microarray experiment) is to analyze their properties in the context of what is already known. However, this task is often impractical, because experiments with genomic techniques can easily yield hundreds or even thousands of data points (e.g. genes differentially regulated in response to a treatment). In addition, aside from the sheer volume of knowledge accumulated in the literature, existing genomic data come from a large number of experimental approaches and an even larger number of laboratories. Integrating genomic information on this scale in a biologically meaningful way is not trivial. One layer of difficulty stems from the fact that the information is stored in numerous databases and encoded in various formats and database schemas. Indeed, integrating heterogeneous molecular biological databases is one of the most important ongoing tasks in bioinformatics.

We find that integrating existing knowledge into a relatively simple qualitative network graph greatly simplifies the task of extracting meaning and finding associations between genes in large lists. In collaboration with Gloria Coruzzi (Biology, New York University) and Dennis Shasha (Courant, New York University) we are developing a software platform that simplifies the task of generating gene networks and supports research in the post-genomic era. Our software system integrates new informatics tools and existing publicly available genomic data to enable dynamic modeling and visualization of molecular networks in plant cells. By integrating the available information on the genes and proteins of the plant cell in an intelligible way and within a biological context, we will help biologists identify molecular networks and generate hypothetical models that can explain the gene associations observed with high-throughput experimental methods. With the same collaborators we are also developing novel visualization techniques that render the multivariate information in visual formats that facilitate the extraction of biological concepts, as well as mathematical and statistical methods that help summarize the data. We are implementing all these tools and approaches in a system we term VirtualPlant. Such a system is essential for a systems-biology analysis of the available genomic data, and it will provide a framework for the analysis of future high-throughput data. Although our main interest is in plants, with Arabidopsis thaliana as a model system, the tools we develop will be generic and applicable to any organism whose genome has been sequenced. The software and methods developed will be freely available to the scientific community.
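As a rough illustration of the network-graph idea described above, the sketch below merges gene-gene relationships from several sources into one labelled graph and then pulls out the associations among the genes in an experimental list. It is not the VirtualPlant implementation: the source names, gene identifiers, and the functions build_network and induced_subnetwork are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical records, each standing in for one source database.
# Gene identifiers and relationship labels are illustrative only.
SOURCES = {
    "protein_interaction": [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")],
    "regulation": [("GENE_C", "GENE_D"), ("GENE_A", "GENE_D")],
    "metabolic": [("GENE_E", "GENE_F")],
}


def build_network(sources):
    """Merge edges from all sources into one undirected, labelled graph."""
    graph = defaultdict(dict)
    for label, edges in sources.items():
        for a, b in edges:
            # Record which source(s) support each gene-gene association.
            graph[a].setdefault(b, set()).add(label)
            graph[b].setdefault(a, set()).add(label)
    return graph


def induced_subnetwork(graph, gene_list):
    """Return edges whose two endpoints are both in the gene list."""
    genes = set(gene_list)
    edges = []
    for a in genes:
        for b, labels in graph.get(a, {}).items():
            # The a < b check reports each undirected edge only once.
            if b in genes and a < b:
                edges.append((a, b, sorted(labels)))
    return sorted(edges)


network = build_network(SOURCES)
# A hypothetical list of genes regulated in an experiment.
hits = induced_subnetwork(network, ["GENE_A", "GENE_B", "GENE_D", "GENE_E"])
for a, b, labels in hits:
    print(a, b, labels)
```

Keeping the source label on each edge preserves the qualitative nature of the graph: an edge records that an association is known and where it came from, without assigning it a quantitative weight.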