Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data

Authors: Chuming Chen; Peter B McGarvey; Hongzhan Huang; Cathy H Wu
Source: Advances in Bioinformatics, 2010, 2010: 1-19.
DOI: 10.1155/2010/423589

Abstract

High-throughput "omics" technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge. Meeting it requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput "omics" data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput "omics" data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied "omics" data from different laboratories to make useful connections that could lead to new biological knowledge.

1. Introduction

Unlike the traditional one-gene-at-a-time approach, which provides the detailed molecular functions of individual genes, advances in high-throughput technologies for studying molecular biology systems over the past decades marked the beginning of a new era of biological and biomedical research, in which researchers systematically study organisms at the levels of genomes (complete genetic sequences) [1], transcriptomes (gene expressions) [2], proteomes (protein expressions) [3], metabolomes (metabolic networks) [4], and interactomes (protein-protein interactions) [5]. Genomics analysis tells us the complete genetic sequences and the intragenomic interactions within the genomes. The sequences only tell us what a cell can potentially do.
To know what a cell is actually doing, DNA microarray technologies [6] have been used to study transcriptomes, an approach also called gene expression profiling [7], which examines the expression levels of the mRNAs of thousands of genes to give a global view of cell function under various conditions. Recently, high-throughput gene expression profiling technologies have been applied to aid biomarker discovery and the identification of molecular targets related to human cancer [8]. The genome of an organism is relatively constant, whereas its proteome, the set of proteins expressed under varied conditions, can differ considerably between cell types and conditions. Because the expression profiling at the