Towards a Comprehensive Human Pathway Database For Systems Biology Applications
A biological pathway is a series of reactions and molecular interactions. Pathway information provides a blueprint for biomedical researchers to devise new treatment and diagnostic solutions for human diseases. Pathway data are publicly available in many databases. Most of these databases, however, have only partial coverage of human biological pathways, especially for signal transduction and gene regulatory pathways. Comprehensive knowledge and a combined view of all types of biological pathways may lead to insights into the molecular physiology of cells and into drug discovery. In this project, we collected human signaling pathway data from four database sources: Biocarta, Protein Lounge, Resnet, and NCI-Nature Curated. We analyzed the structures of the collected data and developed a comprehensive data model to manage integrated information on molecules, complexes, regulatory relationships between molecules, and reactions involved in signaling and regulatory pathways. The integrated database, the Human Pathway Database (HPD), is a data warehouse holding a comprehensive collection of biological reactions, regulators, and metabolites, and it serves as a potential platform for future pathway analysis studies. In HPD, a total of 1,895 pathways, 10,631 molecular entities (proteins, complexes, and compounds), and 4,370 reactions were consolidated and integrated into a single relational database platform using Oracle 10g. We also developed a prototype GUI for navigating and querying biological pathways. Such a system, when completed and used in conjunction with future pathway data visualization and analysis tools, could provide a framework for systems biology studies. HPD also includes comprehensive annotation data.
For example, the kinase-disease association information and the perturbation effects of several environmental factors are unique to HPD. The concept of "pathway mergability", the potential for merging similar pathways, was defined based on gene/protein identifiers in order to provide non-redundant pathway information. To test the approach, we conducted two case studies. The first case study addressed Alzheimer's disease by coupling pathway network analysis with gene expression data. The second case study addressed cellular responses by coupling pathway network analysis with protein expression data. These case studies demonstrate how an integrated pathway database can be used to generate new insights into the discovery of biomarkers and to extend our understanding of cellular physiology.
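
As an illustration of how pathway mergability based on gene/protein identifiers might be scored, the following minimal Python sketch compares the identifier sets of two pathways. The abstract does not state the actual metric used in HPD; the Jaccard overlap, the 0.5 threshold, the function names `mergability` and `merge_candidates`, and the toy pathway contents are all assumptions made for illustration.

```python
from itertools import combinations


def mergability(ids_a, ids_b):
    """Jaccard overlap of two gene/protein identifier sets.

    Assumption: Jaccard is used here only as a plausible stand-in;
    the metric actually used in HPD is not given in the abstract.
    """
    a, b = set(ids_a), set(ids_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_candidates(pathways, threshold=0.5):
    """Return (name_a, name_b, score) for pathway pairs scoring at or
    above the (hypothetical) threshold, highest score first."""
    pairs = []
    for name_a, name_b in combinations(sorted(pathways), 2):
        score = mergability(pathways[name_a], pathways[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: -p[2])


# Made-up pathways and identifiers, not taken from HPD.
toy_pathways = {
    "pathway_A": {"G1", "G2", "G3", "G4"},
    "pathway_B": {"G1", "G2", "G3", "G5"},
    "pathway_C": {"G6", "G7", "G8"},
}

if __name__ == "__main__":
    for name_a, name_b, score in merge_candidates(toy_pathways):
        print(f"{name_a} ~ {name_b}: {score:.2f}")
```

In this toy input, pathway_A and pathway_B share three of five distinct identifiers and are flagged as merge candidates, while pathway_C shares none and is left separate. A real implementation would first need to map identifiers from the different source databases onto a common namespace.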