Palakal, Mathew
Dhaval, Rakesh
2005-08-19
2005-08-19
https://hdl.handle.net/1805/366
http://dx.doi.org/10.7912/C2/823
Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University, August 2005.
The aim of this thesis is to develop a biological database mining tool that mines various publicly available heterogeneous databases and provides researchers with a reporting and visualization tool for the sub-cellular localization of genes and proteins. Although there is little conservation of primary structure, the general physicochemical properties are conserved to some extent among proteins that share a sub-cellular location. Hence, the function of a protein is closely correlated with its sub-cellular location. Data in genomics and proteomics are detailed, complex, voluminous, and distributed across heterogeneous databases. Most earlier work on information extraction from biological databases focused on database integration using wrapper techniques. However, little work has been done on mining specific data from heterogeneous biological databases to identify pathway information and evolutionary relationships. The need for an interactive information visualization tool that supports biological pathway detection for genes, using a controlled vocabulary and various publicly available biological databases, led to the concept and implementation of GCell. This system lets a researcher move from raw text data at a broad level to a much more detailed view of pathways representing complex biological interactions.
3281235 bytes
application/pdf
en-US
gcell
cellular
GCell A Sub-Cellular Localization Tool
Thesis