Department of BioHealth Informatics Works

Recent Submissions

  • Item
    Development of a Citizen Science platform for Indiana's Cardiovascular Mortality Rates and Social Health Determinants
    (2023-12-17) Malempati, Thejomayi; Purkayastha, Saptarshi; Hamid, Zeyana
    Indiana carries a high cardiovascular disease (CVD) mortality burden, with over 82,373 CVD-attributed deaths from 2017 to 2021 according to vital statistics data from the Indiana Department of Health. The project explores the association between CVD mortality rates and social determinants of health including gender, education level, occupation, and lifestyle factors across 779 Zip Code Tabulation Areas (ZCTAs) in Indiana. Merging cardiovascular mortality rates from the Indiana Department of Health with socioeconomic attributes for each zip code, the study explores the complex intersections of place and society in determining health. Spatial data visualization maps geographic clusters exhibiting elevated death rates warranting priority attention. Statistical models estimate the extent of disproportionate mortality risk faced by disadvantaged groups after accounting for other factors. Overall, the project knits together conceptual underpinnings from medical geography, social epidemiology, and health informatics to put the spotlight on social determinants as pivotal upstream drivers of cardiovascular health disparities. The interactive heatmaps and dashboards will allow for citizen science and participation in understanding targeted interventions that may address the root causes of challenges in promoting health equity.
  • Item
    Mapping Genetic Variation in Arabidopsis in Response to Plant Growth-Promoting Bacterium Azoarcus olearius DQS-4T
    (MDPI, 2023-01-28) Plucani do Amaral, Fernanda; Wang, Juexin; Williams, Jacob; Tuleski, Thalita R.; Joshi, Trupti; Ferreira, Marco A. R.; Stacey, Gary; BioHealth Informatics, School of Informatics and Computing
    Plant growth-promoting bacteria (PGPB) can enhance plant health by facilitating nutrient uptake, nitrogen fixation, protection from pathogens, stress tolerance and/or boosting plant productivity. The genetic determinants that drive the plant–bacteria association remain understudied. To identify genetic loci highly correlated with traits responsive to PGPB, we performed a genome-wide association study (GWAS) using an Arabidopsis thaliana population treated with Azoarcus olearius DQS-4T. Phenotypically, the 305 Arabidopsis accessions tested responded differently to bacterial treatment by improving, inhibiting, or not affecting root system or shoot traits. GWA mapping analysis identified several predicted loci associated with primary root length or root fresh weight. Two statistical analyses were performed to narrow down potential gene candidates followed by haplotype block analysis, resulting in the identification of 11 loci associated with the responsiveness of Arabidopsis root fresh weight to bacterial inoculation. Our results showed considerable variation in the ability of plants to respond to inoculation by A. olearius DQS-4T while revealing considerable complexity in the loci statistically associated with the growth traits measured. This investigation is a promising starting point for sustainable breeding strategies for future cropping practices that may employ beneficial microbes and/or modifications of the root microbiome.
  • Item
    Deep top-down proteomics revealed significant proteoform-level differences between metastatic and nonmetastatic colorectal cancer cells
    (American Association for the Advancement of Science, 2022) McCool, Elijah N.; Xu, Tian; Chen, Wenrong; Beller, Nicole C.; Nolan, Scott M.; Hummon, Amanda B.; Liu, Xiaowen; Sun, Liangliang; BioHealth Informatics, School of Informatics and Computing
    Understanding cancer metastasis at the proteoform level is crucial for discovering previously unknown protein biomarkers for cancer diagnosis and drug development. We present the first top-down proteomics (TDP) study of a pair of isogenic human nonmetastatic and metastatic colorectal cancer (CRC) cell lines (SW480 and SW620). We identified 23,622 proteoforms of 2332 proteins from the two cell lines, representing nearly fivefold improvement in the number of proteoform identifications (IDs) compared to previous TDP datasets of human cancer cells. We revealed substantial differences between the SW480 and SW620 cell lines regarding proteoform and single amino acid variant (SAAV) profiles. Quantitative TDP unveiled differentially expressed proteoforms between the two cell lines, and the corresponding genes had diversified functions and were closely related to cancer. Our study represents a pivotal advance in TDP toward the characterization of human proteome in a proteoform-specific manner, which will transform basic and translational biomedical research.
  • Item
    Epitranscriptomics in parasitic protists: Role of RNA chemical modifications in posttranscriptional gene regulation
    (Public Library of Science, 2022-12-22) Catacalos, Cassandra; Krohannon, Alexander; Somalraju, Sahiti; Meyer, Kate D.; Janga, Sarath Chandra; Chakrabarti, Kausik; BioHealth Informatics, School of Informatics and Computing
    "Epitranscriptomics" is the new RNA code that represents an ensemble of posttranscriptional RNA chemical modifications, which can precisely coordinate gene expression and biological processes. There are several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), that play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, ranging from nuclear export, mRNA splicing, polyadenylation, and stability to translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in 3 unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.
  • Item
    Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species
    (MDPI, 2022-10-30) Yazdanparast, Aida; Li, Lang; Zhang, Chi; Cheng, Lijun; BioHealth Informatics, School of Informatics and Computing
    Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species using multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect the patterns shared both across integrated omics data and between species. The Bi-EB algorithm addresses a clinically critical translational question using a bioinformatics strategy: how modules of genotype variation associated with phenotype from cancer cell screening data can be identified, and how these findings can be directly translated to a cancer patient subpopulation. Empirical Bayesian probabilistic interpretation and ratio strategy are proposed in Bi-EB for the first time to detect the pairwise regulation patterns among species and variations in multiple omics on a gene level, such as proteins and mRNA. An expectation-maximization (EM) optimal algorithm is used to extract the foreground co-current variations out of its background noise data by adjusting parameters with the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two real biology mRNA and protein data analyses conducted on the well-known Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia (CCLE) verify that the proposed Bi-EB algorithm can significantly improve the clustering recovery and relevance accuracy, outperforming seven other biclustering methods (Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC) with a recovery score of 0.98 and a relevance score of 0.99. At the same time, the Bi-EB algorithm is used to determine the shared causality patterns of mRNA to protein between patients and cancer cells in TCGA and CCLE breast cancer. The clinically well-known treatment target protein module estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2 are identified in accordance with their mRNA expression variations in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found for the first time to maintain high mRNA-protein accordance for both breast cancer patients and cell lines in basal-like subtypes. Bi-EB provides a useful biclustering analysis tool to discover the cross patterns hidden both in multiple data matrices (omics) and species. The implementation of the Bi-EB method in the clinical setting will have a direct impact on administering translational research based on the cancer cell screening guidance.
  • Item
    Combining transfer learning with retinal lesion features for accurate detection of diabetic retinopathy
    (Frontiers Media, 2022-11-08) Hassan, Doaa; Gill, Hunter Mathias; Happe, Michael; Bhatwadekar, Ashay D.; Hajrasouliha, Amir R.; Janga, Sarath Chandra; BioHealth Informatics, School of Informatics and Computing
    Diabetic retinopathy (DR) is a late microvascular complication of Diabetes Mellitus (DM) that could lead to permanent blindness in patients without early detection. Although adequate management of DM via regular eye examination can preserve vision in 98% of the DR cases, DR screening and diagnoses based on clinical lesion features devised by expert clinicians are costly, time-consuming and not sufficiently accurate. This raises the need for Artificial Intelligence (AI) systems that can accurately detect DR automatically and thus help prevent DR before it affects vision. Hence, such systems can help clinician experts in certain cases and aid ophthalmologists in rapid diagnoses. To address such requirements, several approaches have been proposed in the literature that use Machine Learning (ML) and Deep Learning (DL) techniques to develop such systems. However, these approaches ignore the highly valuable clinical lesion features that could contribute significantly to the accurate detection of DR. Therefore, in this study we introduce a framework called DR-detector that employs the Extreme Gradient Boosting (XGBoost) ML model trained via the combination of the features extracted by the pretrained convolutional neural networks commonly known as transfer learning (TL) models and the clinical retinal lesion features for accurate detection of DR. The retinal lesion features are extracted via an image segmentation technique using the UNET DL model and capture exudates (EXs), microaneurysms (MAs), and hemorrhages (HEMs), which are relevant lesions for DR detection. The feature combination approach implemented in DR-detector has been applied to two common TL models in the literature, namely VGG-16 and ResNet-50. We trained the DR-detector model using a training dataset comprising 1,840 color fundus images collected from e-ophtha, retinal lesions and APTOS 2019 Kaggle datasets, of which 920 images are healthy. To validate the DR-detector model, we test the model on an external dataset that consists of 81 healthy images collected from the High-Resolution Fundus (HRF) and MESSIDOR-2 datasets and 81 images with DR signs collected from the Indian Diabetic Retinopathy Image Dataset (IDRID), annotated for DR by an expert. The experimental results show that the DR-detector model achieves a testing accuracy of 100% in detecting DR after training it with the combination of ResNet-50 and lesion features and 99.38% accuracy after training it with the combination of VGG-16 and lesion features. More importantly, the results also show a higher contribution of specific lesion features toward the performance of the DR-detector model. For instance, using only the hemorrhages feature to train the model, our model achieves an accuracy of 99.38% in detecting DR, which is higher than the accuracy when training the model with the combination of all lesion features (89%) and equal to the accuracy when training the model with the combination of all lesions and VGG-16 features together. This highlights the possibility of using only the clinical features, such as lesions that are clinically interpretable, to build the next generation of robust artificial intelligence (AI) systems with great clinical interpretability for DR detection. The code of the DR-detector framework is available on GitHub at https://github.com/Janga-Lab/DR-detector and can be readily employed for detecting DR from retinal image datasets.
  • Item
    Computational integration and meta-analysis of abandoned cardio-(vascular/renal/metabolic) therapeutics discontinued during clinical trials from 2011 to 2022
    (Frontiers, 2023-02) Zeng, Carisa; Lee, Yoon Seo; Szatrowski, Austin; Mero, Deniel; Khomtchouk, Bohdan B.; Biohealth Informatics, School of Informatics and Computing
    Cardiovascular/renal/metabolic (CVRM) diseases collectively comprise the leading cause of death worldwide and disproportionately affect older demographics and historically underrepresented minority populations. Despite these critical unmet needs, pharmaceutical research and development (R&D) efforts have historically struggled with high drug failure rates, low approval rates, and other challenges. Drug repurposing is one approach to recovering R&D costs and meeting unmet demands in therapeutic markets. While there are multiple approaches to conducting drug repurposing, we recognize the importance of bringing together and consolidating discontinued drug information to help identify prospective repurposing candidates. In this study, we have harmonized and integrated information on all relevant CVRM drug assets from U.S. Securities and Exchange Commission (SEC) filings, clinical trial records, PharmGKB, Open Targets, and other platforms. A list of existing therapeutics discontinued or shelved by pharmaceutical/biotechnology companies in 2011-2022 was manually curated and interpreted for insights using information on each drug's genetic target, mechanism of action (MOA), clinical indication, and R&D information including highest phase of clinical development, year of discontinuation, previous repurposing attempts (if any), and other actionable metadata. This study also summarizes the profiles of CVRM drugs discontinued within the past decade and identifies the limitations of publicly available information on discontinued drug assets. The constructed database could serve as a tool for identifying candidates for drug repurposing and developing query methods for collecting R&D information.
  • Item
    Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life
    (Nature, 2023-02-06) Hallee, Logan; Khomtchouk, Bohdan B.; Biohealth Informatics, School of Informatics and Computing
    In this study, we investigate how an organism's codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
  • Item
    Targeting long non-coding RNA NUDT6 enhances smooth muscle cell survival and limits vascular disease progression
    (Cold Spring Harbor Laboratory, 2023-06-07) Winter, Hanna; Winski, Greg; Busch, Albert; Chernogubova, Ekaterina; Fasolo, Francesca; Wu, Zhiyuan; Bäcklund, Alexandra; Khomtchouk, Bohdan B.; Van Booven, Derek J.; Sachs, Nadja; Eckstein, Hans-Henning; Wittig, Ilka; Boon, Reinier A.; Jin, Hong; Maegdefessel, Lars; Biohealth Informatics, School of Informatics and Computing
    Long non-coding RNAs (lncRNAs) orchestrate various biological processes and regulate the development of cardiovascular diseases. Their potential therapeutic benefit to tackle disease progression has recently been extensively explored. Our study investigates the role of lncRNA Nudix Hydrolase 6 (NUDT6) and its antisense target fibroblast growth factor 2 (FGF2) in two vascular pathologies: abdominal aortic aneurysms (AAA) and carotid artery disease. Using tissue samples from both diseases, we detected a substantial increase of NUDT6, whereas FGF2 was downregulated. Targeting Nudt6 in vivo with antisense oligonucleotides in three murine and one porcine animal model of carotid artery disease and AAA limited disease progression. Restoration of FGF2 upon Nudt6 knockdown improved vessel wall morphology and fibrous cap stability. Overexpression of NUDT6 in vitro impaired smooth muscle cell (SMC) migration, while limiting their proliferation and augmenting apoptosis. By employing RNA pulldown followed by mass spectrometry as well as RNA immunoprecipitation, we identified Cysteine and Glycine Rich Protein 1 (CSRP1) as another direct NUDT6 interaction partner, regulating cell motility and SMC differentiation. Overall, the present study identifies NUDT6 as a well-conserved antisense transcript of FGF2. NUDT6 silencing triggers SMC survival and migration and could serve as a novel RNA-based therapeutic strategy in vascular diseases.
  • Item
    A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2
    (Public Library of Science, 2022-09-01) Omoru, Okiemute Beatrice; Pereira, Filipe; Janga, Sarath Chandra; Manzourolajdad, Amirhossein; BioHealth Informatics, School of Informatics and Computing
    SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why the two viruses have different spreading capacities and mortality rates. As in other betacoronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in-silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions, among which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact the viral life cycle, including sub-genomic RNA production rates.