Informatics Graduate Theses and PhD Dissertations

Recent Submissions

  • Item
    Transcriptome-Wide Applications of Protein Occupancy Profile Sequencing (POP-seq)
    (2023-06) Sangani, Neel; Janga, Sarath Chandra; Yan, Jingwen; Srivastava, Mansi
    Dynamic protein-RNA interactions regulate RNA metabolism and shape cellular physiology by altering key regulatory processes such as capping, splicing, polyadenylation, and localization. Several high-throughput methods have been developed to detect protein-RNA interactions, but they often exhibit biases due to the inherent limitations of crosslinking-based approaches. We propose Protein Occupancy Profile-Sequencing (POP-seq), a phase separation-based method that does not require crosslinking to detect protein occupancy transcriptome-wide. In this study, we employed POP-seq to examine unbiased regulatory protein-RNA interactions in the following cancer cell lines: K562, HepG2, A549, MCF7, Jurkat, and HEK293. In our preliminary analysis, we compared the POP-seq-identified interactions obtained with two protocols, one involving UV crosslinking (UPOP-seq) and the other without crosslinking (NPOP-seq), in K562 and HepG2 cells. This comparison of the two protocols showed an overlap of more than 70% in the genes detected by both approaches in the two cell lines. Most of the peaks mapped to intronic regions of protein-coding genes. Concurrently, we applied this crosslinking-free approach to two leukemia cell lines: Jurkat and K562. Differential analysis showed higher binding activity in Jurkat than in K562, with the majority of peaks spanning intronic regions of protein-coding genes, followed by SINE and LINE elements. Differential proximal binding analysis showed that skipped exon (SE) events, followed by alternative 3' splice site (A3SS) events, play a major role in alternative splicing, suggesting that the enriched regions play a vital role in cellular functions, including post-transcriptional regulation of gene expression. Motif analysis showed significant enrichment of clinically relevant motifs in POP-seq-identified peaks. The study was then expanded to three additional human cell lines: MCF7, A549, and HEK293. 
Differential peak analysis across cell lines revealed a closer association between A549 and MCF7 cells based on the normalized POP-seq peaks per gene. We observed that genes associated with differential peaks between cell lines exhibited enrichment for crucial cellular functions, particularly in the post-transcriptional regulation of gene expression. Our analysis unveiled a notable enrichment of specific motifs within the identified peaks obtained from POP-seq. These overrepresented motifs were significantly linked to somatic variation, phenotypic variation (Phenvar), clinical variation (Clinvar), GWAS, and allele-specific expression (ASE), with a preferential abundance of the motifs on the C and G bases. Additionally, our alternative splicing analysis revealed that POP-seq detected protein-RNA interactions that substantially contributed to splicing events in certain cell line pairs, while their impact was less pronounced in others. Overall, our study offers the first extensive dataset of protein-RNA interaction maps across the transcriptome in multiple cell lines, utilizing a crosslinking-free approach. This valuable resource not only provides comprehensive insights into regulatory interactions but also opens new possibilities for applying this method in primary tissues to detect and study protein-RNA interactions in a broader biological context.
  • Item
    Computational Methods for Determining RNA-RNA Interactions
    (2023-06) Schaeper, David; Janga, Sarath Chandra; Yan, Jingwen; Srivastava, Mansi
    RNA molecules play vital roles in both viruses and cells, and one way to study their function is through the RNA-RNA interactions (RRIs) they form. RRIs form in one of two ways: through protein-mediated interactions, where a protein brings the RNA molecules together, or through direct complementary base pairing between the molecules, called RNA-centric interactions. Protein-mediated RRIs have been captured and analyzed through experimental protocols such as cross-linking ligation and sequencing of hybrids (CLASH) and mapping RNA interactome in vivo (MARIO). RNA-centric interactions have been investigated through experimental protocols such as ligation of interacting RNA followed by high-throughput sequencing (LIGR-seq), sequencing of psoralen crosslinked, ligated, selected hybrids (SPLASH), psoralen analysis of RNA interactions and structures (PARIS), and cross-linking of matched RNAs and deep sequencing (COMRADES). Tools have also been developed to predict RRIs; the predominant ones, RNAup and IntaRNA, use minimum free energy (MFE) calculations. In this work, RRIs were first studied in the context of SARS-CoV-2 and its variants to observe evolutionary changes to RRIs. Using in silico RRIs generated through the COMRADES protocol by Ziv et al., alongside computational predictions generated through IntaRNA and a large population of SARS-CoV-2 sequences, covariation analysis was applied to the population stratified by variant to determine variant-specific evolutionary changes for certain long-range RRIs. Statistical evidence was also found for a novel Beta-variant-specific RNA-RNA interaction. RRIs were then studied in the human HEK293T cell line through a novel experimental protocol that uses Oxford Nanopore long-read sequencing to capture more complete information on RRIs, which were mapped with the newly developed pipeline Alignment of Chimera through Clustering and Read Splitting (ACCRES). 
Through this, multi-molecule RNA interactions were detected using an iterative BLAST approach; to our knowledge, this is the first time such interactions have been reported. Interaction interfaces were quantified, and the interactions were characterized by biotype to understand the landscape of these interactions in the cell line. A network was built, and functional enrichment was performed to show the interplay between known functions in the cell.
  • Item
    Family Resilience Technologies: Designing Collaborative Technologies for Caregiving Coordination in the Children's Hospital
    (2023-03) Nikkhah, Sarah; Miller, Andrew D.; Bolchini, Davide; Martin-Hammond, Aqueasha M.; Murillo, Angela P.; Pratt, Wanda
    Each year, the parents of approximately 15,300 kids will hear the words “Your child has cancer.” Families with hospitalized children must process a lot of stress and play a vital role in their child’s care. Hospitalized children need care and assistance processing medical information and going through their treatment. Therefore, their families must take on new responsibilities such as providing care, processing medical information, getting ready for the extensive and sometimes painful treatments, and facing the fear of losing their child. They must also adjust their daily duties, chores, and jobs to provide care to their hospitalized child. Previous research on families with hospitalized children shows that a lower level of stress and a higher level of communication among family members are significant predictors of long-term health outcomes after hospitalization. Social work and family therapy studies researched family resilience as these families’ ability to process and handle stress as a system. However, few technologies are designed to increase family resilience and support the family’s communication and collaboration when a child is hospitalized. My aim in this dissertation is to understand how collaborative technologies can help family members of hospitalized children (family caregivers) collaborate and coordinate with each other during the stressful extended hospitalization period. Through qualitative interviews and elicitation activities followed by iterative cycles of design, I showed that Family Resilience can be used as a lens to understand families’ collaborative processes and guide the design of collaborative technologies to support these families in adapting when they are under stress, and their usual routines as a family are constantly changing due to their child’s hospitalization. 
Therefore, there is an opportunity for HCI and CSCW to design collaborative technology that supports family resilience processes for families facing a crisis, such as having a hospitalized child with cancer.
  • Item
    Develop the Disease Specific Bioinformatics Platforms with Integrated Bioinformatics Data
    (2022-11) Liu, Jiannan; Yan, Jingwen; Zhang, Jie; Huang, Kun; Zhang, Chi; Richardson, Timothy I.; Wu, Huanmei
    With the advance of multiple omics technologies and corresponding analytical methods, various types of bioinformatic data have become available. Mining and integrating these data will provide valuable insights for disease mechanism investigation, drug target identification, and new drug development. However, because most omics data are large, heterogeneous, and complex, it is challenging for biomedical researchers to mine them for relevant evidence, especially for those with limited computational skills. In this thesis, I aimed to develop disease-specific platforms that integrate multimodal bioinformatic data types to provide researchers with strong bioinformatics support. To achieve this goal, I explored advanced transcriptomic data analysis methods and proposed a novel biomarker for predicting the overall survival of colon cancer patients, then prototyped a user-friendly, patient-oriented clinical decision support system to provide accurate and intuitive colorectal cancer risk factor assessment. Building on this experience with transcriptomic data analysis and web-based application development, I designed and implemented Cancer Gene and Pathway Explorer (CGPE), an integrative bioinformatics webserver that can be used for cancer publication trend investigation, gene set enrichment analysis with integrated data, and optimal cancer cell line identification. Based on the CGPE framework, I developed another bioinformatics platform focusing on Alzheimer’s disease, called Alzheimer’s Disease Explorer, a first-of-its-kind bioinformatics server that provides rich bioinformatic support from literature, omics, and chemical data to assist researchers in the ND drug development field. Through the work in this thesis, I have shown that integrated disease-specific bioinformatics platforms can provide great value to the research community by allowing 1) fast and accurate investigation of currently available literature, 2) quick hypothesis generation and validation using transcriptomic datasets, 3) multi-dimensional drug target evaluation, and 4) fast querying of published bioinformatics outcomes.
  • Item
    Integrated Correlation Analysis of Proteomics and Transcriptomics Data in Alzheimer's Disease
    (2020-12) Modekurty, Suneeta; Liu, Xiaowen; Wan, Jun; Zheng, Jiaping
    We wanted to see whether there were any significant correlations between two -omics layers, so we performed a correlation analysis to study Alzheimer’s disease (AD). The pipeline first performed differential expression analysis on two datasets (proteomics and transcriptomics) individually. An in-depth analysis of the proteomics data was performed, followed by differential expression analysis of the RNA-seq data and then a correlation analysis of the differentially expressed proteins (from the proteomics data) and genes (from the RNA-seq data). Our analysis revealed informative correlations between proteins and genes in AD. We analyzed AD (N = 84), Control (N = 31), and PSP (N = 85) samples in the proteomics data and obtained 114 differentially expressed proteins (DEPs = 114). The RNA-seq data comprised AD (N = 82), Control (N = 31), and PSP (N = 84) samples, which yielded 61 differentially expressed genes (DEGs = 61). A correlation analysis using Spearman’s correlation coefficient between proteins involved in AD revealed 192 very significant correlations with p-value <= 0.00000000000005. The mean correlation coefficient was quite high (r = 0.52). The same analysis between genes involved in AD revealed 208 very significant correlations with p-value <= 0.00000000000005. The mean correlation coefficient was again quite high (r = 0.52). A correlation analysis using Spearman’s correlation coefficient between proteins and genes involved in AD revealed 395 significant correlations with p-value <= 0.0001. The correlation coefficient was quite high (+0.53); these correlations might help in understanding the molecular pathways behind the disease and could uncover new prospects for understanding the disease as well as designing treatments. We observed that different genes interact with different proteins (correlation coefficient r >= 0.5, p-value < 0.05). 
We also observed that a single protein interacts with multiple genes, and that a single gene is associated with multiple proteins. The patterns of correlation also differ: a given protein or gene correlates positively with some proteins or genes and negatively with others. We hope that this observation will prove useful. However, understanding how these interactions work requires further assessment at the molecular level.
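The abstract above relies on Spearman's correlation coefficient, which is the Pearson correlation computed on rank-transformed values. As a minimal sketch of that calculation (not the thesis pipeline itself), the following uses only the Python standard library; the function names and the five-sample `protein` and `gene` values are made up for illustration and do not come from the thesis data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical abundances of one protein and one gene across five samples.
protein = [2.1, 3.4, 1.8, 4.0, 2.9]
gene = [10.0, 14.5, 9.2, 13.0, 12.1]
rho = spearman_rho(protein, gene)
print(f"Spearman rho = {rho:.2f}")
```

In a real analysis, a tested library routine that also reports p-values would normally be preferred, and testing many protein-gene pairs would call for multiple-testing correction.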
  • Item
    Bridging The Gap Between Healthcare Providers and Consumers: Extracting Features from Online Health Forum to Meet Social Needs of Patients using Network Analysis and Embedding
    (2020-08) Mokashi, Maitreyi; Chakraborty, Sunandan; Jones, Josette; Zheng, Jiaping
    Chronic disease patients have to face many issues during and after their treatment. A lot of these issues are either personal, professional, or social in nature. It may so happen that these issues are overlooked by the respective healthcare providers and become major obstacles in the patient’s day-to-day life and their disease management. We extract data from an online health platform that serves as a ‘safe haven’ to the patients and survivors to discuss help and coping issues. This thesis presents a novel approach that acts as the first step to include the social issues discussed by patients on online health forums which the healthcare providers need to consider in order to create holistic treatment plans. There are numerous online forums where patients share their experiences and post questions about their treatments and their subsequent side effects. We collected data from an “Online Breast Cancer Forum”. On this forum, users (patients) have created threads across many related topics and shared their experiences and questions. We connect the patients (users) with the topic in which they have posted by converting the data into a bipartite network and turn the network nodes into a high-dimensional feature space. From this feature space, we perform community detection on the node embeddings to unearth latent connections between patients and topics. We claim that these latent connections, along with the existing ones, will help to create a new knowledge base that will eventually help the healthcare providers to understand and acknowledge the non-medical related issues to a treatment, and create more adaptive and personalized plans. We performed both qualitative and quantitative analysis on the obtained embeddings to prove the superior quality of our approach and its potential to extract more information when compared to other models.
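The bipartite user-topic network described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis method: the abstract does not name the embedding or community detection algorithms, so a plain topic-incidence vector stands in for a learned node embedding, and cosine similarity stands in for the grouping step. The `posts` data and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, topic) pairs: one pair per post in a forum thread.
posts = [
    ("u1", "side effects"),
    ("u1", "work"),
    ("u2", "side effects"),
    ("u2", "work"),
    ("u3", "family"),
    ("u1", "side effects"),  # repeat posts do not add new edges
]


def build_bipartite(pairs):
    """Build both sides of the bipartite graph as adjacency sets."""
    user_topics = defaultdict(set)
    topic_users = defaultdict(set)
    for user, topic in pairs:
        user_topics[user].add(topic)
        topic_users[topic].add(user)
    return user_topics, topic_users


def incidence_vectors(user_topics, topic_users):
    """Stand-in embedding: one 0/1 coordinate per topic, in sorted order."""
    topics = sorted(topic_users)
    return {
        user: [1.0 if topic in joined else 0.0 for topic in topics]
        for user, joined in user_topics.items()
    }


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


user_topics, topic_users = build_bipartite(posts)
vectors = incidence_vectors(user_topics, topic_users)
print(cosine(vectors["u1"], vectors["u2"]))  # users sharing the same topics
print(cosine(vectors["u1"], vectors["u3"]))  # users with no topic in common
```

Users whose vectors are close would be candidates for the same community; a learned embedding could also place users near topics they have not yet posted in, which an incidence vector cannot do.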
  • Item
    Data-Driven Accountability: Examining and Reorienting the Mythologies of Data
    (2020-05) Verma, Nitya; Dombrowski, Lynn; Bolchini, Davide; Young, Alyson; Seybold, Peter; Voida, Amy; Muller, Michael
    In this work, I examine and design sociotechnical interventions for addressing limitations around data-driven accountability, particularly focusing on politically contentious and systemic social issues (i.e., police accountability). While organizations across sectors of society are scrambling to adopt data-driven technologies and practices, there are epistemological and ethical concerns around how data use influences decision-making and actionability. My work explores how stakeholders adopt and handle the challenges around being data-driven, advocating for ways HCI can mitigate such challenges. In this dissertation, I highlight three case studies that focus on data-driven, human-services organizations, which work with at-risk and marginalized populations. First, I examine the tools and practices of nonprofit workers and how they experience the mythologies associated with data use in their work. Second, I investigate how police officers are adopting data-driven technologies and practices, which highlights the challenges police contend with in addressing social criticisms around police accountability and marginalization. Finally, I conducted a case study with multiple stakeholders around police accountability to understand how systemic biases and politically charged spaces perceive and utilize data, as well as to develop the design space around how alternative futures of being data-driven could support more robust and inclusive accountability. I examine how participants situate the concepts of power, bias, and truth in the data-driven practices and technologies used by and around the police. With this empirical work, I present insights that inform the HCI community at the intersection of data design, practice, and policies in addressing systemic social issues.
  • Item
    Exploring The Effect Of Visual And Verbal Feedback On Ballet Dance Performance In Mirrored And Non-Mirrored Environments
    (2016-05) Trajkova, Milka; Cafaro, Francesco; Bolchini, Davide; Mannheimer, Steve
    Since the 1800s, the ballet studio has been largely unchanged, and the mirror remains one of its core features. The influence of mirrors on ballet education has been documented, and prior literature has shown negative effects on dancers’ body image, satisfaction, level of attention and performance quality. While the mirror provides immediate real-time feedback, it does not inform dancers of their errors. Tools have been developed to do so, but the design of the feedback from a bottom-up perspective has not been extensively studied. The following study aimed to assess the value of different types of feedback to inform the design of tech-augmented mirrors. University students’ ballet technique scores were evaluated on eight ballet combinations (tendue, adagio, pirouette, petit allegro, plié, degage, frappe and battement tendue), and feedback was provided to them. We assessed learning with a remote domain expert to determine whether the system had an impact on dancers. Results revealed that the feedback treatment had a statistically significant effect and yielded higher performance than no feedback. Mirror and non-mirror performance showed no score disparity, indicating that users performed similarly in both conditions. The best fit was seen when visual and verbal feedback were combined. We created MuscAt, a set of interconnected feedback design principles, which led us to conclude that remote teaching in ballet is feasible.
  • Item
    End-User Needs of Fragmented Databases in Higher Education Data Analysis and Decision Making
    (2019-05) Briggs, Amanda; Cafaro, Francesco; Dombrowski, Lynn; Reda, Khairi
    In higher education, a wealth of data is available to advisors, recruiters, marketers, and program directors. However, data sources can be accessed in a variety of ways and often do not seem to represent the same data set, presenting users with the confounding notion that data sources are in conflict with one another. As users identify new ways of accessing and analyzing this data, they are modifying existing work practices and sometimes creating their own databases. To understand how users are navigating these databases, the researchers employed a mixed-methods research design, including a survey and interviews, to understand the needs of end users who are accessing these seemingly fragmented databases. The study resulted in three overarching categories – access, understandability, and use – that affect work practices for end users. The researchers used these themes to develop a set of broadly applicable design recommendations as well as six sets of sketches for implementation – development of a data gateway, training, collaboration, tracking, definitions and roadblocks, and time management.
  • Item
    Explore the relations between personality and gamification
    (2018-01-22) Jia, Yuan; Bolchini, Davide; Voida, Stephen; MacDorman, Karl; Defazio, Joseph
    Successful gamification motivates users to engage in systems using game-like experiences. However, a one-size-fits-all approach to gamification is often unsuccessful; prior studies suggest that personality serves as a key differentiator in the effectiveness of the approach. To advance the understanding of personality differences and their influence on users’ behavior and motivation in gamification, this dissertation is comprised of three studies that: 1) explore the relationships among individuals’ personality traits and preferences for different gamification features through an online survey; 2) investigate how people with different personality traits respond to the motivational affordances in a gamified application over a period of time through a diary study; and 3) reveal how individuals respond differentially to different kinds of leaderboard experiences based on their leaderboard rankings, the application domain, and the individuals’ personality traits through their responses to 9 dynamic leaderboards. The results from the first study show that extraversion and emotional stability are the two primary personality traits that differentiate users’ preferences for gamification. Among the 10 types of motivational affordances, extraverts are more likely to be motivated by Points, Levels, and Leaderboards. However, the results from the second (diary) study indicate that, after the first week, extraverts’ preferences for Points decreased. The motivation effects of Points and Leaderboards changed over the course of using the gamified application. The results from the third study confirm the findings from the first two studies about extraversion and revealed that ranking and domain differences are also effective factors in users’ experiences of Leaderboards in gamification. Design guidelines for gamification are presented based on the results of each of the three studies. 
Based on a synthesis of the results from these three studies, this dissertation proposes a conceptual model for gamification design. The model describes not only the impact of personality traits, domain differences, and users’ experience over time, but also illustrates the importance of considering individual differences, application context, and the potential significance of user persistence in gamification design. This research contributes to the HCI and gamification communities by uncovering factors that will affect the way that people respond to gamification systems, considered holistically.