Biostatistics Department Theses and Dissertations

Recent Submissions

  • Item
    Bayesian Adaptive Designs for Early Phase Clinical Trials
    (2023-07) Guo, Jiaying; Zang, Yong; Han, Jiali; Zhao, Yi; Ren, Jie
    Delayed toxicity outcomes are common in phase I clinical trials, especially in oncology studies. They cause logistical difficulties, waste resources, and prolong the trial duration. We propose the time-to-event 3+3 (T-3+3) design to solve the delayed outcome issue for the 3+3 design. We convert the dose decision rules of the 3+3 design into a series of events. A transparent yet efficient Bayesian probability model is applied to calculate the probabilities of these events in the presence of delayed outcomes, taking into account the informative remaining follow-up time of pending patients. The T-3+3 design only models the information for the pending patients and seamlessly reduces to the conventional 3+3 design in the absence of delayed outcomes. We further extend the proposed method to the interval 3+3 (i3+3) design, an algorithm-based phase I dose-finding design built on simple but more comprehensive rules that account for the variability in the observed data. Similarly, the dose escalation/de-escalation decision is made by comparing the event probabilities, which are calculated from the ratio between the average follow-up time for at-risk patients and the total assessment window. We evaluate the operating characteristics of the proposed designs through simulation studies and compare them to existing methods. The umbrella trial is a clinical trial strategy that accommodates the paradigm shift towards personalized medicine by evaluating multiple investigational drugs in different subgroups of patients with the same disease. A Bayesian adaptive umbrella trial design is proposed to select effective targeted agents for different biomarker-based subgroups of patients. To facilitate treatment evaluation, the design uses a mixture regression model that jointly models short-term and long-term response outcomes. 
In addition, a data-driven latent class model is employed to adaptively combine subgroups into induced latent classes based on overall data heterogeneities, which improves the statistical power of the umbrella trial. To enhance individual ethics, the design includes a response-adaptive randomization scheme with early stopping rules for futility and superiority. Bayesian posterior probabilities are used to make these decisions. Simulation studies demonstrate that the proposed design outperforms two conventional designs across a range of practical treatment-outcome scenarios.
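    For readers unfamiliar with the baseline that the abstract above extends, the following is a minimal sketch of one common formulation of the conventional 3+3 decision rule, with no delayed outcomes. It is not the T-3+3 or i3+3 design from the dissertation; the function name and the returned labels are hypothetical, chosen only for illustration.

```python
def three_plus_three_decision(n_treated: int, n_dlt: int) -> str:
    """Decision at the current dose under one common version of the 3+3 rule.

    n_treated: patients evaluated at the current dose (3 or 6).
    n_dlt: number of those patients with a dose-limiting toxicity (DLT).
    Returns "escalate", "expand" (treat 3 more at the same dose),
    or "de-escalate". Labels are illustrative, not from the source.
    """
    if n_treated not in (3, 6):
        raise ValueError("the 3+3 rule is evaluated after 3 or 6 patients")
    if not 0 <= n_dlt <= n_treated:
        raise ValueError("n_dlt must be between 0 and n_treated")

    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"      # 0/3 DLTs: move to the next higher dose
        if n_dlt == 1:
            return "expand"        # 1/3 DLTs: enroll 3 more at this dose
        return "de-escalate"       # 2 or more DLTs out of 3: dose too toxic

    # Six patients evaluated at this dose.
    return "escalate" if n_dlt <= 1 else "de-escalate"


if __name__ == "__main__":
    for n, d in [(3, 0), (3, 1), (3, 2), (6, 1), (6, 2)]:
        print(n, d, three_plus_three_decision(n, d))
```

    The rule needs every enrolled patient's outcome before a decision can be made, which is the source of the delays that the time-to-event extension described above is meant to address.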
  • Item
    Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications
    (2023-05) White, Alexander James; Cao, Sha; Tu, Wanzhu; Zhang, Chi; Zhao, Yi
    The successful treatment and potential eradication of many complex diseases, such as cancer, begins with elucidating the convoluted mapping of molecular profiles to phenotypical manifestation. Our observed molecular profiles (e.g., genomics, transcriptomics, epigenomics) are often high-dimensional and are collected from patient samples falling into heterogeneous disease subtypes. Interpretable learning from such data calls for sparsity-driven models. This dissertation addresses the high dimensionality, sparsity, and heterogeneity issues that arise when analyzing multiple-omics data; each method is implemented in an accompanying R package. First, we examine challenges in submatrix identification, which aims to find subgroups of samples that behave similarly across a subset of features. We resolve issues such as two-way sparsity, non-orthogonality, and parameter tuning with an adaptive thresholding procedure on the singular vectors computed via orthogonal iteration. We validate the method with simulation analysis and apply it to an Alzheimer’s disease dataset. The second project focuses on modeling relationships between large, matched datasets. Exploring regression structures between large datasets can provide insights such as the effect of long-range epigenetic influences on gene expression. We present a high-dimensional version of mixture multivariate regression to detect patient clusters, each with different correlation structures of matched-omics datasets. Results are validated via simulation and applied to matched-omics datasets. In the third project, we introduce a novel approach to modeling spatial transcriptomics (ST) data with a spatially penalized multinomial model of the expression counts. This method estimates the low-rank structures of zero-inflated ST data under spatial smoothness constraints. We validate the model using manual cell structure annotations of human brain samples. We then apply this technique to additional ST datasets.
  • Item
    Single-cell Approach to Repurposing of Drugs for Alzheimer’s Disease
    (2023-05) Peyton, Madeline Elizabeth; Johnson, Travis S.; Zhang, Jie; Zhang, Pengyue
    Background: Alzheimer’s disease (AD) is the third leading cause of death for the older demographic in the United States, just after heart disease and cancer. However, unlike heart disease and cancer, the death rates for AD are increasing. Despite extensive research, the cause or origin of AD remains unclear and there is no existing cure. However, with the improvement of single-cell RNA-sequencing (scRNA-seq) technologies and drug repurposing tools, we can further our knowledge of AD and its pathogenesis. Method: Our primary aim was to identify repurposable drug and compound candidates for AD treatment and to identify significant cell types and signaling pathways using two scRNA-seq datasets from cortex samples of AD patients and controls. To achieve this aim, we generated differential gene expression profiles, calculated log fold-changes, and estimated standard errors to make pairwise comparisons between the diseased and healthy samples. We used the 21,304 drugs/compounds with response gene expression profiles in 98 cell lines from the LINCS L1000 project to detect consistent differentially expressed genes (DEGs) that were either i) up-regulated in cells from diseased samples and down-regulated in cells with treatment, or ii) down-regulated in cells from diseased samples but up-regulated in cells with treatment. To evaluate the identified drugs, we compared the p-value, false discovery rate (FDR), and A Single-cell Guided Pipeline to Aid Repurposing of Drugs (ASGARD) drug score for each cell type. We further annotated and assessed doublet cell types within the Grubman et al. dataset using cell type proportions. Result: The analysis provided several potential therapeutic treatments for AD and their target genes and pathways, as well as important cell type interactions. Notably, we identified an interaction between endothelial cells and microglia, and further identified drug candidates to target this interaction. 
Conclusion: We identified repurposable drug/compound candidates in each dataset that were also identified in the literature. We further identified doublet cell type interactions of interest and drugs that target these interactions.
  • Item
    Insights in Response to Statewide COVID-19 Sampling in Indiana
    (2023-05) Shields, David William, Jr.; Yiannoutsos, Constantin; Fadel, William; Bakoyannis, Giorgos
    During 2020, the Indiana State Department of Health conducted a longitudinal study of the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of COVID-19 disease, to understand the number of past and current infections as well as the prevalence of disease in the State of Indiana by administering a survey to participants and testing them for exposure to SARS-CoV-2. The study consisted of 3 waves of testing, spread months apart, each comprising a random sample and a non-random sample. The non-random sample was used to ensure the sample population was representative of the state of Indiana and was used as a stratum in the logistic regression model, allowing for adjustment for nonresponse. The findings indicate that persons of non-White race and persons of Hispanic ethnicity had the highest risk of exposure to the virus. Understanding health disparities across racial and ethnic populations, addressing how different communities are impacted by the pandemic, and working with the community are paramount when attempting to mitigate a pandemic. In addition, understanding the data from the ambient pandemic when instituting measures to mitigate the spread of viruses is extremely important for managing health emergencies such as the COVID-19 pandemic.
  • Item
    Marginal Regression Analysis of Clustered and Incomplete Event History Data
    (2022-12) Zhou, Wenxian; Bakoyannis, Giorgos; Zhang, Ying; Yiannoutsos, Constantin T.; Zang, Yong; Hasan, Mohammad Al
    Event history data, including competing risks and more general multistate process data, are commonly encountered in biomedical studies. In practice, such event history data are often subject to intra-cluster correlation in multicenter studies and are complicated due to informative cluster size, a situation where the outcomes under study are associated with the size of the cluster. In addition, outcomes or covariates are frequently incompletely observed in real-world settings. Ignoring these statistical issues will lead to invalid inferences. In this dissertation, I develop a series of marginal regression methods to address these statistical issues with competing risks and more general multistate process data. The motivation for this research comes from a large multicenter HIV study and a multicenter randomized oncology trial. First, I propose a marginal regression method for clustered competing risks data with missing cause of failure. I consider the semiparametric proportional cause-specific hazards model and propose a maximum partial pseudolikelihood estimator under a plausible missing at random assumption. Second, I consider more general clustered multistate process data and propose a marginal regression framework for the transient state occupation probabilities. The proposed method is based on a weighted functional generalized estimating equation approach. A nonparametric hypothesis test for the covariate effect is also provided. Third, I extend the proposed framework in the second part of the dissertation to account for missing covariates, via a weighted functional pseudo-expected estimating equation approach. I conduct extensive simulation studies to evaluate the finite sample performance of the proposed methods. The proposed methods are applied to the motivating multicenter HIV study and oncology trial datasets.
  • Item
    Group Specific Dynamic Models of Time Varying Exposures on a Time-to-Event Outcome
    (2022-12) Tong, Yan; Gao, Sujuan; Bakoyannis, Giorgos; Tu, Wanzhu; Han, Jiali
    Time-to-event outcomes are widely utilized in medical research. Assessing the cumulative effects of time-varying exposures on time-to-event outcomes poses challenges in statistical modeling. First, exposure status, intensity, or duration may vary over time. Second, exposure effects may be delayed over a latent period, a situation that is not considered in traditional survival models. Third, exposures that occur within a time window may cumulatively influence an outcome. Fourth, such cumulative exposure effects may be non-linear over the exposure latent period. Lastly, exposure-outcome dynamics may differ among groups defined by individuals' characteristics. These challenges have not been adequately addressed in current statistical models. The objective of this dissertation is to provide a novel approach to modeling group-specific dynamics between cumulative time-varying exposures and a time-to-event outcome. A framework of group-specific dynamic models is introduced utilizing functional time-dependent cumulative exposures within an etiologically relevant time window. Penalized-spline time-dependent Cox models are proposed to evaluate group-specific outcome-exposure dynamics through the associations of a time-to-event outcome with functional cumulative exposures and group-by-exposure interactions. Model parameter estimation is achieved by penalized partial likelihood. Hypothesis testing for comparison of group-specific exposure effects is performed by Wald-type tests. These models are extended to group-specific non-linear exposure intensity-latency-outcome relationships and group-specific interaction effects from multiple exposures. Extensive simulation studies are conducted and demonstrate satisfactory model performance. The proposed methods are applied to the analyses of group-specific associations between antidepressant use and time to coronary artery disease in a depression-screening cohort using data extracted from electronic medical records.
  • Item
    Innovative Bayesian Designs for Clinical Trials
    (2022-10) He, Tian; Zang, Yong; Liu, Hao; Bakoyannis, Giorgos; Zhao, Yi; Hasan, Mohammad
    Traditional clinical trial designs are generally based on the doctrine of studying one drug for one disease at a time, which may be slow and inefficient. With a high failure rate in drug development, there is a great need to speed up the process of drug development and minimize the cost. Novel trial designs have been proposed, such as the master protocol approach, which has expanded the trial design horizon to umbrella, basket, and platform trials. Compared to traditional clinical protocols, the master protocol enables investigators to evaluate multiple drugs and diverse disease populations simultaneously in a single protocol with the capacity to modify the protocol based on the observed trial data and new drugs. While many statistical methods for trial designs have been proposed for umbrella, basket, and platform trials in the literature, most of the designs are based on a binary or continuous endpoint. However, in the context of oncology trials, there is a great need to develop novel methods for survival endpoints. In this dissertation, we propose three novel Bayesian statistical methods for three distinctive trial design problems, respectively: 1) an optimal Bayesian design for platform trials with multiple endpoints; 2) a novel Bayesian design for basket trials with survival outcomes; 3) an adaptive Bayesian design for seamless phase II/III platform trials with survival endpoints. Extensive simulation studies are performed to evaluate the operating characteristics of the proposed designs under various scenarios.
  • Item
    Association Between Tobacco Related Diagnoses and Alzheimer Disease: A population Study
    (2022-05) Almalki, Amwaj Ghazi; Zhang, Pengyue; Johnson, Travis; Fadel, William
    Background: Tobacco use is associated with an increased risk of developing Alzheimer's disease (AD). 14% of the incidence of AD is associated with various types of tobacco exposure. Additional real-world evidence is warranted to reveal the association between tobacco use and AD in age/gender-specific subpopulations. Method: In this thesis, the relationships between diagnoses related to tobacco use and diagnoses of AD in gender- and age-specific subgroups were investigated using health information exchange data. The non-parametric Kaplan-Meier method was used to estimate the incidence of AD. Furthermore, the log-rank test was used to compare incidence between individuals with and without tobacco-related diagnoses. In addition, we used semi-parametric Cox models to examine the association between tobacco-related diagnoses and diagnoses of AD while adjusting for covariates. Results: Tobacco-related diagnosis was associated with an increased risk of developing AD compared with no tobacco-related diagnosis among individuals aged 60-74 years (female hazard ratio [HR] = 1.26, 95% confidence interval [CI]: 1.07 - 1.48, p-value = 0.005; and male HR = 1.33, 95% CI: 1.10 - 1.62, p-value = 0.004). Tobacco-related diagnosis was associated with a decreased risk of developing AD compared with no tobacco-related diagnosis among individuals aged 75-100 years (female HR = 0.79, 95% CI: 0.70 - 0.89, p-value = 0.001; and male HR = 0.90, 95% CI: 0.82 - 0.99, p-value = 0.023). Conclusion: Tobacco-related diagnoses were associated with an increased risk of developing AD in older adults aged 60-75 years. Among older adults aged 75-100 years, tobacco-related diagnoses were associated with a decreased risk of developing AD.
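    The Kaplan-Meier method named in the abstract above can be illustrated with a minimal sketch. This is a generic product-limit estimator in plain Python, not the thesis's analysis code; the function name `kaplan_meier` and the toy data in the usage example are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time for each subject.
    events: 1 (or True) if the event was observed at that time,
            0 (or False) if the subject was censored.
    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one event occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # events and censorings both leave the risk set

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects censored at t are counted as at risk at t, then removed.
        at_risk -= exit_counts[t]
    return curve


if __name__ == "__main__":
    # Toy data (hypothetical): five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

    Cumulative incidence in the single-event setting is one minus this survival estimate. In practice an established package would normally be used in place of a hand-written estimator.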
  • Item
    Evaluation of a Participant Co-designed Lifestyle Change Program for Youth
    (2022-05) Alharbi, Basmah Saleh; Perkins, Susan M.; Hannon, Tamara S.; Daggy, Joanne K.
    Introduction: Increasing obesity in children leads to an increase in the risk of Type 2 diabetes (T2D). Therefore, it is important to promote healthier lifestyles in youths and encourage their caregiver(s) to provide a healthy lifestyle environment. The PowerHouse program focuses on improving food choices, increasing physical activity, and adopting behavior changes for the reduction of obesity and the prevention of T2D. Method: The aim of this study was to assess the effects of implementing the PowerHouse program on both clinical and quality-of-life outcomes in high-risk, low-income youth and their caregivers. Primary outcomes were BMI standard deviation and BMI percentile in youths. Secondary outcomes included physical activity of youths and quality of life for both youths and their caregivers. Attendance rates were also calculated. Linear mixed-effects models were used to test for time effects for all outcomes. Results: Clinical outcomes did not improve over time, except for youth HbA1c (p-value = 0.0447). Some improvements in youth quality-of-life outcomes were noted: specifically, the Sports Index score of the Fels Physical Activity Questionnaire for Children (adjusted p-value = 0.0213) and the Physical Summary (p-value = 0.0407), Psychosocial Summary (p-value = 0.0167), and Total score (p-value = 0.0094) for the youth-reported Pediatric Quality of Life Inventory. Quality of life did not change over time for caregivers. Attendance improved after the intervention was modified to improve access to fresh produce (p-value = 0.0002). Conclusion: HbA1c and quality of life improved over time for youth; however, caregiver outcomes did not improve over time. The data suggest that more time may be needed to see the full effects of the intervention, and/or that a booster intervention may be needed.
  • Item
    Spatial Transcriptomics Analysis Reveals Transcriptomic and Cellular Topology Associations in Breast and Prostate Cancers
    (2022-05) Alsaleh, Lujain; Johnson, Travis S.; Fadel, William; Tu, Wanzhu
    Background: Cancer is the leading cause of death worldwide and as a result is one of the most studied topics in public health. Breast cancer and prostate cancer are the most common cancers among women and men, respectively. Gene expression and image features are independently prognostic of patient survival. However, it is sometimes difficult to discern how the molecular profile, e.g., gene expression, of given cells relates to their spatial layout, i.e., topology, in the tumor microenvironment (TME). With the advent of spatial transcriptomics (ST) and integrative bioinformatics analysis techniques, we are now able to better understand the TME of common cancers. Method: In this paper, we aim to determine the genes that are correlated with image topology features (ITFs) in common cancers, which we denote topology associated genes (TAGs). To achieve this objective, we compute the correlation coefficients between genes and image features after identifying the optimal number of clusters for each of them. We then visualize the correlation between the two sets by plotting this correlation matrix as a heatmap with the R package pheatmap. The objective of this study is to identify common themes among the genes correlated with ITFs, which we pursue using functional enrichment analysis. Moreover, we assess the similarity between gene clusters and some image feature clusters using the ranking of correlation coefficients in order to identify, compare, and contrast the TAGs across breast and prostate cancer ST slides. Result: The analysis shows that there are groups of gene ontology terms that are common within breast cancer, within prostate cancer, and across both cancers. Notably, extracellular matrix (ECM) related terms appeared regularly in all ST slides. Conclusion: We identified TAGs in every ST slide regardless of cancer type. These TAGs were enriched for ontology terms that add context to the ITFs generated from ST cancer slides.