Department of Computer and Information Science Works

Recent Submissions

  • Item
    Deperturbation of Online Social Networks via Bayesian Label Transition
    (Society for Industrial and Applied Mathematics, 2022) Zhuang, Jun; Al Hasan, Mohammad; Computer and Information Science, School of Science
    Online social networks (OSNs) classify users into different categories based on their online activities and interests, a task which is referred to as a node classification task. Such a task can be solved effectively using Graph Convolutional Networks (GCNs). However, a small number of users, so-called perturbators, may perform random activities on an OSN, which significantly deteriorate the performance of a GCN-based node classification task. Existing works in this direction defend GCNs either by adversarial training or by identifying the attacker nodes followed by their removal. However, both of these approaches require that the attack patterns or attacker nodes be identified first, which is difficult when the number of perturbator nodes is very small. In this work, we develop a GCN defense model, namely GraphLT, which uses the concept of label transition. GraphLT assumes that perturbators' random activities deteriorate GCN's performance. To overcome this issue, GraphLT subsequently uses a novel Bayesian label transition model, which takes GCN's predicted labels and applies label transitions by Gibbs-sampling-based inference and thus repairs GCN's prediction to achieve better node classification. Extensive experiments on seven benchmark datasets show that GraphLT considerably enhances the performance of the node classifier in an unperturbed environment; furthermore, they validate that GraphLT can successfully repair a GCN-based node classifier and outperforms several competing methods.
  • Item
    Defending Graph Convolutional Networks against Dynamic Graph Perturbations via Bayesian Self-Supervision
    (AAAI Technical Track, 2022-06-28) Zhuang, Jun; Al Hasan, Mohammad; Computer and Information Science, School of Science
    In recent years, plentiful evidence illustrates that Graph Convolutional Networks (GCNs) achieve extraordinary accomplishments on the node classification task. However, GCNs may be vulnerable to adversarial attacks on label-scarce dynamic graphs. Many existing works aim to strengthen the robustness of GCNs; for instance, adversarial training is used to shield GCNs against malicious perturbations. However, these works fail on dynamic graphs for which label scarcity is a pressing issue. To overcome label scarcity, self-training attempts to iteratively assign pseudo-labels to highly confident unlabeled nodes, but such attempts may suffer serious degradation under dynamic graph perturbations. In this paper, we generalize noisy supervision as a kind of self-supervised learning method and then propose a novel Bayesian self-supervision model, namely GraphSS, to address the issue. Extensive experiments demonstrate that GraphSS can not only reliably signal perturbations on dynamic graphs but also effectively recover the prediction of a node classifier when the graph is under such perturbations. These two advantages are shown to generalize over three classic GCNs across five public graph datasets.
  • Item
    Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples
    (ACM, 2014) Dundar, Murat; Yerebakan, Halid Ziya; Rajwa, Bartek; Computer and Information Science, School of Science
    We present a clustering algorithm for discovering rare yet significant recurring classes across a batch of samples in the presence of random effects. We model each sample's data by an infinite mixture of Dirichlet-process Gaussian-mixture models (DPMs) with each DPM representing the noisy realization of its corresponding class distribution in a given sample. We introduce dependencies across multiple samples by placing a global Dirichlet process prior over individual DPMs. This hierarchical prior introduces a sharing mechanism across samples and allows for identifying local realizations of classes across samples. We use a collapsed Gibbs sampler for inference to recover local DPMs and identify their class associations. We demonstrate the utility of the proposed algorithm by processing a flow cytometry data set containing two extremely rare cell populations, and report results that significantly outperform competing techniques. The source code of the proposed algorithm is available on the web via the link: http://cs.iupui.edu/~dundar/aspire.htm.
  • Item
    Anomaly Detection and Inter-Sensor Transfer Learning on Smart Manufacturing Datasets
    (MDPI, 2023-01-02) Abdallah, Mustafa; Joung, Byung-Gun; Lee, Wo Jae; Mousoulis, Charilaos; Raghunathan, Nithin; Shakouri, Ali; Sutherland, John W.; Bagchi, Saurabh; Computer and Information Science, School of Science
    Smart manufacturing systems are considered the next generation of manufacturing applications. One important goal of the smart manufacturing system is to rapidly detect and anticipate failures to reduce maintenance cost and minimize machine downtime. This often boils down to detecting anomalies within the sensor data acquired from the system, which has different characteristics with respect to the operating point of the environment or machines, such as the RPM of the motor. In this paper, we analyze four datasets from sensors deployed in manufacturing testbeds. We detect the level of defect for each sensor's data by leveraging deep learning techniques. We also evaluate the performance of several traditional and ML-based forecasting models for predicting the time series of sensor data. We show that careful selection of training data by aggregating multiple predictive RPM values is beneficial. Then, considering the sparse data from one kind of sensor, we perform transfer learning from a high data rate sensor to perform defect type classification. We release our manufacturing database corpus (4 datasets) and code for anomaly detection and defect type classification for the community to build on. Taken together, we show that predictive failure classification can be achieved, paving the way for predictive maintenance.
  • Item
    Open data and model integration through generic model agent toolkit in CyberWater framework
    (Elsevier, 2022-06) Chen, Ranran; Luna, Daniel; Cao, Yuan; Liang, Yao; Liang, Xu; Computer and Information Science, School of Science
    The CyberWater project was created to develop an open data and open model integration framework for studying complex environmental and water problems, where diverse online data sources can be directly accessed by diverse models without requiring extra effort from users on the tedious tasks of data preparation for their models. We present our design and development of a novel generic model agent toolkit in the context of CyberWater, which enables users to integrate their models into the CyberWater system without writing any new code, significantly simplifying the data and model integration task. CyberWater adopts a visual scientific workflow system, VisTrails, which also supports provenance and reproducible computing. Our approach and the developed generic model agent toolkit are demonstrated, via the CyberWater framework, with automated and flexible workflows through integrating data and models using real-world use cases. Two popular hydrological models, VIC and DHSVM, are used for illustrations.
  • Item
    Energy-efficient and balanced routing in low-power wireless sensor networks for data collection
    (Elsevier, 2022-03-15) Navarro, Miguel; Liang, Yao; Zhong, Xiaoyang; Computer and Information Science, School of Science
    Cost-based routing protocols are the main approach used in practical wireless sensor network (WSN) and Internet of Things (IoT) deployments for data collection applications with energy constraints; however, those routing protocols lead to the concentration of most of the data traffic on some specific nodes which provide the best available routes, thus significantly increasing their energy consumption. Consequently, nodes providing the best routes are potentially the first ones to deplete their batteries and stop working. In this paper, we introduce a novel routing strategy for energy-efficient and balanced data collection in WSNs/IoT, which can be applied to any cost-based routing solution to exploit suboptimal network routing alternatives based on the parent set concept. While still taking advantage of the stable routing topologies built in cost-based routing protocols, our approach adds a random component into the process of packet forwarding to achieve a better network lifetime in WSNs. We evaluate the implementation of our approach against other state-of-the-art WSN routing protocols through thorough real-world testbed experiments and simulations, and demonstrate that our approach achieves a significant reduction in the energy consumption of the routing layer in the busiest nodes, ranging from 11% to 59%, while maintaining over 99% reliability. Furthermore, we conduct the field deployment of our approach in a heterogeneous WSN for environmental monitoring in a forest area, report the experimental results and illustrate the effectiveness of our approach in detail. Our EER-based routing protocol CTP+EER is made available as open source to the community for evaluation and adoption.
  • Item
    Analysis of AI Models for Student Admissions: A Case Study
    (ACM, 2023-03) Van Basum, Kelly; Fang, Shaiofen; Computer and Information Science, School of Science
    This research uses machine learning-based AI models to predict admissions decisions at a large urban research university. Admissions data spanning five years was used to create an AI model to determine whether a given student would be directly admitted into the School of Science under various scenarios. During this time, submission of standardized test scores as part of a student's application became optional, which led to interesting questions about the impact of standardized test scores on admission decisions. We first developed AI models and analyzed these models to understand which variables are important in admissions decisions, and how the decision to exclude test scores affects the demographics of the students who are admitted. We then evaluated the predictive models to detect and analyze biases these models may carry with respect to three variables chosen to represent sensitive populations: gender, race, and whether a student was the first in their family to attend college.
  • Item
    Trustability for Resilient Internet of Things Services on 5G Multiple Access Edge Cloud Computing
    (MDPI, 2022-12-16) Uslu, Suleyman; Kaur, Davinder; Durresi, Mimoza; Durresi, Arjan; Computer and Information Science, School of Science
    Billions of Internet of Things (IoT) devices and sensors are expected to be supported by fifth-generation (5G) wireless cellular networks. This highly connected structure is predicted to attract different and unseen types of attacks on devices, sensors, and networks that require advanced mitigation strategies and the active monitoring of the system components. Therefore, a paradigm shift is needed, from traditional prevention and detection approaches toward resilience. This study proposes a trust-based defense framework to ensure resilient IoT services on 5G multi-access edge computing (MEC) systems. This defense framework is based on the trustability metric, which is an extension of the concept of reliability and measures how much a system can be trusted to keep a given level of performance under a specific successful attack vector. Furthermore, trustability is used as a trade-off with system cost to measure the net utility of the system. Systems using multiple sensors with different levels of redundancy were tested, and the framework was shown to measure the trustability of the entire system. Furthermore, different types of attacks were simulated on an edge cloud with multiple nodes, and the trustability was compared to the capabilities of dynamic node addition for the redundancy and removal of untrusted nodes. Finally, the defense framework measured the net utility of the service, comparing the two types of edge clouds with and without the node deactivation capability. Overall, the proposed defense framework based on trustability ensures a satisfactory level of resilience for IoT on 5G MEC systems, which serves as a trade-off with an accepted cost of redundant resources under various attacks.
  • Item
    Tissue Cytometry With Machine Learning in Kidney: From Small Specimens to Big Data
    (Frontiers, 2022) El-Achkar, Tarek M.; Winfree, Seth; Talukder, Niloy; Barwinska, Daria; Ferkowicz, Michael J.; Al Hasan, Mohammad; Computer and Information Science, School of Science
    Advances in cellular and molecular interrogation of kidney tissue have ushered in a new era of understanding the pathogenesis of kidney disease and potentially identifying molecular targets for therapeutic intervention. Classifying cells and identifying subtypes and states induced by injury is a foundational task in this context. High-resolution imaging-based approaches such as large-scale fluorescence 3D imaging offer significant advantages because they allow preservation of tissue architecture and provide a definition of the spatial context of each cell. We recently described the Volumetric Tissue Exploration and Analysis cytometry tool, which enables an interactive analysis, quantitation and semiautomated classification of labeled cells in 3D image volumes. We also established and demonstrated an imaging-based classification using deep learning of cells in intact tissue using 3D nuclear staining with 4',6-diamidino-2-phenylindole (DAPI). In this mini-review, we will discuss recent advancements in analyzing 3D imaging of kidney tissue, and how combining machine learning with cytometry is a powerful approach to leverage the depth of content provided by high-resolution imaging into a highly informative analytical output. Therefore, imaging a small tissue specimen will yield big-scale data that will enable cell classification in a spatial context and provide novel insights on pathological changes induced by kidney disease.
  • Item
    Exploration and Visualization of Patterns Underlying Multistakeholder Preferences in Watershed Conservation Decisions Generated by an Interactive Genetic Algorithm
    (Wiley, 2021-05) Piemonti, Adriana Debora; Guizani, Mariam; Babbar-Sebens, Meghna; Zhang, Eugene; Mukhopadhyay, Snehasis; Computer and Information Science, School of Science
    In multiple watershed planning and design problems, such as conservation planning, quantitative estimates of costs and environmental benefits of proposed conservation decisions may not be the only criteria that influence stakeholders' preferences for those decisions. Their preferences may also be influenced by the conservation decision itself, specifically the type of practice, where it is being proposed, existing biases, and previous experiences with the practice. While human-in-the-loop type search techniques, such as Interactive Genetic Algorithms (IGA), provide opportunities for stakeholders to incorporate their preferences in the design of alternatives, examination of user-preferred conservation design alternatives for patterns in Decision Space can provide insights into which local decisions have higher or lower agreement among stakeholders. In this paper, we explore and compare spatial patterns in conservation decisions (specifically involving cover crops and filter strips) within design alternatives generated by IGA and noninteractive GA. Methods for comparing patterns include nonvisual as well as visualization approaches, including a novel visual analytics technique. Results for the study site show that user-preferred designs generated by all participants had a strong bias for cover crops in a majority (50%–83%) of the subbasins. Further, exploration with heat map visualization indicates that IGA-based search yielded very different spatial patterns of user-preferred decisions in subbasins in comparison to decisions within design alternatives that were generated without the human-in-the-loop. Finally, the proposed coincident-nodes, multiedge graph visualization was helpful in visualizing disagreement among participants in local subbasin scale decisions, and for visualizing spatial patterns in local subbasin scale costs and benefits.