Result: Weisfeiler-Leman graph kernels for the out-of-distribution characterization of graph structured data
Further information
Title from PDF of title page, viewed August 21, 2024 ; Thesis advisor: Yugyung Lee ; Vita ; Includes bibliographical references (pages 78-94) ; Thesis (M.S.)--Department of Computer Science and Electrical Engineering. University of Missouri--Kansas City, 2024

This thesis presents a new metric named Graph Distributional Analytics (GDA). The approach combines Weisfeiler-Leman kernels, cosine similarity, and traditional statistical metrics to better characterize graph-structured data. It aims to improve both the analysis of graph-structured data and the explainability and power of Graph Neural Networks (GNNs) without introducing a new model architecture. Within existing GNN research, strong claims of out-of-distribution (OOD) generalizability are frequently made, but these claims fail when exposed to real-world data. We propose that existing standards for identifying OOD data are insufficient, and that a metric is needed that accurately and efficiently identifies data that is actually different from the training data. Our metric accurately identifies OOD data, which allows researchers to make realistic claims about model generalizability. Extensive experiments confirm the effectiveness of this metric through comparative analysis against traditional methods. Our study shows that GDA outperforms existing metrics in detecting OOD instances. This is needed for applications where the generalizability of GNNs is necessary, such as drug effectiveness studies, protein interaction classification, and complex network systems in telecommunications and social media analysis. The thesis explores how this metric affects the explainability of GNNs, and it reveals the behavior and decision-making processes of these models. Applying GDA in curriculum transfer learning optimizes data usage and computational efficiency. Training data is introduced strategically so that the models adapt progressively, which improves accuracy and generalization capabilities across various graph-based tasks. This work does not propose a new GNN ...
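
The abstract names Weisfeiler-Leman kernels and cosine similarity as ingredients of GDA but does not give the construction. The following is only a minimal sketch of how those two pieces can be combined in general, not the thesis's implementation. The function names (wl_features, cosine, ood_score), the adjacency-dict graph format, the iteration count, and the "1 minus mean similarity" score are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Leman subtree labels of one graph.

    adj:    dict mapping each node to a list of its neighbours (assumed format)
    labels: dict mapping each node to an initial label
    Labels are kept as uncompressed signature strings so that features from
    different graphs are comparable without a shared label dictionary.
    """
    current = {v: str(l) for v, l in labels.items()}
    feats = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for v in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neigh = sorted(current[u] for u in adj[v])
            refined[v] = current[v] + "(" + ",".join(neigh) + ")"
        current = refined
        feats.update(current.values())
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ood_score(graph_feats, train_feats):
    """Hypothetical score: 1 minus the mean similarity to the training graphs."""
    sims = [cosine(graph_feats, t) for t in train_feats]
    return 1.0 - sum(sims) / len(sims)


# Toy graphs with uniform initial labels: a triangle and a 3-node path.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
uniform = {0: "0", 1: "0", 2: "0"}

f_tri = wl_features(triangle, uniform)
f_path = wl_features(path, uniform)
print(cosine(f_tri, f_tri), cosine(f_tri, f_path))
print(ood_score(f_path, [f_tri]))
```

Under these assumptions, a graph whose refined labels rarely match those seen in the training set receives a score closer to 1. Any statistical summary beyond the mean would be a further design choice that the abstract does not specify.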