Result: Novel AI/ML-based frameworks for protein conformation selection in drug discovery applications

Title:
Novel AI/ML-based frameworks for protein conformation selection in drug discovery applications
Authors:
Source:
Dissertations
Publisher Information:
LOUIS
Publication Year:
2025
Document Type:
Academic journal text
File Description:
application/pdf
Language:
unknown
Accession Number:
edsbas.341394E6
Database:
BASE

Further Information

Drug development is a lengthy, expensive process with a high failure rate. The urgency brought on by the COVID-19 pandemic has accelerated the integration of artificial intelligence (AI) and machine learning (ML) into drug discovery, with the aim of increasing target specificity, reducing toxicity, and optimizing formulation strategies. Building on this momentum, this dissertation introduces novel AI/ML-driven frameworks for protein conformation selection and classification, addressing critical challenges in modern drug discovery. Designing such a data-driven framework is difficult, however, because most real-world biomedical datasets suffer from class imbalance, which can significantly skew model training and yield biased models with poor prediction accuracy for the minority class. A second issue is the small sample size of many biomedical datasets, which can lead to misclassified drug candidate conformations and thereby complicate drug discovery. To address these challenges, this dissertation presents a series of AI/ML data-driven methodologies that can work with small, class-imbalanced samples while maintaining model performance.
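The effect of class imbalance on reported accuracy can be illustrated with a minimal sketch. It is not taken from the dissertation: the 95-to-5 label split, the function names, and the trivial majority-class predictor are all assumptions chosen for illustration, and the inverse-frequency weighting shown is one common heuristic, not necessarily the method the author uses.

```python
from collections import Counter


def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def balanced_class_weights(y_true):
    """Weights inversely proportional to class frequency: n / (k * count)."""
    totals = Counter(y_true)
    n, k = len(y_true), len(totals)
    return {c: n / (k * count) for c, count in totals.items()}


# Hypothetical labels (assumption): 1 = binding, 0 = non-binding conformation.
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority (non-binding) class.
y_pred = [0] * 100

# Plain accuracy rewards the majority class and hides the failure.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall and its mean (balanced accuracy) expose it.
recalls = recall_per_class(y_true, y_pred)
balanced_accuracy = sum(recalls.values()) / len(recalls)

# Inverse-frequency weights give the rare class more influence in training.
weights = balanced_class_weights(y_true)

print("accuracy:", accuracy)
print("recall per class:", recalls)
print("balanced accuracy:", balanced_accuracy)
print("class weights:", weights)
```

In this toy setting the predictor reaches an accuracy of 0.95 while recovering none of the binding conformations (recall 0.0 for class 1, balanced accuracy 0.5), which is the kind of bias the abstract describes. The computed weight for the binding class is 10.0, against roughly 0.53 for the non-binding class.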
This dissertation presents multiple AI/ML data-driven frameworks aimed at: i) addressing class imbalance in biomedical data, particularly when identifying potential binding protein conformations in a dataset where non-binding conformations outnumber binding ones; ii) using data-driven approaches to select physicochemical features of potential binding protein conformations, which could help identify unique physicochemical descriptors that could play a pivotal role in the binding capability of a protein conformation and also reduce the dimensionality of the dataset, allowing this work to be carried out on a personal computer rather than a supercomputer; iii) maximizing the prediction accuracy for binding and non-binding protein conformations; and iv) ...