An exploration of adaptive ensemble approaches in software fault detection: Balancing accuracy and robustness.
This paper explores the application of ensemble methods to software defect prediction, using a dataset of software metrics and defect records. The dataset is divided into three files, 'train.csv', 'test.csv', and 'sample_submission.csv', each with a specific role in the predictive modeling pipeline. The 'train.csv' file, with 101,763 records, is used to train the machine learning models; it contains numerous software metrics alongside a binary target variable indicating the presence or absence of defects. The 'test.csv' file, with 67,842 records, omits the target variable and is used to evaluate model performance on unseen data. The 'sample_submission.csv' file provides a template for formatting predictions to meet the evaluation criteria. The dataset features a mix of numerical and categorical variables, including software performance metrics and attributes such as 'loc', 'v(g)', 'ev(g)', 'branch_count', 'lOCode', and 'lOComment'. The target variable, 'Defects', records whether a defect is present. Preprocessing is integral to model performance and includes handling null values, standardizing variables through feature scaling, one-hot encoding categorical data, and addressing class imbalance with techniques such as SMOTE. Three ensemble methods, Bagging, Voting, and Stacking, are employed to improve prediction accuracy and robustness. Bagging reduces variance by training multiple models on bootstrap samples and aggregating their predictions. Voting combines the outputs of several classifiers to improve overall performance, while Stacking uses a meta-model to integrate the predictions of multiple base models. The study demonstrates that ensemble methods can effectively improve defect prediction, offering valuable insights into software performance and aiding the development of more reliable software systems.
[ABSTRACT FROM AUTHOR]
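The Bagging step described in the abstract (training multiple models on bootstrap samples and aggregating their predictions) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the abstract does not name the base learners, so a one-feature threshold classifier (a decision stump) is assumed here, and the function names `fit_stump`, `fit_bagging`, and `predict_majority`, as well as the toy rows standing in for ('loc', 'branch_count') values, are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump: predict `hi` when row[j] > t, else 1 - hi.

    The base learner is an assumption; the abstract does not specify one.
    """
    best = None  # (training errors, feature index, threshold, label above threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for hi in (0, 1):
                errors = sum(
                    (hi if row[j] > t else 1 - hi) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, hi)
    _, j, t, hi = best
    return lambda row: hi if row[j] > t else 1 - hi


def fit_bagging(X, y, n_models=15, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_majority(models, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [loc, branch_count] per row, 1 = defect present.
X_toy = [[10, 1], [12, 2], [15, 1], [20, 3], [80, 9], [95, 12], [110, 15], [130, 11]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_bagging(X_toy, y_toy)
print(predict_majority(models, [5, 1]), predict_majority(models, [120, 13]))
```

Majority voting over an odd number of binary models avoids ties. Hard Voting over different classifier types would reuse the same aggregation step, with the bootstrap-trained stumps replaced by independently trained heterogeneous models.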
Copyright of AIP Conference Proceedings is the property of American Institute of Physics and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)