Investigation and Research on Several Key Issues of Software Defect Prediction
With the increasing size and complexity of software code, hidden defects can pose serious problems to systems, making zero-defect software an urgent need for current industrial software applications. Software defect prediction (SDP) identifies defective modules or classes using prediction models trained on historical defect data from various projects. This enables defect prediction in test projects, aiding the rational allocation of test resources and the enhancement of software quality. The efficacy of SDP hinges closely on the quality of the defect dataset, the selected metrics, the trained model, and the algorithm design. This article reviews recent literature on SDP, summarizing existing research from three key perspectives: the datasets and metric elements employed in SDP, dataset optimization techniques, and defect prediction modelling techniques. It primarily introduces commonly used SDP datasets and two types of defect metrics. Regarding dataset optimization, it discusses methods for handling abnormal data, high-dimensional data, class-imbalanced data, and data disparity. It further analyzes the construction of prediction models across four dimensions: supervised learning, semi-supervised learning, unsupervised learning, and deep learning. Key observations include: (i) Researchers use datasets of varying quality, performance evaluation metrics, and SDP models; the efficacy of software product metrics and development process metrics varies across application scenarios, so metrics should be selected flexibly according to actual requirements. (ii) Commonly used datasets such as Promise and NASA vary in data quality, so appropriate data preprocessing and dataset construction are crucial before training SDP models. (iii) In scenarios with limited labeled data, cross-project transfer learning, semi-supervised, or unsupervised learning methods tend to make better use of a broader range of training data. Since each step in the SDP process presents its own unresolved issues, each requiring different response measures, we suggest that researchers comprehensively consider objectives such as dataset quality, the SDP model, performance evaluation indicators, and the need for model interpretability when conducting SDP-related research. It is important to note that no universal dataset or model performs optimally across all application scenarios. [ABSTRACT FROM AUTHOR]
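To make the class-imbalance issue mentioned in the abstract concrete, the following is a minimal sketch of one common mitigation in an SDP-style pipeline: weighting the minority (defective) class during training. The dataset here is synthetic and purely illustrative (the metric values and the ~10% defect rate are assumptions, loosely mimicking the skew of benchmarks like Promise and NASA); it is not the survey's method, just one simple baseline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Synthetic stand-in for a defect dataset: 20 software metrics per module,
# with roughly 10% defective modules (an assumed, typical imbalance ratio).
X = rng.normal(size=(500, 20))
y = (rng.random(500) < 0.1).astype(int)
X[y == 1] += 0.8  # make defective modules weakly separable (illustrative only)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency --
# one simple way to keep the classifier from ignoring the rare defect class.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print("F1 on defect class:", round(f1_score(y_te, clf.predict(X_te)), 2))
```

Alternatives the literature also uses for the same problem include resampling (e.g. SMOTE-style oversampling of defective modules) and cost-sensitive thresholds; which works best depends on the dataset, echoing the abstract's point that no single choice is universally optimal.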
Copyright of IET Software (Wiley-Blackwell) is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission.