Treffer: Data Preprocessing and Feature Engineering for Data Mining: Techniques, Tools, and Best Practices.
Weitere Informationen
Data preprocessing and feature engineering play key roles in data mining initiatives, as they have a significant impact on the accuracy, reproducibility, and interpretability of analytical results. This review presents an analysis of state-of-the-art techniques and tools that can be used in data input preparation and data manipulation to be processed by mining tasks in diverse application scenarios. Additionally, basic preprocessing techniques are discussed, including data cleaning, normalisation, and encoding, as well as more sophisticated approaches regarding feature construction, selection, and dimensionality reduction. This work considers manual and automated methods, highlighting their integration in reproducible, large-scale pipelines by leveraging modern libraries. We also discuss assessment methods of preprocessing effects on precision, stability, and bias–variance trade-offs for models, as well as pipeline integrity monitoring, when operating environments vary. We focus on emerging issues regarding scalability, fairness, and interpretability, as well as future directions involving adaptive preprocessing and automation guided by ethically sound design philosophies. This work aims to benefit both professionals and researchers by shedding light on best practices, while acknowledging existing research questions and innovation opportunities. [ABSTRACT FROM AUTHOR]
Copyright of AI is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)