Title:
Finding Interpretable Data Pockets in Tabular Data.
Source:
Statistics & Computing; Dec2025, Vol. 35 Issue 6, p1-21, 21p
Database:
Complementary Index

This paper develops a bump hunting method for discrete-valued tabular data in which each bump is modeled by a rectangular region of the input data space, so that its rule-based description admits a simple logical interpretation that can be used to make informed decisions. The method is designed for labeled data where each input feature has a separate and distinct meaning that may or may not be related to the output. The goal is to find feature subsets that are related to the output, rectangular regions within these subset feature spaces, and pockets of data within these rectangular regions that simultaneously obey five properties: each rectangle is described by a small subset of input features; the pocket data occupies a local region of the subset-feature space (i.e., the input samples are all similar to one another in the reduced feature space); the input/output relationship for the pocket data is nearly pure (i.e., nearly all output values belong to a designated target set); the number of pocket data samples in each rectangle is both statistically significant and large enough to have relevant meaning for the end application; and the overlap between rectangles is minimal. In contrast to state-of-the-art methods that use decision trees or the PRIM algorithm, this new method is better at distinguishing multiple closely spaced bumps, better at representing non-rectangular bumps that are formed by collinear features, better at controlling the extent of the rectangles (to provide a simpler interpretation), and more robust against overfitting and the inclusion of spurious features that have little or no relation to the output. [ABSTRACT FROM AUTHOR]
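
The five properties can be made concrete with a small scoring sketch. This is not the paper's algorithm, which the abstract does not specify; it only shows one plausible way to measure a candidate rectangle's support, purity, and significance, and the overlap between two rectangles. All names here (`in_rectangle`, `pocket_stats`, `overlap`, the toy features) are hypothetical, and the one-sided binomial tail against the global target rate and the Jaccard overlap are assumed stand-ins for whatever criteria the authors actually use.

```python
from math import comb

def in_rectangle(row, rule):
    """A rule maps each selected feature to its set of allowed discrete values."""
    return all(row[f] in allowed for f, allowed in rule.items())

def pocket_stats(rows, labels, rule, target):
    """Score the pocket of samples that fall inside one rectangle (illustrative only)."""
    idx = [i for i, r in enumerate(rows) if in_rectangle(r, rule)]
    n = len(idx)                                  # support: samples in the pocket
    k = sum(labels[i] in target for i in idx)     # pocket samples with a target output
    base = sum(y in target for y in labels) / len(labels)  # global target rate
    purity = k / n if n else 0.0
    # Assumed significance test: one-sided tail P(X >= k) for X ~ Binomial(n, base).
    p_value = sum(comb(n, j) * base**j * (1 - base)**(n - j) for j in range(k, n + 1))
    return {"indices": set(idx), "support": n, "purity": purity, "p_value": p_value}

def overlap(a, b):
    """Jaccard overlap between the sample-index sets of two pockets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy data: "noise" is a spurious feature unrelated to the label.
rows = [
    {"color": "red", "size": 1, "noise": 0},
    {"color": "red", "size": 1, "noise": 1},
    {"color": "red", "size": 2, "noise": 0},
    {"color": "red", "size": 2, "noise": 1},
    {"color": "blue", "size": 1, "noise": 0},
    {"color": "blue", "size": 2, "noise": 1},
    {"color": "blue", "size": 3, "noise": 0},
    {"color": "red", "size": 3, "noise": 1},
]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

rule_a = {"color": {"red"}, "size": {1, 2}}   # two features, local region
rule_b = {"size": {3}}                        # a second, disjoint rectangle

stats_a = pocket_stats(rows, labels, rule_a, target={"yes"})
stats_b = pocket_stats(rows, labels, rule_b, target={"yes"})
print(stats_a["support"], stats_a["purity"], stats_a["p_value"])
print(overlap(stats_a["indices"], stats_b["indices"]))
```

Here `rule_a` reads as the logical statement "color is red AND size is 1 or 2", which is the kind of short rule-based description the abstract refers to. A search procedure would still be needed to propose such rules; this sketch only evaluates rules that are already given.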
