Treffer: Algorithms for Robust Data Analysis
Weitere Informationen
Data analysis plays an important role in making decisions, making predictions, and helping business operate. Unfortunately, in many situations the data is not reliable and robust analysis is needed to obtain stable results. This may be challenging when the data is high dimensional. Transforming high-dimensional data into low-dimensional data is an important prior step in applications such as managing the data, performing efficient learning, retrieving information, and other data analytics tasks. Feature selection and feature extraction are two classical techniques to achieve dimensionality reduction. Feature selection removes irrelevant and redundant features and keeps only the most important ones. Unlike feature selection, feature extraction generates the features as arbitrary functions of the data. They are typically more accurate than those obtained by feature selection. But the extracted features are harder to interpret than the selected features. Another disadvantage is that feature extraction is less robust than feature selection. In this dissertation we describe algorithms and methods to improve the robustness of feature selection and feature extraction. We propose computing a hybrid low rank representation (HLR) of selected features and extracted features. The robustness of the HLR model comes from the selected features, and its accuracy comes from the extracted features. We develop an algorithm to solve this hybrid problem optimally by combinatorial search. We propose optimal, sub-optimal, and greedy variants of the algorithm to solve this hybrid problem. The sub-optimal and the greedy variants come with the exact bounds on the representation accuracy. Principal Component Analysis (PCA) is a widely used feature extraction algorithm. It is known to be sensitive to outliers that reduce its robustness. Robust Principal Component Analysis (RPCA), also known as Robust Subspace Recovery (RSR), is a classical approach to improve the robustness of the PCA by identifying and removing outliers. We develop a novel ...