Modern Analytics over Wide-Tables
Enterprises are eager to leverage tables from diverse sources for decision-making. Recent advancements in cloud data warehouses, which offer the abstraction of shared storage with unlimited on-demand computing, provide an ideal architecture for these needs. However, individual analysts often find the sheer number of tables overwhelming. What they prefer instead is a simple Wide-Table, in which all relevant tables have been integrated into a single table. This Wide-Table abstraction, which originates from the 1982 concept of the universal relation, simplifies data analysis by letting analysts focus on key business metrics and dimensions without the hassle of navigating the underlying tables. To support this abstraction, modern business intelligence tools like Power BI and Tableau implement "semantic layers" that allow business users to declaratively build Wide-Tables by specifying relationships between tables, defining metrics to compute, and selecting dimensions to group by. Despite their convenience, Wide-Tables are defined as views over joins across tens or even hundreds of underlying tables. Current analytics systems execute these join queries naively on the fly, which is notoriously complex and costly in both optimization and execution. Modern analytics requires both interactive query performance for dashboard interactions and the ability to process large-scale data for business intelligence and machine learning applications. The growing demand to incorporate additional data sources, including external and streaming data, further compounds these challenges. This widening gap between current methods and modern requirements necessitates a fundamental redesign of systems to support the Wide-Table abstraction effectively. This thesis introduces the Calibrated Junction Hypertree (CJT), a novel data structure that enhances analytics over Wide-Tables.
While CJT originated from probabilistic graphical models for efficient inference over joint probabilities (similar to aggregations over joins), it was previously limited ...