ELECTRA: Conditional Generative Model based Predicate-Aware Query Approximation

Paper title

Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Authors

Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri

Abstract

The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries, thereby improving the user experience in interactive exploration of large datasets. Recently proposed machine-learning based AQP techniques can provide very low latency, because query execution involves only model inference rather than traditional query processing on database clusters. However, as the number of filtering predicates (WHERE clauses) increases, the approximation error of these methods grows significantly. Analysts often use queries with a large number of predicates for insight discovery, so maintaining low approximation error is important to prevent them from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and, at runtime, generates a small (~1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error than the baselines for queries with a large number of predicates.
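The runtime flow described in the abstract (condition on the predicates, generate roughly 1000 rows, run the aggregate on that sample) can be illustrated with a minimal Python sketch. This is not the authors' model: ToyConditionalGenerator is a hypothetical stand-in that fits one Gaussian per combination of categorical values, and the table, its column names (region, device, revenue), and the equality-only predicates are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def make_table(n=20000, seed=0):
    """Build a hypothetical toy table (not from the paper)."""
    rng = random.Random(seed)
    base = {"east": 10.0, "west": 20.0, "north": 30.0}
    rows = []
    for _ in range(n):
        region = rng.choice(["east", "west", "north"])
        device = rng.choice(["mobile", "desktop"])
        bump = 5.0 if device == "mobile" else 0.0
        rows.append({"region": region, "device": device,
                     "revenue": rng.gauss(base[region] + bump, 2.0)})
    return rows


class ToyConditionalGenerator:
    """Stand-in for a learned conditional generative model (assumption).

    Stores count, mean, and std of the target column for every observed
    combination of the conditioning columns, then samples from those.
    """

    def __init__(self, cond_cols, target_col):
        self.cond_cols = cond_cols
        self.target_col = target_col
        self.groups = {}  # condition tuple -> (count, mean, std)

    def fit(self, rows):
        buckets = defaultdict(list)
        for r in rows:
            key = tuple(r[c] for c in self.cond_cols)
            buckets[key].append(r[self.target_col])
        self.groups = {k: (len(v), mean(v), pstdev(v))
                       for k, v in buckets.items()}
        return self

    def generate(self, predicates, n_rows=1000, seed=1):
        """Generate n_rows synthetic rows that satisfy equality predicates."""
        rng = random.Random(seed)
        keys = [k for k in self.groups
                if all(k[self.cond_cols.index(c)] == v
                       for c, v in predicates.items())]
        if not keys:
            return []
        # Unconstrained columns are drawn in proportion to observed counts.
        weights = [self.groups[k][0] for k in keys]
        sample = []
        for k in rng.choices(keys, weights=weights, k=n_rows):
            _, mu, sigma = self.groups[k]
            row = dict(zip(self.cond_cols, k))
            row[self.target_col] = rng.gauss(mu, sigma)
            sample.append(row)
        return sample


table = make_table()
model = ToyConditionalGenerator(["region", "device"], "revenue").fit(table)

# SELECT AVG(revenue) WHERE region = 'west' AND device = 'mobile'
predicates = {"region": "west", "device": "mobile"}
sample = model.generate(predicates, n_rows=1000)
approx_avg = mean(r["revenue"] for r in sample)

# Exact answer over the full table, computed only to measure the error.
exact_avg = mean(r["revenue"] for r in table
                 if all(r[c] == v for c, v in predicates.items()))
rel_error = abs(approx_avg - exact_avg) / abs(exact_avg)
print(f"approx={approx_avg:.3f} exact={exact_avg:.3f} rel_error={rel_error:.4f}")
```

In this sketch the query never touches the full table at answer time; only the fitted model and the generated sample are used, which is the property the abstract attributes to model-based AQP. A real system would replace the per-group Gaussians with a trained conditional generative model and would need to handle range predicates and aggregates such as COUNT and SUM, which this toy does not.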
