在最坏情况下恒定时间中的秩上滑动窗口聚合

论文标题

在最坏情况下恒定时间中的秩上滑动窗口聚合

In-Order Sliding-Window Aggregation in Worst-Case Constant Time

论文作者

Tangwongsan, Kanat, Hirzel, Martin, Schneider, Scott

论文摘要

滑动窗口聚合是一种广泛使用的方法，用于从数据流的最新部分中提取见解。感兴趣的汇总通常可以表示为二进制运算符，这些操作员是关联但不一定是交换性或可逆的。但是，不可换行的操作员很难有效地支持。在2017年的会议论文中，我们介绍了DABA，这是第一个以最差的持续时间进行滑动窗口聚合的算法。在DABA之前，如果窗口的尺寸为$ n $，则最佳发布的算法将需要$ o（\ log n）$每个窗口操作的聚集步骤---对于严格的订购流，可以将其提高到$ o（1）$聚集步骤平均而言，不知道该如何实现$ O（1）$限制的适用于最差的case-case-case-case-sentency，这是一个至关重要的，这是一个重要的cases，这是一个很重要的。本文是我们2017年论文的扩展版本。除了更详细地描述DABA外，本文还介绍了一种新的变体Daba Lite，它在更少的内存中实现了相同时间的界限。 DABA需要存储$ 2N $的部分聚合物的空间，而Daba Lite仅需要$ n+2 $部分骨料的空间。我们关于合成和实际数据的实验支持了理论发现。

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily commutative nor invertible. Non-invertible operators, however, are difficult to support efficiently. In a 2017 conference paper, we introduced DABA, the first algorithm for sliding-window aggregation with worst-case constant time. Before DABA, if a window had size $n$, the best published algorithms would require $O(\log n)$ aggregation steps per window operation---and while for strictly in-order streams, this bound could be improved to $O(1)$ aggregation steps on average, it was not known how to achieve an $O(1)$ bound for the worst-case, which is critical for latency-sensitive applications. This article is an extended version of our 2017 paper. Besides describing DABA in more detail, this article introduces a new variant, DABA Lite, which achieves the same time bounds in less memory. Whereas DABA requires space for storing $2n$ partial aggregates, DABA Lite only requires space for $n+2$ partial aggregates. Our experiments on synthetic and real data support the theoretical findings.

下载PDF全文

下载文献需遵守相关版权规定

论文标题