论文标题
在最坏情况下恒定时间中的秩上滑动窗口聚合
In-Order Sliding-Window Aggregation in Worst-Case Constant Time
论文作者
论文摘要
滑动窗口聚合是一种广泛使用的方法,用于从数据流的最新部分中提取见解。感兴趣的汇总通常可以表示为二进制运算符,这些操作员是关联但不一定是交换性或可逆的。但是,不可换行的操作员很难有效地支持。在2017年的会议论文中,我们介绍了DABA,这是第一个以最差的持续时间进行滑动窗口聚合的算法。在DABA之前,如果窗口的尺寸为$ n $,则最佳发布的算法将需要$ o(\ log n)$每个窗口操作的聚集步骤---对于严格的订购流,可以将其提高到$ o(1)$聚集步骤平均而言,不知道该如何实现$ O(1)$限制的适用于最差的case-case-case-case-sentency,这是一个至关重要的,这是一个重要的cases,这是一个很重要的。 本文是我们2017年论文的扩展版本。除了更详细地描述DABA外,本文还介绍了一种新的变体Daba Lite,它在更少的内存中实现了相同时间的界限。 DABA需要存储$ 2N $的部分聚合物的空间,而Daba Lite仅需要$ n+2 $部分骨料的空间。我们关于合成和实际数据的实验支持了理论发现。
Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily commutative nor invertible. Non-invertible operators, however, are difficult to support efficiently. In a 2017 conference paper, we introduced DABA, the first algorithm for sliding-window aggregation with worst-case constant time. Before DABA, if a window had size $n$, the best published algorithms would require $O(\log n)$ aggregation steps per window operation---and while for strictly in-order streams, this bound could be improved to $O(1)$ aggregation steps on average, it was not known how to achieve an $O(1)$ bound for the worst-case, which is critical for latency-sensitive applications. This article is an extended version of our 2017 paper. Besides describing DABA in more detail, this article introduces a new variant, DABA Lite, which achieves the same time bounds in less memory. Whereas DABA requires space for storing $2n$ partial aggregates, DABA Lite only requires space for $n+2$ partial aggregates. Our experiments on synthetic and real data support the theoretical findings.