Window Expressions for Stream Data Processing

Paper Title

Window Expressions for Stream Data Processing

Authors

Praveen, M., Hitarth, S.

Abstract

Traditional ways of storing and querying data do not work well in scenarios where data is being generated continuously and quick decisions need to be taken. For example, in hospital intensive care units, signals from multiple devices need to be monitored and the occurrence of any anomaly should raise alarms immediately. A typical design would take the average from a window of say 10 seconds (time-based) or 10 successive (count-based) readings and look for sudden deviations. Existing stream processing systems either restrict the windows to time or count-based windows or let users define customized windows in imperative programming languages. These are subject to the implementers' interpretation of what is desired and hard to understand for others. We introduce a formalism for specifying windows based on Monadic Second Order logic. It offers several advantages over ad-hoc definitions written in imperative languages. We demonstrate four such advantages. First, we illustrate how practical streaming data queries can be easily written with precise semantics. Second, we can get different but expressively equivalent formalisms for defining windows. We use one of them (regular expressions) to design an end-user-friendly language for defining windows. Third, we use another expressively equivalent formalism (automata) to design a processor that automatically generates windows according to specifications. The fourth advantage we demonstrate is more sophisticated. Some window definitions have the problem of too many windows overlapping with each other, overwhelming the processing engine. This is handled in different ways by different engines, but all the options are about what to do when this happens at runtime. We study this as a static analysis question and prove that it is undecidable to check whether such a scenario can ever arise for a given window definition. We identify a decidable fragment...
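The "typical design" mentioned in the abstract (average over a window of 10 readings or 10 seconds, then look for sudden deviations) can be illustrated with a minimal sketch. This is not the paper's formalism or processor. The function names, the `threshold` parameter, and the choice to compare each new reading against the mean of the preceding window are assumptions made purely for illustration.

```python
from collections import deque


def count_window_alerts(readings, size=10, threshold=20.0):
    """Count-based window (hypothetical helper).

    Yields (index, value, window_mean) whenever a reading differs from the
    mean of the previous `size` readings by more than `threshold`.
    """
    window = deque(maxlen=size)  # oldest reading is dropped automatically
    for i, value in enumerate(readings):
        if len(window) == size:
            mean = sum(window) / size
            if abs(value - mean) > threshold:
                yield i, value, mean
        window.append(value)


def time_window_alerts(samples, span=10.0, threshold=20.0):
    """Time-based window (hypothetical helper).

    `samples` is an iterable of (timestamp_seconds, value) pairs in time
    order. Yields (timestamp, value, window_mean) whenever a value differs
    from the mean of the readings seen in the previous `span` seconds by
    more than `threshold`.
    """
    window = deque()
    for t, value in samples:
        # Evict readings that are `span` seconds old or older.
        while window and window[0][0] <= t - span:
            window.popleft()
        if window:
            mean = sum(v for _, v in window) / len(window)
            if abs(value - mean) > threshold:
                yield t, value, mean
        window.append((t, value))


if __name__ == "__main__":
    heart_rate = [70.0] * 10 + [72.0, 120.0, 71.0]
    print(list(count_window_alerts(heart_rate)))

    timed = [(float(t), 70.0) for t in range(10)] + [(10.0, 120.0)]
    print(list(time_window_alerts(timed)))
```

Both helpers hard-code the window shape in imperative code, which is the kind of ad-hoc definition the abstract argues is hard for others to interpret; the paper's approach instead specifies windows declaratively.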
