Title
Multi-Dimensional Event Data in Graph Databases
Authors
Abstract
Process event data is usually stored either in a sequential process event log or in a relational database. While the sequential, single-dimensional nature of event logs aids querying for (sub)sequences of events based on temporal relations such as "directly/eventually-follows", it does not support querying multi-dimensional event data of multiple related entities. Relational databases allow storing multi-dimensional event data, but existing query languages do not support querying for sequences or paths of events in terms of temporal relations. In this paper, we propose a general data model for multi-dimensional event data based on labeled property graphs that allows storing structural and temporal relations in a single, integrated graph-based data structure in a systematic way. We provide semantics for all concepts of our data model, and generic queries for modeling event data over multiple entities that interact synchronously and asynchronously. The queries allow for efficiently converting large real-life event data sets into our data model, and we provide five converted data sets for further research. We show that typical and advanced queries for retrieving and aggregating such multi-dimensional event data can be formulated and executed efficiently in the existing query language Cypher, giving rise to several new research questions. Specifically, aggregation queries on our data model enable process mining over multiple interrelated entities using off-the-shelf technology.
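The core idea of the abstract, that one event may refer to several entities and that temporal "directly-follows" relations are then defined per entity rather than along a single case sequence, can be illustrated with a small in-memory sketch. This is not the paper's data model or its Cypher queries. Plain Python dicts and lists stand in for a labeled property graph, and every name here (`Event`, `EventGraph`, `corr`, `df`, `derive_directly_follows`, `aggregate_df`, and the `(entity_type, entity_id)` tuples) is a hypothetical choice made for illustration. The sketch correlates events with entities, derives directly-follows edges by ordering each entity's events by time, and then aggregates those edges into activity-level counts per entity type, which is the kind of aggregation the abstract says enables multi-entity process mining.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stand-in for an entity node: (entity_type, entity_id), e.g. ("Order", "o1").
Entity = tuple[str, str]


@dataclass
class Event:
    event_id: str
    activity: str
    timestamp: int  # simplified: integer time stamps instead of datetimes


@dataclass
class EventGraph:
    events: dict[str, Event] = field(default_factory=dict)
    # Correlation edges: (event_id, entity) pairs, one per entity an event refers to.
    corr: list[tuple[str, Entity]] = field(default_factory=list)
    # Directly-follows edges: (from_event_id, to_event_id, entity), one path per entity.
    df: list[tuple[str, str, Entity]] = field(default_factory=list)

    def add_event(self, event: Event, entities: list[Entity]) -> None:
        """Add an event node and correlate it with every entity it refers to."""
        self.events[event.event_id] = event
        for entity in entities:
            self.corr.append((event.event_id, entity))

    def derive_directly_follows(self) -> list[tuple[str, str, Entity]]:
        """For each entity, order its events by time and link consecutive pairs."""
        per_entity: dict[Entity, list[Event]] = defaultdict(list)
        for event_id, entity in self.corr:
            per_entity[entity].append(self.events[event_id])
        self.df = []
        for entity, evs in per_entity.items():
            # Ties on timestamp are broken by event_id so the order is deterministic.
            evs.sort(key=lambda e: (e.timestamp, e.event_id))
            for a, b in zip(evs, evs[1:]):
                self.df.append((a.event_id, b.event_id, entity))
        return self.df

    def aggregate_df(self) -> dict[tuple[str, str, str], int]:
        """Lift event-level DF edges to counts keyed by (entity type, activity, activity)."""
        counts: dict[tuple[str, str, str], int] = defaultdict(int)
        for a, b, (entity_type, _) in self.df:
            counts[(entity_type, self.events[a].activity, self.events[b].activity)] += 1
        return dict(counts)


# Hypothetical example: one order with two items; later events touch several entities.
g = EventGraph()
g.add_event(Event("e1", "Create Order", 1), [("Order", "o1")])
g.add_event(Event("e2", "Pick Item", 2), [("Order", "o1"), ("Item", "i1")])
g.add_event(Event("e3", "Pick Item", 3), [("Order", "o1"), ("Item", "i2")])
g.add_event(Event("e4", "Ship", 4), [("Order", "o1"), ("Item", "i1"), ("Item", "i2")])
g.derive_directly_follows()
summary = g.aggregate_df()
```

In this example, event `e4` lies on three directly-follows paths at once (the order's and each item's), which a single-dimensional log with one case identifier cannot express without duplicating or dropping events. The aggregated `summary` counts "Pick Item" followed by "Ship" twice for the `Item` type and once for the `Order` type. In a graph database the same steps would be written as graph queries over event and entity nodes; this sketch only mirrors the logic and makes no claim about the exact queries used in the paper.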