Paper title
Infinite Probabilistic Databases
Paper authors
Paper abstract
Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerning the measurability of events and queries and, ultimately, the question of whether queries have well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.