论文标题
概率数据库与独立细胞的一致性
The Consistency of Probabilistic Databases with Independent Cells
论文作者
论文摘要
具有属性级不确定性的概率数据库由某些属性的单元格可能具有概率分布而不是确定性内容的关系组成。在嘈杂的操作(例如丢失的数据推出)的背景下,这些数据库出现,我们会自动填充缺失的值,列预测,我们预测未知属性以及数据库清洁(和修复),在此我们由于检测到的原始值而替换了原始值,因为检测到的综合性或综合约束的违规错误。我们研究了在存在完整性约束下选择细胞值的问题的计算复杂性。更确切地说,我们专注于功能依赖性并研究三个问题:(1)决定是否可以通过任何选择值来满足约束,(2)找到最可能的选择,以及(3)计算满足约束的可能性。这些问题的数据复杂性是由功能依赖性集合和不确定属性的集合确定的。我们将完整的分类分类为可进行的多个约束,包括单个依赖性,匹配约束和一单位功能依赖性的复杂性。
A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.