论文标题
使用软功能依赖性维修数据库
Database Repairing with Soft Functional Dependencies
论文作者
论文摘要
软性约束的共同解释会对每一个违反每个约束的违规行为进行惩罚,其中惩罚是约束的成本(重量)。计算挑战是找到一个最佳子集的挑战:数据库元组集合,当每个元组都有被排除的成本时,可以最大程度地减少总罚款。当约束严格(即,有无限的成本)时,此子集是不一致数据库的“基数修复”。在软解释中,该子集对应于概率数据库的“最可能的世界”,概率不洁数据库的“最有可能的意图”等等。在功能依赖性类别中,可以彻底理解寻找基数修复的复杂性。然而,在更通用的软义语义中,对这个问题的复杂性知之甚少。本文在这个方向上取得了重大进展。除了对问题的硬度和近似性的一般见解外,我们还提出了两种特殊情况的算法:单个功能依赖性和双分部分匹配。后者是找到两分图的最佳“几乎匹配”的问题,在该图中,每一个失落的边缘和每一次违反一夫一妻制的行为都会支付罚款。
A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of this problem in the more general soft semantics. This paper makes a significant progress in this direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases: a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy.