GRMR：普遍的遗憾最少代表

论文标题

GRMR：普遍的遗憾最少代表

GRMR: Generalized Regret-Minimizing Representatives

论文作者

Wang, Yanhao, Mathioudakis, Michael, Li, Yuchen, Tan, Kian-Lee

论文摘要

从大数据库中提取一小部分代表性元组是多标准决策的重要任务。最近提出了从数据库中发现的代表性发现的遗憾最小化集（RMS）问题。具体而言，对于$ d $维度中的一组元组（点），RMS问题找到最小的子集，因此，对于任何可能的排名函数，子集中的顶部排名点之间的分数相对差异与整个数据库中的顶部排名点之间的分数差异在参数$ \ varepsilon \ in（0,0,1,1,1）$中。尽管在文献中已经对RMS及其变化进行了广泛的研究，但现有方法仅考虑用于排名的非负（单调）线性函数类别，这些函数在建模用户偏好和决策过程中具有局限性。为了解决这个问题，我们定义了通过考虑所有线性函数（包括具有负权重的非单调功能）来扩展RMS的广义遗憾最小化代表（GRMR）问题。对于二维数据库，我们通过转换为有向图的最短循环问题提出了GRMR的最佳算法。由于GRMR即使在三个维度上也被证明是NP-HARD，因此我们进一步开发了一种在任意维度的数据库中GRMR的多项式时间启发式算法。最后，我们对真实和合成数据集进行了广泛的实验，以确认我们提出的算法的效率，有效性和可伸缩性。

Extracting a small subset of representative tuples from a large database is an important task in multi-criteria decision making. The regret-minimizing set (RMS) problem is recently proposed for representative discovery from databases. Specifically, for a set of tuples (points) in $d$ dimensions, an RMS problem finds the smallest subset such that, for any possible ranking function, the relative difference in scores between the top-ranked point in the subset and the top-ranked point in the entire database is within a parameter $\varepsilon \in (0,1)$. Although RMS and its variations have been extensively investigated in the literature, existing approaches only consider the class of nonnegative (monotonic) linear functions for ranking, which have limitations in modeling user preferences and decision-making processes. To address this issue, we define the generalized regret-minimizing representative (GRMR) problem that extends RMS by taking into account all linear functions including non-monotonic ones with negative weights. For two-dimensional databases, we propose an optimal algorithm for GRMR via a transformation into the shortest cycle problem in a directed graph. Since GRMR is proven to be NP-hard even in three dimensions, we further develop a polynomial-time heuristic algorithm for GRMR on databases in arbitrary dimensions. Finally, we conduct extensive experiments on real and synthetic datasets to confirm the efficiency, effectiveness, and scalability of our proposed algorithms.

下载PDF全文

下载文献需遵守相关版权规定

论文标题