Optimizing Partial Area Under the Top-k Curve: Theory and Practice

Title

Optimizing Partial Area Under the Top-k Curve: Theory and Practice

Authors

Wang, Zitai; Xu, Qianqian; Yang, Zhiyong; He, Yuan; Cao, Xiaochun; Huang, Qingming

Abstract

Top-k error has become a popular metric for large-scale classification benchmarks due to the inevitable semantic ambiguity among classes. Existing literature on top-k optimization generally focuses on the optimization method of the top-k objective, while ignoring the limitations of the metric itself. In this paper, we point out that the top-k objective lacks enough discrimination such that the induced predictions may give a totally irrelevant label a top rank. To fix this issue, we develop a novel metric named partial Area Under the top-k Curve (AUTKC). Theoretical analysis shows that AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability. This shows that AUTKC does not allow irrelevant labels to appear in the top list. Furthermore, we present an empirical surrogate risk minimization framework to optimize the proposed metric. Theoretically, we present (1) a sufficient condition for Fisher consistency of the Bayes optimal score function; (2) a generalization upper bound which is insensitive to the number of classes under a simple hyperparameter setting. Finally, the experimental results on four benchmark datasets validate the effectiveness of our proposed framework.
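
The abstract's claim that the top-k error lacks discrimination can be illustrated with a minimal Python sketch. The abstract does not state the formula, so this sketch assumes AUTKC@K is the mean of the top-k accuracies for k = 1, ..., K, i.e. a discrete partial area under the top-k accuracy curve. That normalization, the pessimistic tie-breaking, and the helper names `rank_of_label`, `top_k_error`, and `autkc` are all illustrative assumptions, not the authors' code. The scores are made-up toy values, not data from the paper.

```python
from typing import Sequence


def rank_of_label(scores: Sequence[float], label: int) -> int:
    """1-based rank of the true label when classes are sorted by descending score.

    Ties are broken pessimistically: any other class whose score is >= the
    true label's score counts as ranked above it.
    """
    s_y = scores[label]
    return 1 + sum(1 for j, s in enumerate(scores) if j != label and s >= s_y)


def top_k_error(scores: Sequence[float], label: int, k: int) -> int:
    """0-1 top-k error: 1 if the true label falls outside the top-k classes."""
    return int(rank_of_label(scores, label) > k)


def autkc(scores: Sequence[float], label: int, K: int) -> float:
    """ASSUMED AUTKC@K: mean top-k accuracy over k = 1..K.

    This discrete 'partial area under the top-k accuracy curve' is an
    illustrative assumption; the paper's exact definition may differ.
    """
    r = rank_of_label(scores, label)
    return sum(1 for k in range(1, K + 1) if r <= k) / K


# Toy example (made-up scores): the true label is class 0 and K = 3.
y, K = 0, 3
scores_a = [0.90, 0.05, 0.03, 0.01, 0.01]  # true label ranked 1st
scores_b = [0.20, 0.50, 0.25, 0.03, 0.02]  # two other classes ranked above it

print(f"top-{K} error: A={top_k_error(scores_a, y, K)}, B={top_k_error(scores_b, y, K)}")
print(f"AUTKC@{K}: A={autkc(scores_a, y, K):.3f}, B={autkc(scores_b, y, K):.3f}")
# top-3 error: A=0, B=0
# AUTKC@3: A=1.000, B=0.333
```

Both predictions have zero top-3 error, even though prediction B ranks two other classes (possibly irrelevant ones) above the true label. The averaged metric separates them because it also rewards placing the true label higher within the top-K list. This only conveys the metric-level intuition; the paper's surrogate losses, consistency conditions, and generalization bounds are not reproduced here.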
