论文标题

探索具有列表稳定性的AUPRC优化的算法依赖性概括

Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability

论文作者

Wen, Peisong, Xu, Qianqian, Yang, Zhiyong, He, Yuan, Huang, Qingming

论文摘要

Precision-Recall曲线(AUPRC)下区域的随机优化是机器学习的关键问题。尽管已经对各种算法进行了广泛的AUPRC优化研究,但仅在多Query情况下保证了概括。在这项工作中,我们介绍了随机AUPRC优化的一次性概括中的第一个试验。对于更加概括的范围,我们关注算法依赖性概括。我们目的地都有算法和理论障碍。从算法的角度来看,我们注意到大多数现有随机估计器仅在偏见时偏向偏置,并且由于不可兼容性而导致不稳定的不稳定。为了解决这些问题,我们提出了具有较高稳定性的采样率 - 不变的无偏随机估计器。最重要的是,AUPRC优化是作为组成优化问题配制的,并提出了随机算法来解决此问题。从理论的角度来看,算法依赖性概括分析的标准技术不能直接应用于这种列表的组成优化问题。为了填补这一空白,我们将模型稳定性从实例损失扩展到列表损失,并弥合相应的概括和稳定性。此外,我们构建状态过渡矩阵以描述稳定性的复发,并通过矩阵频谱简化计算。实际上,关于三个图像检索数据集的实验结果,讲述了我们框架的有效性和健全性。

Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Although various algorithms have been extensively studied for AUPRC optimization, the generalization is only guaranteed in the multi-query case. In this work, we present the first trial in the single-query generalization of stochastic AUPRC optimization. For sharper generalization bounds, we focus on algorithm-dependent generalization. There are both algorithmic and theoretical obstacles to our destination. From an algorithmic perspective, we notice that the majority of existing stochastic estimators are biased only when the sampling strategy is biased, and is leave-one-out unstable due to the non-decomposability. To address these issues, we propose a sampling-rate-invariant unbiased stochastic estimator with superior stability. On top of this, the AUPRC optimization is formulated as a composition optimization problem, and a stochastic algorithm is proposed to solve this problem. From a theoretical perspective, standard techniques of the algorithm-dependent generalization analysis cannot be directly applied to such a listwise compositional optimization problem. To fill this gap, we extend the model stability from instancewise losses to listwise losses and bridge the corresponding generalization and stability. Additionally, we construct state transition matrices to describe the recurrence of the stability, and simplify calculations by matrix spectrum. Practically, experimental results on three image retrieval datasets on speak to the effectiveness and soundness of our framework.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源