论文标题
AUC最大化的随机硬阈值算法
Stochastic Hard Thresholding Algorithms for AUC Maximization
论文作者
论文摘要
在本文中,我们旨在开发随机硬阈值算法,以解决不平衡分类中AUC最大化的重要问题。主要挑战是AUC最大化涉及的成对损失。我们通过将U统计目的函数重新定义为经验风险最小化(ERM)来克服这一障碍,从中开发了随机硬阈值算法(\ texttttt {sht-auc})。据我们所知,这是为AUC最大化提供随机硬阈值算法的首次尝试,而每卷成本$Ø(b d)$($ d $和$ b $)分别是数据的尺寸和Minibatch大小。我们表明,所提出的算法符合线性收敛速率,直至公差误差。特别是,我们表明,如果数据是从高斯分布中生成的,那么随着数据的不平衡,其收敛性变得越来越慢。我们进行广泛的实验,以显示所提出算法的效率和有效性。
In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (\texttt{SHT-AUC}) is developed. To our best knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost $Ø(b d)$ where $d$ and $b$ are the dimension of the data and the minibatch size, respectively. We show that the proposed algorithm enjoys the linear convergence rate up to a tolerance error. In particular, we show, if the data is generated from the Gaussian distribution, then its convergence becomes slower as the data gets more imbalanced. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.