Paper title
Bayesian Distance Weighted Discrimination
Paper authors
Paper abstract
Distance weighted discrimination (DWD) is a linear discrimination method that is particularly well-suited for classification tasks with high-dimensional data. The DWD coefficients minimize an intuitive objective function, which can be solved very efficiently using state-of-the-art optimization techniques. However, DWD has not yet been cast into a model-based framework for statistical inference. In this article we show that DWD identifies the mode of a proper Bayesian posterior distribution that results from a particular link function for the class probabilities and a shrinkage-inducing proper prior distribution on the coefficients. We describe a relatively efficient Markov chain Monte Carlo (MCMC) algorithm to simulate from the true posterior under this Bayesian framework. We show that the posterior is asymptotically normal and derive the mean and covariance matrix of its limiting distribution. Through several simulation studies and an application to breast cancer genomics, we demonstrate how the Bayesian approach to DWD can be used to (1) compute well-calibrated posterior class probabilities, (2) assess uncertainty in the DWD coefficients and resulting sample scores, (3) improve power via semi-supervised analysis when not all class labels are available, and (4) automatically determine a penalty tuning parameter within the model-based framework. R code to perform Bayesian DWD is available at https://github.com/lockEF/BayesianDWD.