Paper Title

A Computationally Efficient Classification Algorithm in Posterior Drift Model: Phase Transition and Minimax Adaptivity

Authors

Ruiqi Liu, Kexuan Li, Zuofeng Shang

Abstract

In massive data analysis, training and testing data often come from very different sources, and their probability distributions are not necessarily identical. A typical example is nonparametric classification in the posterior drift model, where the conditional distributions of the label given the covariates may differ between training and testing data. In this paper, we derive the minimax rate of the excess risk for nonparametric classification in the posterior drift model in the setting where both training and testing data have smooth distributions, extending recent work by Cai and Wei (2019), who impose a smoothness condition only on the distribution of the testing data. The minimax rate exhibits a phase transition characterized by the relationship between the smoothness orders of the training and testing data distributions. We also propose a computationally efficient, data-driven nearest neighbor classifier that achieves the minimax excess risk up to a logarithmic factor. Simulation studies and a real-world application are conducted to demonstrate our approach.
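
The abstract does not spell out the proposed classifier, so the following is only a rough illustration of the general idea of pooling training (source) and testing (target) samples in a nearest neighbor vote. It is a minimal Python sketch, not the authors' algorithm: the function name `two_sample_knn_predict`, the fixed neighbor counts `k_src` and `k_tgt`, and the weights `w_src` and `w_tgt` are all hypothetical, and the data-driven tuning described in the paper is not reproduced here.

```python
import numpy as np


def two_sample_knn_predict(x, X_src, y_src, X_tgt, y_tgt,
                           k_src, k_tgt, w_src=1.0, w_tgt=1.0):
    """Predict a 0/1 label at x from a weighted vote over both samples.

    Hypothetical sketch: neighbor counts and weights are fixed by the
    caller here, whereas the paper selects its tuning from the data.
    """

    def centered_vote(X, y, k):
        # Sum of (label - 1/2) over the k nearest neighbors of x in X.
        if k <= 0 or len(X) == 0:
            return 0.0
        k = min(k, len(X))
        dist = np.linalg.norm(X - x, axis=1)
        nearest = np.argpartition(dist, k - 1)[:k]
        return float(np.sum(y[nearest] - 0.5))

    # Under posterior drift the two regression functions may differ,
    # so each sample contributes its own weighted vote.
    score = (w_src * centered_vote(X_src, y_src, k_src)
             + w_tgt * centered_vote(X_tgt, y_tgt, k_tgt))
    return int(score > 0)


# Toy one-dimensional data: many source points, few target points.
X_src = np.array([[0.0], [0.1], [0.2], [0.9], [1.0]])
y_src = np.array([0, 0, 0, 1, 1])
X_tgt = np.array([[0.05], [0.95]])
y_tgt = np.array([0, 1])

pred_low = two_sample_knn_predict(np.array([0.1]), X_src, y_src,
                                  X_tgt, y_tgt, k_src=3, k_tgt=1)
pred_high = two_sample_knn_predict(np.array([0.95]), X_src, y_src,
                                   X_tgt, y_tgt, k_src=3, k_tgt=1)
```

Centering each label at 1/2 means a sample votes for class 1 only when most of its neighbors carry label 1. How much to trust the source sample relative to the target sample is left to the caller through the weights.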
