Title
Covariance-engaged Classification of Sets via Linear Programming
Authors
Abstract
Set classification aims to classify a set of observations as a whole, as opposed to classifying individual observations separately. To formally understand the unfamiliar concept of binary set classification, we first investigate the optimal decision rule under the normal distribution, which utilizes the empirical covariance of the set to be classified. We show that the number of observations in the set plays a critical role in bounding the Bayes risk. Under this framework, we further propose new methods of set classification. For the case where only a few parameters of the model drive the difference between the two classes, we propose a computationally efficient approach to parameter estimation using linear programming, leading to the Covariance-engaged LInear Programming Set (CLIPS) classifier. Its theoretical properties are investigated both for the independent case and for various time series structures (short-range and long-range dependent) among the observations within each set. The convergence rates of the estimation errors and of the risk of the CLIPS classifier are established, showing that having multiple observations in a set leads to faster convergence rates than in the standard classification setting, where each set contains only one observation. A comprehensive simulation study highlights the domains in which CLIPS performs better than its competitors. Finally, we illustrate the usefulness of the proposed methods in classifying real histopathology image data.
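To see why the empirical covariance of the set enters the optimal rule, consider a minimal sketch under simplifying assumptions that are ours, not the paper's. We assume the m observations in a set are i.i.d. N(mu_k, Sigma_k) with known class parameters and equal class priors. The set's Gaussian log-likelihood then depends on the data only through the set mean xbar and the empirical covariance S = (1/m) sum_i (x_i - xbar)(x_i - xbar)', because sum_i (x_i - mu)' Omega (x_i - mu) = m tr(Omega S) + m (xbar - mu)' Omega (xbar - mu), where Omega = Sigma^{-1}. The helper names below (`set_log_likelihood`, `classify_set`, `error_rate`) are hypothetical. This is the textbook likelihood-ratio rule applied to a whole set. It is not the CLIPS estimator, which estimates the parameters by linear programming and also covers dependent observations.

```python
import numpy as np


def set_log_likelihood(X, mu, Sigma):
    """Gaussian log-likelihood of a whole set X (m rows, p columns), assuming
    the rows are i.i.d. N(mu, Sigma). Computed only from the set mean and
    the empirical covariance S (divisor m)."""
    m, p = X.shape
    xbar = X.mean(axis=0)
    R = X - xbar
    S = R.T @ R / m                       # empirical covariance of the set
    Omega = np.linalg.inv(Sigma)          # precision matrix
    _, logdet = np.linalg.slogdet(Sigma)  # log det(Sigma), numerically stable
    d = xbar - mu
    # sum_i (x_i - mu)' Omega (x_i - mu) = m tr(Omega S) + m d' Omega d
    quad = m * np.trace(Omega @ S) + m * (d @ Omega @ d)
    return -0.5 * (m * p * np.log(2 * np.pi) + m * logdet + quad)


def classify_set(X, params):
    """Assign the whole set to the class with the larger set log-likelihood
    (equal class priors assumed); params is a list of (mu_k, Sigma_k)."""
    scores = [set_log_likelihood(X, mu, Sigma) for mu, Sigma in params]
    return int(np.argmax(scores))


def error_rate(m, n_sets=2000, seed=0):
    """Monte Carlo misclassification rate for sets of size m when the two
    classes share a mean and differ only in their covariance."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)
    params = [(mu, np.eye(2)), (mu, np.array([[1.0, 0.5], [0.5, 1.0]]))]
    errors = 0
    for _ in range(n_sets):
        label = int(rng.integers(2))              # true class of this set
        X = rng.multivariate_normal(*params[label], size=m)
        errors += int(classify_set(X, params) != label)
    return errors / n_sets


for m in (1, 5, 20):
    print(f"set size m={m:2d}: error rate {error_rate(m):.3f}")
```

In this toy setting the two classes have identical means, so a single observation (m = 1) carries little information. As m grows, S concentrates around the true class covariance and the error rate drops. This is consistent with the abstract's point that the set size governs the Bayes risk.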