Paper Title

Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants

Authors

Niyazi, Lama B., Kammoun, Abla, Dahrouj, Hayssam, Alouini, Mohamed-Slim, Al-Naffouri, Tareq Y.

Abstract

Datasets from the fields of bioinformatics, chemometrics, and face recognition are typically characterized by small samples of high-dimensional data. Among the many variants of linear discriminant analysis that have been proposed to rectify the issues associated with classification in such a setting, the classifier in [1], composed of an ensemble of randomly projected linear discriminants, seems especially promising; it is computationally efficient and, with the optimal projection dimension parameter setting, is competitive with the state of the art. In this work, we seek to further understand the behavior of this classifier through asymptotic analysis. Under the assumption of a growth regime in which the dataset and projection dimensions grow at constant rates relative to each other, we use random matrix theory to derive asymptotic misclassification probabilities, showing that the ensemble acts as a regularization of the data sample covariance matrix. The asymptotic errors further help to identify situations in which the ensemble offers a performance advantage. We also develop a consistent estimator of the misclassification probability as an alternative to the computationally costly cross-validation estimator, which is conventionally used for parameter tuning. Finally, we demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
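
To make the object of study concrete, the following is a minimal sketch of an ensemble of randomly projected linear discriminants for two classes. It is an illustration under stated assumptions, not the exact construction of [1]: it assumes Gaussian projection matrices, a pooled sample covariance, and equal weighting of ensemble members, and the names `fit_rp_lda_ensemble`, `proj_dim`, and `n_members` are hypothetical.

```python
import numpy as np


def fit_rp_lda_ensemble(X0, X1, proj_dim, n_members, rng):
    """Fit a two-class ensemble of randomly projected linear discriminants.

    Returns (w, b); a point x is assigned to class 1 when x @ w + b > 0.
    Assumption: proj_dim does not exceed the rank of the pooled covariance,
    so each projected covariance matrix is invertible.
    """
    n0, n1 = len(X0), len(X1)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled sample covariance; singular when n0 + n1 - 2 < dimension.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    cov = centered.T @ centered / (n0 + n1 - 2)
    dim = cov.shape[0]

    w = np.zeros(dim)
    for _ in range(n_members):
        # Assumed projection model: i.i.d. standard Gaussian entries.
        R = rng.standard_normal((proj_dim, dim))
        # LDA direction in the projected space, mapped back to the
        # original space: R^T (R cov R^T)^{-1} R (mu1 - mu0).
        w += R.T @ np.linalg.solve(R @ cov @ R.T, R @ (mu1 - mu0))
    w /= n_members  # equal-weight average over ensemble members

    # Threshold at the midpoint of the class means, plus the log prior ratio.
    b = -0.5 * (mu0 + mu1) @ w + np.log(n1 / n0)
    return w, b
```

Averaging over projections yields a single linear rule in the original space even though the pooled covariance itself cannot be inverted when samples are scarce, which is one way to read the abstract's remark that the ensemble behaves like a regularization of the sample covariance matrix. In this sketch, `proj_dim` plays the role of the projection dimension that the abstract proposes to tune.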
