Paper Title
A Doubly Regularized Linear Discriminant Analysis Classifier with Automatic Parameter Selection
Paper Authors
Paper Abstract
Classifiers based on linear discriminant analysis (LDA) tend to falter in many practical settings where the training data size is smaller than, or comparable to, the number of features. As a remedy, different regularized LDA (RLDA) methods have been proposed. These methods may still perform poorly, depending on the size and quality of the available training data. In particular, deviation of the test data from the training data model, for example due to noise contamination, can cause severe performance degradation. Moreover, these methods rely further on the Gaussian assumption (upon which LDA is established) to tune their regularization parameters, which may compromise accuracy on real data. To address these issues, we propose a doubly regularized LDA classifier, denoted R2LDA. In the proposed approach, the RLDA score function is rewritten as an inner product of two vectors. Substituting regularized estimators for these vectors yields the R2LDA score function, which involves two regularization parameters. To set these parameters, we adopt three existing regularization techniques: the constrained perturbation regularization approach (COPRA), the bounded perturbation regularization (BPR) algorithm, and the generalized cross-validation (GCV) method. These methods tune the regularization parameters based on linear estimation models in which the square root of the sample covariance matrix is the linear operator. Results obtained from both synthetic and real data demonstrate the consistency and effectiveness of the proposed R2LDA approach, especially in scenarios where the test data are contaminated with noise that is not observed during the training phase.
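The inner-product construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract gives no formulas: the function name `r2lda_score`, the use of ridge (Tikhonov) estimators for the two vectors, the symmetric eigendecomposition-based square root, and the sign convention are illustrative choices. The regularization parameters `gamma1` and `gamma2` are passed in by hand; the sketch does not implement COPRA, BPR, or GCV.

```python
import numpy as np


def r2lda_score(x, mean0, mean1, cov, gamma1, gamma2, prior0=0.5, prior1=0.5):
    """Hypothetical doubly regularized LDA score (illustrative, not the paper's code).

    The LDA score (x - m)^T cov^{-1} (mean0 - mean1) is viewed as an inner
    product a^T b, where a and b solve the linear models
        x - m         = sqrt_cov @ a
        mean0 - mean1 = sqrt_cov @ b
    and each vector is estimated with its own ridge parameter (assumption).
    """
    # Symmetric square root of the sample covariance via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    sqrt_cov = (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

    eye = np.eye(cov.shape[0])
    centered = x - 0.5 * (mean0 + mean1)
    diff = mean0 - mean1

    # Ridge estimates: (cov + gamma * I)^{-1} sqrt_cov y for each model.
    a = np.linalg.solve(cov + gamma1 * eye, sqrt_cov @ centered)
    b = np.linalg.solve(cov + gamma2 * eye, sqrt_cov @ diff)

    # Assumed convention: a positive score favors class 0.
    return float(a @ b + np.log(prior0 / prior1))
```

Under these assumptions, setting `gamma1 = gamma2 = 0` with an invertible covariance matrix recovers the classical LDA score, while positive values keep the score well defined even when the sample covariance is singular (fewer training samples than features).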