Paper title
Graph Embedding with Data Uncertainty
Paper authors
Abstract
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. The main aim is to learn a meaningful low-dimensional embedding of the data. However, most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty. Thus, learning directly from raw data can be misleading and can negatively impact accuracy. In this paper, we propose to model artifacts in training data using probability distributions; each data point is represented by a Gaussian distribution centered at the original data point, with a variance that models its uncertainty. We reformulate the Graph Embedding framework to make it suitable for learning from distributions, and we study as special cases the Linear Discriminant Analysis and the Marginal Fisher Analysis techniques. Furthermore, we propose two schemes for modeling data uncertainty based on pairwise distances, in unsupervised and supervised contexts.
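To make the idea of learning from distributions concrete, the sketch below illustrates one standard identity that such a reformulation can build on: for independent Gaussians, the expected squared distance between two points equals the squared distance between their means plus the sum of their variances over all dimensions. This is an illustrative sketch, not the paper's actual formulation. The isotropic covariance (one scalar variance per point), the unprojected distance, and the names `expected_sq_distance` and `expected_graph_cost` are all assumptions made for this example.

```python
from typing import Sequence

Vector = Sequence[float]


def expected_sq_distance(mu_i: Vector, mu_j: Vector,
                         var_i: float, var_j: float) -> float:
    """E||x_i - x_j||^2 for independent x_i ~ N(mu_i, var_i * I)
    and x_j ~ N(mu_j, var_j * I).

    Assumption: isotropic covariances, one scalar variance per point.
    """
    d = len(mu_i)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    # Each of the d dimensions contributes var_i + var_j to the expectation.
    return mean_term + d * (var_i + var_j)


def expected_graph_cost(means: Sequence[Vector],
                        variances: Sequence[float],
                        weights: Sequence[Sequence[float]]) -> float:
    """Sum over pairs i != j of W[i][j] * E||x_i - x_j||^2.

    `weights` plays the role of a graph affinity matrix (hypothetical input).
    Diagonal entries are skipped: x_i - x_i is exactly zero, so the
    independence-based formula does not apply there.
    """
    n = len(means)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and weights[i][j] != 0.0:
                total += weights[i][j] * expected_sq_distance(
                    means[i], means[j], variances[i], variances[j])
    return total


if __name__ == "__main__":
    means = [[0.0, 0.0], [3.0, 4.0]]
    variances = [0.5, 1.0]          # per-point uncertainty (assumed values)
    weights = [[0.0, 1.0], [1.0, 0.0]]
    print(expected_sq_distance(means[0], means[1], variances[0], variances[1]))
    print(expected_graph_cost(means, variances, weights))
```

In this toy setting, points with larger variance add more to the expected cost of every edge they take part in, which is one way uncertainty can influence a graph-based objective. How the paper weights or projects these terms is not stated in the abstract.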