Sparsely-Labeled Source Assisted Domain Adaptation

Paper Title


Sparsely-Labeled Source Assisted Domain Adaptation

Authors

Wang, Wei; Wang, Zhihui; Xiang, Yuankai; Sun, Jing; Li, Haojie; Sun, Fuming; Ding, Zhengming

Abstract


Domain Adaptation (DA) aims to generalize a classifier learned on a source domain to a target domain. Existing DA methods usually assume that abundant labels are available in the source domain. In practice, however, the source domain often contains a large amount of unlabeled data but only a few labeled samples, and transferring knowledge from such a sparsely-labeled source domain to the target domain remains challenging, which greatly limits the application of these methods in the wild. This paper proposes a novel Sparsely-Labeled Source Assisted Domain Adaptation (SLSA-DA) algorithm that addresses this challenge using only a limited number of labeled source samples. Specifically, to cope with label scarcity, projected clustering is performed on both the source and target domains so that the discriminative structure of the data can be exploited. Label propagation is then used to progressively spread the labels of the few labeled source samples to all of the unlabeled data, so that the cluster labels are revealed correctly. Finally, the marginal and conditional distributions are jointly aligned to mitigate the cross-domain mismatch, and the three procedures are optimized iteratively. Incorporating these three procedures seamlessly into a unified optimization framework is nontrivial, however, because some of the variables to be optimized appear only implicitly in their formulations, so the procedures cannot reinforce one another. Notably, we prove that projected clustering and conditional distribution alignment can be reformulated into different expressions, so that these implicit variables become explicit in different optimization steps. As a result, the variables of all three procedures can be optimized within a unified framework and reinforce one another, noticeably improving recognition performance.
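The abstract does not give the exact objective, so the following is only a rough sketch, under stated assumptions, of two ingredients it names. It uses standard stand-ins rather than the authors' formulation: closed-form graph label propagation (in the style of Zhou et al.'s local-and-global-consistency method) spreads one labeled source sample per class over all source and target samples, and a linear-kernel MMD then sums a marginal term and per-class conditional terms computed from the propagated labels, as in joint distribution adaptation. The helper names `label_propagation` and `joint_mmd`, the Gaussian affinity, `alpha`, `sigma`, and the toy data are all hypothetical. The projected clustering step and the paper's unified optimization are not reproduced.

```python
import numpy as np


def label_propagation(X, y, alpha=0.99, sigma=1.0):
    """Spread a few seed labels over all samples (hypothetical helper).

    X: (n, d) features of all source and target samples.
    y: (n,) integer labels; -1 marks an unlabeled sample.
    Returns a hard label for every sample.
    """
    n = X.shape[0]
    classes = np.unique(y[y >= 0])
    # Gaussian affinity between every pair of samples, no self-loops
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot seed matrix: only the labeled rows are non-zero
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0
    # Closed form F = (I - alpha * S)^{-1} Y
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[np.argmax(F, axis=1)]


def joint_mmd(Xs, ys, Xt, yt):
    """Linear-kernel MMD: one marginal term plus one conditional term per class."""
    marginal = np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)
    conditional = 0.0
    for c in np.unique(ys):
        src_c, tgt_c = Xs[ys == c], Xt[yt == c]
        if len(src_c) > 0 and len(tgt_c) > 0:  # skip classes missing in either domain
            conditional += np.sum((src_c.mean(axis=0) - tgt_c.mean(axis=0)) ** 2)
    return marginal + conditional


# Toy data: two well-separated classes; the target is the source shifted by 0.5
Xs = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
               [5.0, 5.0], [5.2, 5.1], [4.9, 5.3]])
Xt = Xs + 0.5
# Only one labeled source sample per class; every target sample is unlabeled
ys_sparse = np.array([0, -1, -1, 1, -1, -1])
y_all = np.concatenate([ys_sparse, -np.ones(len(Xt), dtype=int)])

labels = label_propagation(np.vstack([Xs, Xt]), y_all)
ys_prop, yt_pseudo = labels[:len(Xs)], labels[len(Xs):]
score = joint_mmd(Xs, ys_prop, Xt, yt_pseudo)
print(labels, score)
```

The conditional term is where the sparse source labels matter: without propagation, most source samples (and all target samples) would have no class to be compared under. An SLSA-DA-style method would alternate between learning a projection that lowers this discrepancy and re-running clustering and propagation in the projected space; the abstract describes that alternation only at a high level, so it is left out here.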


