Paper Title

Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond

Paper Authors

Haoxiang Wang, Bo Li, Han Zhao

Paper Abstract

The vast majority of existing algorithms for unsupervised domain adaptation (UDA) focus on adapting from a labeled source domain to an unlabeled target domain directly in a one-off way. Gradual domain adaptation (GDA), on the other hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target, and aims to provide better generalization in the target domain by leveraging the intermediate ones. Under certain assumptions, Kumar et al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a generalization bound in the order of $e^{O(T)} \left(\varepsilon_0 + O\left(\sqrt{\log(T)/n}\right)\right)$ for the target domain error, where $\varepsilon_0$ is the source domain error and $n$ is the data size of each domain. Due to the exponential factor, this upper bound becomes vacuous when $T$ is only moderately large. In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\varepsilon_0 + O\left(T\Delta + T/\sqrt{n}\right) + \widetilde{O}\left(1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. Compared with the existing bound with an exponential dependency on $T$ as a multiplicative factor, our bound only depends on $T$ linearly and additively. Perhaps more interestingly, our result implies the existence of an optimal choice of $T$ that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulative path length $T\Delta$ between the source and target. To corroborate the implications of our theory, we examine gradual self-training on multiple semi-synthetic and real datasets, which confirms our findings. We believe our insights provide a path forward toward the design of future GDA algorithms.
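Gradual self-training, the algorithm analyzed above, fits a classifier on the labeled source domain and then walks along the path of unlabeled intermediate domains, pseudo-labeling each one with the current model and retraining on those pseudo-labels, until the target is reached. Below is a minimal sketch of that loop, assuming a scikit-learn LogisticRegression as the base learner and a simple list-of-arrays data layout; the function name `gradual_self_train` and the omission of the confidence-based filtering used in the original algorithm are simplifications for illustration, not the paper's exact procedure.

```python
# Minimal sketch of gradual self-training (illustrative only: the model choice
# and data layout are assumptions, and the confidence-based filtering of
# pseudo-labels used in the original algorithm is omitted for brevity).
from sklearn.linear_model import LogisticRegression

def gradual_self_train(x_source, y_source, intermediate_domains, x_target):
    """Adapt from a labeled source to an unlabeled target through T-1
    unlabeled intermediate domains by repeated pseudo-labeling.

    intermediate_domains: list of unlabeled feature arrays, ordered from the
    domain closest to the source to the domain closest to the target.
    """
    # Step 0: train an initial classifier on the labeled source domain.
    model = LogisticRegression(max_iter=1000).fit(x_source, y_source)

    # Steps 1..T-1: pseudo-label the next intermediate domain with the current
    # model, then retrain from scratch on those pseudo-labels only.
    for x_domain in intermediate_domains:
        pseudo_labels = model.predict(x_domain)
        model = LogisticRegression(max_iter=1000).fit(x_domain, pseudo_labels)

    # Step T: pseudo-label the target domain and fit the final classifier.
    pseudo_labels = model.predict(x_target)
    return LogisticRegression(max_iter=1000).fit(x_target, pseudo_labels)
```

The procedure hinges on consecutive domains being close (small $\Delta$): each retraining step then sees mostly correct pseudo-labels, which is what allows the target error in the improved bound above to grow only linearly and additively in $T$ rather than exponentially.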
