Paper Title
Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence
Paper Authors
Paper Abstract
In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both the source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture $(i)$ the domain shift between the target data distribution $P'_Z$ and the source distribution $P_Z$ through a two-parameter family of generalized $(\alpha_1,\alpha_2)$-Jensen-Shannon (JS) divergences; and $(ii)$ the sensitivity of the transfer learner output $W$ to each individual sample $Z_i$ of the data set via the mutual information $I(W;Z_i)$. For $\alpha_1 \in (0,1)$, the $(\alpha_1,\alpha_2)$-JS divergence remains bounded even when the support of $P_Z$ is not included in that of $P'_Z$. This contrasts with the Kullback-Leibler (KL) divergence $D_{KL}(P_Z \| P'_Z)$-based bounds of Wu et al. [1], which are vacuous in this case. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the $\phi$-divergence-based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk of empirical weighted risk minimization (EWRM), which minimizes the weighted average training loss over the source and target data sets, in terms of the $(\alpha_1,\alpha_2)$-JS divergence. Finally, we provide a numerical example illustrating the merits of the introduced bounds.
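As a minimal numerical sketch (not the paper's exact construction), the snippet below uses the standard one-parameter skewed JS divergence $\mathrm{JS}_\alpha(P\|Q) = \alpha D_{KL}(P\|M) + (1-\alpha) D_{KL}(Q\|M)$, with mixture $M = \alpha P + (1-\alpha)Q$, as a stand-in for the two-parameter $(\alpha_1,\alpha_2)$-JS family; the helper names `kl` and `js_alpha` are hypothetical. It illustrates the key contrast stated in the abstract: when the support of the source distribution $P_Z$ is not contained in that of the target $P'_Z$, the KL divergence is infinite (so KL-based bounds are vacuous), while a JS-type divergence stays finite.

```python
import numpy as np

def kl(p, q):
    # KL divergence D(p || q) for discrete distributions;
    # returns inf when supp(p) is not contained in supp(q).
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_alpha(p, q, alpha=0.5):
    # One-parameter skewed JS divergence (illustrative stand-in for the
    # (alpha_1, alpha_2)-JS family): alpha*D(p||m) + (1-alpha)*D(q||m),
    # with mixture m = alpha*p + (1-alpha)*q. Bounded for alpha in (0,1).
    m = alpha * p + (1.0 - alpha) * q
    return alpha * kl(p, m) + (1.0 - alpha) * kl(q, m)

# Source P_Z and target P'_Z with disjoint supports (hypothetical toy example).
p_source = np.array([0.5, 0.5, 0.0, 0.0])
p_target = np.array([0.0, 0.0, 0.5, 0.5])

print(kl(p_source, p_target))        # inf: KL-based bound is vacuous
print(js_alpha(p_source, p_target))  # log(2) ~= 0.693: JS-based bound stays finite
```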