Paper Title
Confidence May Cheat: Self-Training on Graph Neural Networks under Distribution Shift
Paper Authors
Paper Abstract
Graph Convolutional Networks (GCNs) have recently attracted vast interest and achieved state-of-the-art performance on graphs, but their success typically hinges on careful training with large amounts of expensive and time-consuming labeled data. To alleviate labeled-data scarcity, self-training methods have been widely adopted on graphs: high-confidence unlabeled nodes are pseudo-labeled and then added to the training set. In this line, we empirically conduct a thorough study of current self-training methods on graphs. Surprisingly, we find that high-confidence unlabeled nodes are not always useful, and can even introduce a distribution shift between the original labeled dataset and the dataset augmented by self-training, severely hindering the capability of self-training on graphs. To this end, in this paper, we propose a novel Distribution Recovered Graph Self-Training framework (DR-GST), which recovers the distribution of the original labeled dataset. Specifically, we first prove that the loss function of the self-training framework under the distribution shift case equals that under the population distribution, provided each pseudo-labeled node is weighted by a proper coefficient. Since this coefficient is intractable, we then propose to replace it with the information gain after observing the same changing trend between them, where the information gain is estimated in DR-GST via both dropout variational inference and DropEdge variational inference. However, such a weighted loss function enlarges the impact of incorrect pseudo labels. As a result, we apply a loss-correction method to improve the quality of pseudo labels. Both our theoretical analysis and extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed DR-GST, as well as of each well-designed component in DR-GST.
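The recipe the abstract describes, pseudo-labeling high-confidence unlabeled nodes and weighting each one's loss by an information gain estimated with dropout variational inference, can be made concrete in a short sketch. The following is a minimal PyTorch-style illustration under stated assumptions, not the authors' implementation: the GNN classifier `model`, its PyG-style `model(x, edge_index)` call signature, the confidence threshold `tau`, and the mean-normalization of the weights are all hypothetical choices. The information gain here is the standard BALD-style mutual information H(E[p]) − E[H(p)] over T stochastic dropout passes, one plausible reading of the abstract's "dropout variational inference"; the DropEdge variant and the loss-correction step are omitted.

```python
# A minimal sketch (not the authors' code) of the core DR-GST idea:
# pseudo-label high-confidence unlabeled nodes, estimate each node's
# information gain with Monte-Carlo dropout, and weight its pseudo-label
# loss by that gain. `model` is assumed to be any GCN-style classifier
# containing dropout layers, called PyG-style as model(x, edge_index).

import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_probs(model, x, edge_index, T=20):
    """Run T stochastic forward passes with dropout kept active."""
    model.train()  # keep dropout on; no gradients are taken here
    return torch.stack(
        [F.softmax(model(x, edge_index), dim=-1) for _ in range(T)]
    )  # shape: (T, num_nodes, num_classes)

def information_gain(probs):
    """BALD-style mutual information: H(E[p]) - E[H(p)]."""
    eps = 1e-10
    mean_p = probs.mean(dim=0)
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum(-1)
    mean_entropy = -(probs * (probs + eps).log()).sum(-1).mean(0)
    return entropy_of_mean - mean_entropy  # shape: (num_nodes,)

def self_training_step(model, x, edge_index, y, train_mask,
                       unlabeled_mask, tau=0.9):
    """One weighted self-training step; tau is a hypothetical threshold."""
    probs = mc_dropout_probs(model, x, edge_index)
    conf, pseudo_y = probs.mean(0).max(dim=-1)
    gain = information_gain(probs)

    # Select high-confidence unlabeled nodes as pseudo-labeled nodes.
    pseudo_mask = unlabeled_mask & (conf > tau)

    model.train()
    out = model(x, edge_index)
    loss = F.cross_entropy(out[train_mask], y[train_mask])
    if pseudo_mask.any():
        # Weight each pseudo-labeled node by its normalized information
        # gain, standing in for the intractable distribution-recovery
        # coefficient described in the abstract.
        w = gain[pseudo_mask] / (gain[pseudo_mask].mean() + 1e-10)
        per_node = F.cross_entropy(out[pseudo_mask], pseudo_y[pseudo_mask],
                                   reduction="none")
        loss = loss + (w * per_node).mean()
    return loss
```

In the full method, a loss-correction step would additionally be applied to the pseudo labels, since up-weighting high-information-gain nodes also amplifies the damage of any incorrect pseudo label, exactly the failure mode the abstract flags.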