XPASC: Measuring Generalization in Weak Supervision by Explainability and Association

Paper title

XPASC: Measuring Generalization in Weak Supervision by Explainability and Association

Authors

März, Luisa, Asgari, Ehsaneddin, Braune, Fabienne, Zimmermann, Franziska, Roth, Benjamin

Abstract

Weak supervision is leveraged in a wide range of domains and tasks due to its ability to create massive amounts of labeled data, requiring only little manual effort. Standard approaches use labeling functions to specify signals that are relevant for the labeling. It has been conjectured that weakly supervised models over-rely on those signals and as a result suffer from overfitting. To verify this assumption, we introduce a novel method, XPASC (eXPlainability-Association SCore), for measuring the generalization of a model trained with a weakly supervised dataset. Considering the occurrences of features, classes and labeling functions in a dataset, XPASC takes into account the relevance of each feature for the predictions of the model as well as the associations of the feature with the class and the labeling function, respectively. The association in XPASC can be measured in two variants: XPASC-CHI SQUARE measures associations relative to their statistical significance, while XPASC-PPMI measures association strength more generally. We use XPASC to analyze KnowMAN, an adversarial architecture intended to control the degree of generalization from the labeling functions and thus to mitigate the problem of overfitting. On the one hand, we show that KnowMAN is able to control the degree of generalization through a hyperparameter. On the other hand, results and qualitative analysis show that generalization and performance do not relate one-to-one, and that the highest degree of generalization does not necessarily imply the best performance. Therefore, methods that allow for controlling the amount of generalization can achieve the right degree of benign overfitting. Our contributions in this study are i) the XPASC score to measure generalization in weakly-supervised models, ii) evaluation of XPASC across datasets and models and iii) the release of the XPASC implementation.
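
The two association measures named in the abstract, chi-square and PPMI, can be illustrated with their textbook definitions over co-occurrence counts. The sketch below is not the released XPASC implementation: the function names, the count-based interface, and the example numbers are assumptions made for illustration, and the abstract does not say how XPASC combines these associations with the explainability-based feature relevance.

```python
import math


def ppmi(joint_count, feature_count, target_count, total):
    """Positive pointwise mutual information from raw counts.

    'target' stands for either a class or a labeling function
    (hypothetical interface, not taken from the paper).
    """
    if joint_count == 0:
        return 0.0
    p_joint = joint_count / total
    p_feature = feature_count / total
    p_target = target_count / total
    pmi = math.log2(p_joint / (p_feature * p_target))
    return max(0.0, pmi)  # negative associations are clipped to zero


def chi_square_2x2(joint_count, feature_count, target_count, total):
    """Pearson chi-square statistic for the 2x2 feature/target table."""
    # Rows: feature present / absent. Columns: target present / absent.
    observed = [
        [joint_count, feature_count - joint_count],
        [target_count - joint_count,
         total - feature_count - target_count + joint_count],
    ]
    row_totals = [feature_count, total - feature_count]
    col_totals = [target_count, total - target_count]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Made-up counts: a feature seen in 20 of 100 samples, a labeling
# function matching 40 samples, and 16 samples where both occur.
example_ppmi = ppmi(16, 20, 40, 100)
example_chi2 = chi_square_2x2(16, 20, 40, 100)
```

Under these definitions, a feature that co-occurs with a labeling function more often than chance would predict receives a positive PPMI and a large chi-square statistic, which is the kind of feature-to-labeling-function association the abstract describes XPASC as taking into account.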
