Paper Title
Distantly Supervised Relation Extraction in Federated Settings
Authors
Abstract
This paper investigates distantly supervised relation extraction in federated settings. Previous studies focus on distant supervision under the assumption of centralized training, which requires collecting texts from different platforms and storing them on one machine. However, centralized training is challenged by two issues, namely data barriers and privacy protection, which make it almost impossible or cost-prohibitive to centralize data from multiple platforms. Therefore, it is worthwhile to investigate distant supervision in the federated learning paradigm, which decouples model training from the need for direct access to the raw data. Overcoming the label noise of distant supervision, however, becomes more difficult in federated settings, since the sentences containing the same entity pair may be scattered across different platforms. In this paper, we propose a federated denoising framework to suppress label noise in federated settings. The core of this framework is a denoising method based on multiple instance learning that is able to select reliable instances via cross-platform collaboration. Various experimental results on the New York Times dataset and a miRNA gene regulation relation dataset demonstrate the effectiveness of the proposed method.
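To make the idea of selecting reliable instances through cross-platform collaboration more concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes the common "at-least-one" form of multiple instance learning, in which the single highest-scoring sentence in a bag (all sentences mentioning one entity pair) is treated as the reliable instance. The platform names, the scores, and the functions `local_best` and `select_reliable_instance` are all hypothetical. The point it illustrates is that only scores, never raw sentences, need to leave a platform.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical setup: each platform keeps its sentences private and exposes only
# one model confidence score per local sentence for a given entity pair (a bag).
PlatformScores = Dict[str, List[float]]


def local_best(scores: List[float]) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the highest-scoring local sentence, or None if empty."""
    if not scores:
        return None
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


def select_reliable_instance(bag: PlatformScores) -> Optional[Tuple[str, int, float]]:
    """Pick the single best instance across platforms (assumed at-least-one MIL).

    Each platform reports only its local maximum; the coordinator compares
    those maxima and returns (platform, local index, score) of the winner.
    """
    best: Optional[Tuple[str, int, float]] = None
    for platform, scores in bag.items():
        candidate = local_best(scores)
        if candidate is None:
            continue  # this platform holds no sentence for the entity pair
        idx, score = candidate
        if best is None or score > best[2]:
            best = (platform, idx, score)
    return best


# One entity pair whose sentences are scattered over three platforms.
bag = {
    "platform_a": [0.12, 0.40],
    "platform_b": [0.91],
    "platform_c": [],
}
selected = select_reliable_instance(bag)
print(selected)
```

In this sketch only the platform that holds the selected sentence would use it for a training update, while the other sentences in the bag are treated as potentially noisy and skipped.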