Paper Title
Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources
Authors
Abstract
Machine Reading Comprehension (MRC) aims to extract answers to questions given a passage. It has been widely studied in recent years, especially in open domains. However, few efforts have been made on closed-domain MRC, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to simultaneously predict answers to medical questions and the corresponding supporting sentences from medical information sources, in order to ensure the high reliability of medical knowledge services. For this purpose, we manually construct a high-quality dataset, the Multi-task Chinese Medical MRC dataset (CMedMRC), and analyze it in detail. We further propose a Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models through a dynamic fusion mechanism for heterogeneous features and a multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
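
To make the two ideas named in the abstract more concrete, the following is a minimal sketch of (a) blending a context-aware and a knowledge-aware token vector with a learned gate and (b) combining an answer loss with a support-sentence loss. It is not the CMedBERT implementation: the scalar sigmoid gate, the weighted-sum loss, and the names gated_fusion, multi_task_loss, gate_weights, and support_weight are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash the gate score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(context_vec, knowledge_vec, gate_weights, gate_bias=0.0):
    """Blend two token vectors with a scalar gate (hypothetical design).

    The gate is computed from the concatenation of both vectors, so the
    mixing ratio can differ from token to token.
    """
    if len(context_vec) != len(knowledge_vec):
        raise ValueError("context and knowledge vectors must have equal length")
    features = list(context_vec) + list(knowledge_vec)
    if len(gate_weights) != len(features):
        raise ValueError("gate_weights must match the concatenated length")
    score = sum(w * f for w, f in zip(gate_weights, features)) + gate_bias
    gate = sigmoid(score)
    # gate near 1 favors the context vector, gate near 0 the knowledge vector
    fused = [gate * c + (1.0 - gate) * k
             for c, k in zip(context_vec, knowledge_vec)]
    return fused, gate


def multi_task_loss(answer_loss, support_loss, support_weight=1.0):
    """Weighted sum of the two task losses (an assumed combination rule)."""
    return answer_loss + support_weight * support_loss


if __name__ == "__main__":
    fused, gate = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0, 0.0, 0.0])
    print(fused, gate)
    print(multi_task_loss(1.0, 2.0, support_weight=0.5))
```

With all gate weights set to zero the gate is 0.5 and the fused vector is the plain average of the two inputs; in a trained model the gate parameters would be learned jointly with both task objectives.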