Paper Title
DeepVar: An End-to-End Deep Learning Approach for Genomic Variant Recognition in Biomedical Literature
Paper Authors
Paper Abstract
We consider the problem of Named Entity Recognition (NER) in biomedical scientific literature, and more specifically genomic variant recognition. NER has seen significant success in recent years on canonical tasks, where large data sets are generally available. However, it remains a challenging problem in many specialized domains, especially those where only small sets of gold-standard annotations can be obtained. In addition, genomic variant entities exhibit considerable linguistic heterogeneity and differ markedly from the entities characterized in existing canonical NER tasks. State-of-the-art machine learning approaches to such tasks rely heavily on arduous feature engineering to characterize these unique patterns. In this work, we present the first successful end-to-end deep learning approach to genomic variant recognition, bridging the gap between generic NER algorithms and low-resource applications. Our proposed model achieves promising performance without any hand-crafted features or post-processing rules. Our extensive experiments and results may shed light on other, similar low-resource NER applications.