Paper title
Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition
Paper authors
Paper abstract
Recent years have witnessed steady improvement in Chinese Named Entity Recognition (NER), driven by new frameworks and the incorporation of word lexicons. However, the inner composition of entity mentions in character-level Chinese NER has rarely been studied. In fact, most mentions of regular types exhibit strong name regularity. For example, entities ending with indicator words such as "company" or "bank" usually belong to the organization type. In this paper, we propose a simple but effective method for investigating the regularity of entity spans in Chinese NER, dubbed Regularity-Inspired reCOgnition Network (RICON). Specifically, the proposed model consists of two branches: a regularity-aware module and a regularity-agnostic module. The regularity-aware module captures the internal regularity of each span for better entity type prediction, while the regularity-agnostic module is employed to locate entity boundaries and to relieve excessive attention to span regularity. An orthogonality space is further constructed to encourage the two modules to extract different aspects of regularity features. To verify the effectiveness of our method, we conduct extensive experiments on three benchmark datasets and a practical medical dataset. The experimental results show that RICON significantly outperforms previous state-of-the-art methods, including various lexicon-based methods.
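The abstract does not say how the orthogonality space is constructed. As a rough illustration only, one common way to push two feature sets toward different directions is to penalize the squared Frobenius norm of their cross-product. The sketch below assumes that form; the function name `orthogonality_penalty` and the row-per-span matrix layout are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def orthogonality_penalty(aware: Matrix, agnostic: Matrix) -> float:
    """Squared Frobenius norm of aware^T @ agnostic (an assumed penalty form).

    Each input holds one feature vector per row, e.g. one row per span.
    The result is 0.0 exactly when every column of `aware` is orthogonal
    to every column of `agnostic`.
    """
    if not aware or not agnostic:
        raise ValueError("feature matrices must be non-empty")
    if len(aware) != len(agnostic):
        raise ValueError("both feature matrices need the same number of rows")

    dim_aware = len(aware[0])
    dim_agnostic = len(agnostic[0])
    total = 0.0
    for i in range(dim_aware):
        for j in range(dim_agnostic):
            # Entry (i, j) of aware^T @ agnostic: dot product of
            # column i of `aware` with column j of `agnostic`.
            entry = sum(a[i] * b[j] for a, b in zip(aware, agnostic))
            total += entry * entry
    return total


# Hypothetical features for two spans, two dimensions each.
aware_features = [[1.0, 0.0], [0.0, 0.0]]
agnostic_features = [[0.0, 0.0], [0.0, 1.0]]
penalty = orthogonality_penalty(aware_features, agnostic_features)
```

In a training setup, a term of this kind would typically be added to the task loss with a small weight, so that minimizing it discourages the two branches from encoding the same information. Whether RICON uses exactly this term is not stated in the abstract.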