Paper title
That Slepen Al the Nyght with Open Ye! Cross-era Sequence Segmentation with Switch-memory
Paper authors
Paper abstract
The evolution of language follows the rule of gradual change. Grammar, vocabulary, and lexical semantic shifts take place over time, resulting in a diachronic linguistic gap. As such, a considerable number of texts are written in the languages of different eras, which creates obstacles for natural language processing tasks such as word segmentation and machine translation. Although the Chinese language has a long history, previous Chinese natural language processing research has primarily focused on tasks within a specific era. Therefore, we propose CROSSWISE, a cross-era learning framework for Chinese word segmentation (CWS), which uses a Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show that performance on each corpus improves significantly. Further analyses also demonstrate that the SM module can effectively integrate knowledge of the eras into the neural network.