Towards an evolutionary-based approach for natural language processing

Paper title

Towards an evolutionary-based approach for natural language processing

Authors

Luca Manzoni, Domagoj Jakobovic, Luca Mariot, Stjepan Picek, Mauro Castelli

Abstract

Tasks related to Natural Language Processing (NLP) have recently been the focus of a large research endeavor by the machine learning community. The increased interest in this area is mainly due to the success of deep learning methods. Genetic Programming (GP), however, was not under the spotlight with respect to NLP tasks. Here, we propose a first proof-of-concept that combines GP with the well established NLP tool word2vec for the next word prediction task. The main idea is that, once words have been moved into a vector space, traditional GP operators can successfully work on vectors, thus producing meaningful words as the output. To assess the suitability of this approach, we perform an experimental evaluation on a set of existing newspaper headlines. Individuals resulting from this (pre-)training phase can be employed as the initial population in other NLP tasks, like sentence generation, which will be the focus of future investigations, possibly employing adversarial co-evolutionary approaches.
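
The main idea in the abstract (words mapped to vectors, GP operators acting on those vectors, and the result mapped back to a word) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy 2-D embeddings stand in for real word2vec vectors, the tree encoding (tuples of an operator and two children, with integer leaves indexing the context words) is hypothetical, and decoding by cosine similarity while excluding the context words is one plausible choice among several.

```python
import math

# Toy 2-D embeddings (hypothetical values; a real setup would load word2vec vectors).
EMBEDDINGS = {
    "stocks": [1.0, 0.0],
    "rise": [0.0, 1.0],
    "sharply": [1.0, 1.0],
    "fall": [0.0, -1.0],
}


def add(a, b):
    """Element-wise vector addition, used as a GP function node."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise vector subtraction, used as a GP function node."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def evaluate(tree, context):
    """Evaluate a GP tree on a list of context words.

    An int leaf is an index into `context`; an inner node is a tuple
    (operator, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return EMBEDDINGS[context[tree]]
    op, left, right = tree
    return op(evaluate(left, context), evaluate(right, context))


def nearest_word(vector, exclude=()):
    """Map a vector back to the vocabulary word with the highest cosine similarity."""
    candidates = [w for w in EMBEDDINGS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, EMBEDDINGS[w]))


def predict_next(tree, context):
    """Predict the next word: evaluate the tree, then decode the output vector."""
    return nearest_word(evaluate(tree, context), exclude=context)


# One hand-written individual: the sum of the two context word vectors.
individual = (add, 0, 1)
prediction = predict_next(individual, ["stocks", "rise"])
print(prediction)
```

In an actual evolutionary run, a population of such trees would be generated randomly and improved through crossover and mutation, with fitness measured by how close each tree's output is to the true next word in the training headlines.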
