Twitter Bot Detection Using Bidirectional Long Short-term Memory Neural Networks and Word Embeddings

Paper Title

Twitter Bot Detection Using Bidirectional Long Short-term Memory Neural Networks and Word Embeddings

Authors

Wei, Feng; Nguyen, Uyen Trang

Abstract

Twitter is a web application that plays the dual roles of online social network and micro-blogging service. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots. Legitimate bots generate a large amount of benign contextual content, i.e., tweets delivering news and updating feeds, while malicious bots spread spam or malicious content. To help human users identify who they are interacting with, this paper focuses on the classification of human and spambot accounts on Twitter by employing recurrent neural networks, specifically bidirectional Long Short-term Memory (BiLSTM), to efficiently capture features across tweets. To the best of our knowledge, our work is the first to develop a recurrent neural model with word embeddings that distinguishes Twitter bots from human accounts and that requires no prior knowledge or assumptions about users' profiles, friendship networks, or historical behavior on the target account. Moreover, our model does not require any handcrafted features. The preliminary simulation results are very encouraging. Experiments on the cresci-2017 dataset show that our approach can achieve competitive performance compared with existing state-of-the-art bot detection systems.
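
To make the idea in the abstract concrete, the following is a minimal sketch of a BiLSTM forward pass over word embeddings, written in plain Python. It is not the authors' implementation: the class names (LSTMCell, BiLSTMClassifier), the toy vocabulary, the dimensions, and the random untrained weights are all assumptions made for illustration. It only shows how a token sequence is embedded, read in both directions, and reduced to a single bot probability without handcrafted features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LSTMCell:
    """One LSTM direction with random, untrained weights (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_prev].
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in self.GATES
        }

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(v) for v in matvec(self.w["input"], z)]
        f = [sigmoid(v) for v in matvec(self.w["forget"], z)]
        o = [sigmoid(v) for v in matvec(self.w["output"], z)]
        g = [math.tanh(v) for v in matvec(self.w["candidate"], z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, seq):
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            h, c = self.step(x, h, c)
        return h


class BiLSTMClassifier:
    """Hypothetical tweet classifier: embeddings -> BiLSTM -> sigmoid."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=4, seed=0):
        rng = random.Random(seed)
        # Random vectors stand in for pretrained word embeddings.
        self.embed = {w: [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                      for w in vocab}
        self.unknown = [0.0] * embed_dim
        self.forward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        self.backward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.out_b = 0.0

    def features(self, tokens):
        seq = [self.embed.get(t, self.unknown) for t in tokens]
        # Concatenate the final states of the left-to-right and right-to-left passes.
        return self.forward_cell.run(seq) + self.backward_cell.run(seq[::-1])

    def bot_probability(self, tokens):
        feats = self.features(tokens)
        return sigmoid(sum(w * x for w, x in zip(self.out_w, feats)) + self.out_b)


vocab = ["click", "now", "win", "free", "followers"]
model = BiLSTMClassifier(vocab)
tokens = "click now win free followers".split()
probability = model.bot_probability(tokens)
print(f"bot probability (untrained weights): {probability:.3f}")
```

Because the weights are untrained, the printed probability carries no meaning. In practice the embeddings and gate weights would be learned from labeled data such as the cresci-2017 dataset, typically with a deep learning framework rather than hand-written loops.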
