ViralBERT: A User Focused BERT-Based Approach to Virality Prediction

Paper title

ViralBERT: A User Focused BERT-Based Approach to Virality Prediction

Authors

Rikaz Rameez, Hossein A. Rahmani, Emine Yilmaz

Abstract

Recently, Twitter has become the social network of choice for sharing and spreading information to a multitude of users through posts called 'tweets'. Users can easily re-share these posts to other users through 'retweets', which allow information to cascade to many more users, increasing its outreach. Clearly, being able to know the extent to which a post can be retweeted has great value in advertising, influencing and other such campaigns. In this paper we propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features. We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules: one for semantic representation of the combined text and numerical features, and another module purely for sentiment analysis of text, as both the information within text and its ability to elicit an emotional response play a part in retweet proneness. We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field. Our experiments show that our approach outperforms these baselines, with a 13% increase in both F1 Score and Accuracy compared to the best-performing baseline method. We then conduct an ablation study to investigate the importance of the chosen features, finding that text sentiment and follower counts, and to a lesser extent mentions and following counts, are the strongest features for the model, and that hashtag counts are detrimental to the model.
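
The feature-concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact input format, so everything here is an assumption made for illustration: the function names (`count_hashtags`, `count_mentions`, `build_model_input`), the `[SEP]` separator string, the order of the features, and the use of simple regular expressions to count hashtags and mentions. It shows only how numerical features might be appended to tweet text before tokenisation, not the authors' actual pipeline.

```python
import re


def count_hashtags(text: str) -> int:
    """Count '#tag' tokens in the tweet text (hypothetical helper)."""
    return len(re.findall(r"#\w+", text))


def count_mentions(text: str) -> int:
    """Count '@user' tokens in the tweet text (hypothetical helper)."""
    return len(re.findall(r"@\w+", text))


def build_model_input(text: str, followers: int, following: int,
                      sep: str = " [SEP] ") -> str:
    """Append numerical features to the tweet text as plain tokens.

    The separator and the feature order are assumptions, not taken
    from the paper.
    """
    features = [
        count_hashtags(text),
        count_mentions(text),
        followers,
        following,
    ]
    return text + sep + " ".join(str(value) for value in features)


if __name__ == "__main__":
    tweet = "Big news from @nasa today #space #launch"
    print(build_model_input(tweet, followers=1200, following=300))
    # prints: Big news from @nasa today #space #launch [SEP] 2 1 1200 300
```

Under these assumptions, the combined string would be passed to a BERT tokenizer for the semantic module, while the raw tweet text alone would go to the separate sentiment module.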
