Paper Title
Modeling User Behavior With Interaction Networks for Spam Detection
Paper Authors
Paper Abstract
Spam is a serious problem plaguing web-scale digital platforms that facilitate user content creation and distribution. It compromises the platform's integrity, the performance of services such as recommendation and search, and the overall business. Spammers engage in a variety of abusive and evasive behaviors that are distinct from those of non-spammers. Users' complex behavior can be well represented by a heterogeneous graph rich with node and edge attributes. Learning to identify spammers in such a graph for a web-scale platform is challenging because of the graph's structural complexity and size. In this paper, we propose SEINE (Spam DEtection using Interaction NEtworks), a spam detection model over a novel graph framework. Our graph simultaneously captures rich user details and behavior, and it enables learning on a billion-scale graph. Our model considers a user's neighborhood along with edge types and attributes, allowing it to capture a wide range of spammers. SEINE, trained on a real dataset of tens of millions of nodes and billions of edges, achieves 80% recall at a 1% false positive rate. On a public dataset, SEINE achieves performance comparable to state-of-the-art techniques while remaining practical for use in a large-scale production system.
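To make the headline metric concrete, the following minimal sketch shows one common way to compute recall at a fixed false positive rate from per-user spam scores. It is an illustration only, not the authors' evaluation code: the function name `recall_at_fpr`, the score convention (higher means more likely spam), and the toy data are all assumptions.

```python
def recall_at_fpr(scores, labels, max_fpr=0.01):
    """Best recall achievable while keeping the false positive rate <= max_fpr.

    scores: hypothetical model outputs, higher means more likely spam.
    labels: 1 for spammer, 0 for non-spammer.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one spammer and one non-spammer")

    # Sweep candidate thresholds from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every user tied at this score so the threshold is well defined.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # The false positive rate only grows as the threshold drops, so stop here.
        if fp / n_neg > max_fpr:
            break
        best_recall = tp / n_pos
    return best_recall


# Toy data (assumed): three spammers and three non-spammers.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
strict_recall = recall_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
loose_recall = recall_at_fpr(toy_scores, toy_labels, max_fpr=0.34)
```

Under this reading, "80% recall at a 1% false positive rate" means that a score threshold exists at which at most 1% of non-spammers are flagged while 80% of spammers are caught.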