Paper Title
Improving Phishing Detection Via Psychological Trait Scoring
Paper Authors
Paper Abstract
Phishing emails exhibit psychological traits that are not present in legitimate emails. From empirical analysis and previous research, we find that three psychological traits are most dominant in phishing emails: a sense of urgency, inducing fear by threatening, and enticement with desire. We manually label 10% of all phishing emails in our training dataset for these three traits. We leverage that knowledge by training BERT, Sentence-BERT (SBERT), and character-level CNN models and capturing the nuances via their last layers, whose outputs form the Phishing Psychological Trait (PPT) scores. For the phishing email detection task, we use pretrained BERT and SBERT models and concatenate the PPT scores with their outputs to feed into a fully-connected neural network model. Our results show that the addition of PPT scores improves model performance significantly, indicating the effectiveness of PPT scores in capturing the psychological nuances. Furthermore, to mitigate the effect of the imbalanced training dataset, we use the GPT-2 model to generate phishing emails (Radford et al., 2019). Our best model outperforms the current State-of-the-Art (SOTA) model's F1 score by 4.54%. Additionally, our analysis of individual PPTs suggests that fear provides the strongest cue in detecting phishing emails.
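The concatenation step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the layer sizes, the random placeholder weights, the stand-in embedding, and the example trait scores are all assumptions made for illustration, and a real system would use trained parameters and an actual BERT or SBERT embedding.

```python
import math
import random

def dense(x, weights, bias):
    """One fully connected layer: computes W @ x + b as a list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (placeholders for trained parameters)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def phishing_probability(embedding, ppt_scores, params):
    """Concatenate the email embedding with the PPT scores, then apply a 2-layer network."""
    features = list(embedding) + list(ppt_scores)
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(features, w1, b1))
    return sigmoid(dense(hidden, w2, b2)[0])

rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_TRAITS = 8, 4, 3  # hypothetical sizes, not from the paper
params = (
    init_layer(HIDDEN_DIM, EMBED_DIM + NUM_TRAITS, rng),
    init_layer(1, HIDDEN_DIM, rng),
)

# Stand-in for a BERT/SBERT sentence embedding (assumption: random values).
embedding = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
# Hypothetical urgency, fear, and desire scores for one email.
ppt_scores = [0.9, 0.7, 0.1]

prob = phishing_probability(embedding, ppt_scores, params)
print(f"phishing probability (untrained weights): {prob:.3f}")
```

Because the three PPT scores are appended as extra input features, the classifier can weight each trait separately, which is one way an analysis of individual traits such as fear could be carried out.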