Title
Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features
Authors
Abstract
Research at the intersection of personality psychology, computer science, and linguistics has increasingly focused on modeling and predicting personality from language use. We report two major improvements in predicting personality traits from text data: (1) what is, to our knowledge, the most comprehensive set of theory-based psycholinguistic features and (2) hybrid models that integrate the pre-trained Transformer language model BERT with Bidirectional Long Short-Term Memory (BLSTM) networks trained on within-text distributions ('text contours') of psycholinguistic features. We experiment with BLSTM models (with and without attention) and with two techniques for applying pre-trained language representations from the Transformer model: 'feature-based' and 'fine-tuning'. We evaluate our models on two benchmark datasets that target the two dominant theoretical models of personality: the Big Five Essay dataset and the MBTI Kaggle dataset. Our results are encouraging, as our models outperform existing work on the same datasets. More specifically, our models improve classification accuracy by 2.9% on the Essay dataset and by 8.28% on the Kaggle MBTI dataset. In addition, we perform ablation experiments to quantify the impact of different categories of psycholinguistic features in the respective personality prediction models.
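To make the idea of a "text contour" and its fusion with a Transformer representation more concrete, here is a minimal, standard-library-only sketch. Everything in it is an assumption for illustration, not the authors' implementation. The two features (`mean_word_length`, `type_token_ratio`) are hypothetical stand-ins for the paper's psycholinguistic features. A sliding window of `window` sentences with step 1 is an assumed contour definition. The BLSTM is omitted, so the contour vectors are pooled directly. Dot-product attention with a fixed `query` vector stands in for a learned attention layer. The 768-dimensional zero vector is a placeholder for a BERT embedding, which would come from a frozen model in the feature-based setting or from a jointly trained model when fine-tuning.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Feature = Callable[[Sequence[str]], float]


# Hypothetical stand-in features (not the paper's actual feature set).
def mean_word_length(sentences: Sequence[str]) -> float:
    words = [w for s in sentences for w in s.split()]
    return sum(len(w) for w in words) / len(words) if words else 0.0


def type_token_ratio(sentences: Sequence[str]) -> float:
    words = [w.lower() for s in sentences for w in s.split()]
    return len(set(words)) / len(words) if words else 0.0


def text_contour(sentences: Sequence[str], features: Sequence[Feature],
                 window: int = 3) -> List[Vector]:
    """Assumed contour: slide a window of `window` sentences (step 1) over
    the text and compute every feature on each window, giving one feature
    vector per window position."""
    if len(sentences) < window:
        spans = [list(sentences)]  # short text: a single window
    else:
        spans = [list(sentences[i:i + window])
                 for i in range(len(sentences) - window + 1)]
    return [[f(span) for f in features] for span in spans]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Dot-product attention pooling: score each time step against a query
    vector (learned in a real model, fixed here) and return the
    softmax-weighted sum of the states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def fuse(transformer_vec: Vector, contour_summary: Vector) -> Vector:
    """Late fusion by concatenation; a classifier head would follow."""
    return list(transformer_vec) + list(contour_summary)


# Usage on a toy text.
sentences = [
    "I love meeting new people.",
    "Parties are great fun.",
    "Sometimes I need quiet time alone.",
    "Reading calms me down.",
]
contour = text_contour(sentences, [mean_word_length, type_token_ratio], window=2)
summary = attention_pool(contour, query=[0.1, 1.0])
bert_placeholder = [0.0] * 768  # stands in for a BERT sentence embedding
fused = fuse(bert_placeholder, summary)
```

With 4 sentences and a window of 2, the contour has 3 positions of 2 features each. The fused vector therefore has 768 + 2 = 770 dimensions. Dropping `attention_pool` in favour of, say, the last contour vector would correspond loosely to the "without attention" variant.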