Items from Psychometric Tests as Training Data for Personality Profiling Models of Twitter Users

Paper title

Items from Psychometric Tests as Training Data for Personality Profiling Models of Twitter Users

Authors

Anne Kreuter, Kai Sassenberg, Roman Klinger

Abstract

Machine-learned models for author profiling in social media often rely on data acquired via self-reporting-based psychometric tests (questionnaires) filled out by social media users. This is an expensive but accurate data collection strategy. Another, less costly alternative, which leads to potentially more noisy and biased data, is to rely on labels inferred from publicly available information in the profiles of the users, for instance self-reported diagnoses or test results. In this paper, we explore a third strategy, namely to directly use a corpus of items from validated psychometric tests as training data. Items from psychometric tests often consist of sentences from an I-perspective (e.g., "I make friends easily."). Such corpora of test items constitute 'small data', but their availability for many concepts is a rich resource. We investigate this approach for personality profiling, and evaluate BERT classifiers fine-tuned on such psychometric test items for the big five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) and analyze various augmentation strategies regarding their potential to address the challenges coming with such a small corpus. Our evaluation on a publicly available Twitter corpus shows a comparable performance to in-domain training for 4/5 personality traits with T5-based data augmentation.
