Paper Title
PTUM: Pre-training User Model from Unlabeled User Behaviors via Self-supervision
Paper Authors
Paper Abstract
User modeling is critical for many personalized web services. Many existing methods model users based on their behaviors and the labeled data of target tasks. However, these methods cannot exploit useful information in unlabeled user behavior data, and their performance may not be optimal when labeled data is scarce. Motivated by pre-trained language models, which are pre-trained on large-scale unlabeled corpora to empower many downstream tasks, in this paper we propose to pre-train user models from large-scale unlabeled user behavior data. We propose two self-supervision tasks for user model pre-training. The first is masked behavior prediction, which models the relatedness between historical behaviors. The second is next $K$ behavior prediction, which models the relatedness between past and future behaviors. The pre-trained user models are fine-tuned in downstream tasks to learn task-specific user representations. Experimental results on two real-world datasets validate the effectiveness of our proposed user model pre-training method.
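The abstract names two self-supervision objectives but does not give their exact formulation. Below is a minimal, illustrative sketch of how masked behavior prediction and next-$K$ behavior prediction could be set up for pre-training, assuming behaviors are discrete IDs from a fixed vocabulary, a small Transformer encoder as the user model, and softmax classification over that vocabulary as the prediction head. All class names, hyperparameters, and the shared output head for the next-$K$ task are hypothetical simplifications, not the architecture used in the paper.

```python
# Hypothetical sketch of the two PTUM-style self-supervision objectives.
# Assumptions (not from the paper): behaviors are integer IDs, the user model is a
# small Transformer encoder, and both tasks are framed as classification over the
# behavior vocabulary.
import torch
import torch.nn as nn

class ToyUserModel(nn.Module):
    def __init__(self, num_behaviors, dim=64, num_heads=4, num_layers=2, max_len=50):
        super().__init__()
        self.embed = nn.Embedding(num_behaviors + 1, dim)  # extra index used as [MASK]
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(dim, num_behaviors)  # scores over the behavior vocabulary

    def forward(self, behavior_ids):
        positions = torch.arange(behavior_ids.size(1), device=behavior_ids.device)
        hidden = self.encoder(self.embed(behavior_ids) + self.pos(positions))
        return self.out(hidden)  # (batch, seq_len, num_behaviors)

def pretraining_losses(model, behaviors, k, mask_id):
    """Compute the two self-supervised losses on a batch of behavior sequences.

    behaviors: LongTensor (batch, seq_len) of past behaviors followed by the
    next k behaviors that serve as prediction targets.
    """
    past, future = behaviors[:, :-k], behaviors[:, -k:]
    batch_idx = torch.arange(past.size(0))

    # 1) Masked behavior prediction: hide one past behavior, predict it from context.
    masked = past.clone()
    mask_pos = torch.randint(0, past.size(1), (past.size(0),))
    targets = past[batch_idx, mask_pos]
    masked[batch_idx, mask_pos] = mask_id
    logits = model(masked)
    mbp_loss = nn.functional.cross_entropy(logits[batch_idx, mask_pos], targets)

    # 2) Next-K behavior prediction: from the last past position, predict each of
    #    the next k behaviors (simplified here to a single shared output head).
    last_logits = model(past)[:, -1]  # (batch, num_behaviors)
    nkp_loss = sum(
        nn.functional.cross_entropy(last_logits, future[:, i]) for i in range(k)
    ) / k

    return mbp_loss + nkp_loss

# Toy usage on random data: 17 past behaviors plus 3 future targets per user.
num_behaviors = 1000
model = ToyUserModel(num_behaviors)
batch = torch.randint(0, num_behaviors, (8, 20))
loss = pretraining_losses(model, batch, k=3, mask_id=num_behaviors)
loss.backward()
```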