Paper title
Data set creation and empirical analysis for detecting signs of depression from social media postings
Authors
Abstract
Depression is a common mental illness that has to be detected and treated at an early stage to avoid serious consequences. Many methods and modalities for detecting depression involve a physical examination of the individual. However, diagnosing mental health from a person's social media data can be more effective, as it avoids such physical examinations. Moreover, since people readily express their emotions on social media, it is desirable to diagnose their mental health using social media data. Although many existing systems detect a person's mental illness by analysing their social media data, detecting the level of depression is also important for further treatment. Thus, in this research, we developed a gold standard data set that labels social media postings with one of three levels of depression: "not depressed", "moderately depressed" and "severely depressed". Traditional machine learning algorithms were applied to this data set, and an empirical analysis is presented in this paper. A data augmentation technique was applied to overcome the class imbalance in the data. Among the several variants implemented, the model that combines a Word2Vec vectorizer with a Random Forest classifier, using the augmented data, outperforms the others, with a score of 0.877 for both accuracy and F1 measure.
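The abstract does not state how Word2Vec word vectors are turned into input for the Random Forest classifier, which needs one fixed-length feature vector per post. A common approach, assumed here rather than taken from the paper, is to average the vectors of the words in each post. The sketch below uses a hypothetical toy embedding table (`TOY_EMBEDDINGS`) and a hypothetical helper (`post_vector`) in place of a trained Word2Vec model.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings (assumption): a real pipeline would use vectors
# learned by a Word2Vec model, typically with 100-300 dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "feel": [0.2, 0.1, 0.0],
    "hopeless": [0.9, -0.4, 0.3],
    "tired": [0.6, -0.2, 0.1],
    "happy": [-0.7, 0.5, 0.2],
}


def post_vector(tokens: Sequence[str],
                embeddings: Dict[str, List[float]],
                dim: int) -> List[float]:
    """Average the embeddings of in-vocabulary tokens into one vector.

    Out-of-vocabulary tokens are skipped. A post with no known tokens maps
    to the zero vector, so every post still yields a row of length `dim`.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # Component-wise mean over the known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    post = "i feel hopeless and tired".split()
    print(post_vector(post, TOY_EMBEDDINGS, 3))
```

Each post then becomes one row of equal length, which a classifier such as scikit-learn's `RandomForestClassifier` can consume. In practice the embeddings would come from a trained model (for example, gensim's `Word2Vec`) rather than a hand-written dictionary.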