Paper Title
Understanding YouTube Communities via Subscription-based Channel Embeddings
Authors
Abstract
YouTube is an important source of news and entertainment worldwide, but its scale makes it challenging to study the ideas and topics discussed on the platform. This paper presents new methods to discover and classify YouTube channels, which enable the analysis of communities and categories on the platform using orders of magnitude more channels than previous studies. Instead of using channel and video data as classification features, as other researchers have, these methods use a self-supervised learning approach that leverages the public subscription pages of commenters. We test the classification method on the task of predicting the political lean of YouTube news channels and find that it outperforms the previous best model on the task. Further experiments also show that there are important advantages to using commenter subscriptions to discover channels. The subscription data, along with an iterative approach, is applied to discover what is, to our current understanding, the most comprehensive set of English-language socio-political YouTube channels yet to be analyzed. We experiment with predicting more fine-grained political tags for channels using a previously annotated dataset and find that our model performs better than the average individual human reviewer for most of the top tags. This fine-grained political tag model is then applied to the newly discovered English-language socio-political channels to create a new dataset for analyzing the amount of traffic going to different political content. The data shows that some tags, such as "Partisan Right" and "Conspiracy", are significantly underrepresented when looking only at the most popular socio-political channels. Through the use of our methods, we are able to get a much more accurate picture of the size of these communities on YouTube.
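The abstract does not spell out how subscription data becomes a channel representation, so the following is only an illustrative sketch and not the paper's model. It assumes a hypothetical toy mapping from commenters to the channels they publicly subscribe to, represents each channel by its set of subscribers, and compares channels by cosine similarity. A learned, self-supervised embedding would take the place of these raw subscriber vectors. All names and data here (`subscriptions`, `channel_subscribers`, `nearest`, the channel ids) are invented for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: commenter id -> channels they publicly subscribe to.
subscriptions = {
    "user_a": ["news_left_1", "news_left_2"],
    "user_b": ["news_left_1", "news_left_2", "cooking_1"],
    "user_c": ["news_right_1", "news_right_2"],
    "user_d": ["news_right_1", "news_right_2", "cooking_1"],
}


def channel_subscribers(subs):
    """Invert the mapping: channel -> set of commenters subscribed to it."""
    by_channel = defaultdict(set)
    for user, channels in subs.items():
        for channel in channels:
            by_channel[channel].add(user)
    return dict(by_channel)


def cosine(a, b):
    """Cosine similarity between two binary subscriber sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def nearest(channel, by_channel):
    """Rank all other channels by subscriber overlap with `channel`."""
    target = by_channel[channel]
    scores = [
        (other, cosine(target, users))
        for other, users in by_channel.items()
        if other != channel
    ]
    # Highest similarity first; ties keep insertion order (sort is stable).
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


by_channel = channel_subscribers(subscriptions)
ranking = nearest("news_left_1", by_channel)
for other, score in ranking:
    print(f"{other}: {score:.2f}")
```

In this toy setup, channels that share an audience score high even though no video or channel metadata is consulted, which is the intuition behind using commenter subscriptions as the signal. Labels such as political lean could then be propagated from annotated channels to their nearest neighbors, though the classifier actually used in the paper may differ.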