Unsupervised Behaviour Analysis of News Consumption in Turkish Media

Paper Title

Unsupervised Behaviour Analysis of News Consumption in Turkish Media

Authors

Didem Makaroglu, Altan Cakir, Behcet Ugur Toreyin

Abstract

Clickstream data, generated in massive volumes by human activity on websites, have become a prominent source for newsrooms to identify readers' characteristics since news outlets were digitized. Although clickstream data follow a similar logic across websites, they have inherent limitations for recognizing human behaviour from a broad perspective, which makes it necessary to restrict the problem to niche areas. This study investigates anonymized readers' click activities on the organizations' websites to identify news consumption patterns following referrals from Twitter, i.e., of readers who reach the site incidentally but whose propensity is mainly routed to news content. Ensemble cluster analysis methodologies with mixed-type embedding strategies are applied and compared to find similar reader groups and interests independent of time. Several internal validation perspectives are used to assess the quality of the clusters, and the Calinski-Harabasz Index (CHI) is found to give generalizable results. Our findings demonstrate that clustering the mixed-type dataset approaches optimal internal validation scores, which we use to discriminate between clusters and algorithms across the applied strategies, when the data are embedded with Uniform Manifold Approximation and Projection (UMAP) and a consensus function is used as a key to access the most applicable hyperparameter configurations in the given ensemble, rather than using the consensus function's results directly. Evaluation of the resulting clusters highlights specific clusters that recur across separate monthly samples with Adjusted Mutual Information (AMI) scores greater than 0.5; these clusters provide insights to news organizations and help overcome the degradation of behaviour models caused by interests changing over time.
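
The idea of using a consensus function "as a key" to pick a configuration from the ensemble, scored by CHI, can be sketched in Python with scikit-learn and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the mixed-type reader features have already been encoded and embedded (for example with UMAP) and uses synthetic `make_blobs` points in place of that embedding. The ensemble (KMeans over hypothetical `(k, seed)` configurations), the consensus function (average-linkage clustering of a co-association matrix, one common choice swapped in here), and the rule of returning the member with the highest AMI to the consensus labels are all our assumptions. The helper names `build_ensemble`, `coassociation_consensus`, `select_by_consensus`, and `best_configuration` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_mutual_info_score, calinski_harabasz_score


def build_ensemble(X, ks=(2, 3, 4, 5, 6), seeds=(0, 1)):
    """Hypothetical ensemble: one KMeans run per (k, seed) configuration."""
    members = []
    for k in ks:
        for seed in seeds:
            labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
            members.append(((k, seed), labels))
    return members


def coassociation_consensus(members, n_clusters):
    """Assumed consensus function: average-linkage on 1 - co-association."""
    n = len(members[0][1])
    coassoc = np.zeros((n, n))
    for _, labels in members:
        # Fraction of ensemble members that put points i and j together.
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= len(members)
    # Convert to a condensed distance vector (diagonal is exactly zero).
    dist = squareform(1.0 - coassoc, checks=False)
    tree = linkage(dist, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def select_by_consensus(X, members, n_clusters):
    """Use the consensus only as a key: return the ensemble member that
    agrees with it most (by AMI), together with that member's CHI."""
    key = coassociation_consensus(members, n_clusters)
    config, labels = max(members, key=lambda m: adjusted_mutual_info_score(key, m[1]))
    return config, labels, calinski_harabasz_score(X, labels)


def best_configuration(X, members, candidate_sizes=(2, 3, 4, 5, 6)):
    """Try several consensus sizes and keep the selection with the highest CHI."""
    results = [select_by_consensus(X, members, c) for c in candidate_sizes]
    return max(results, key=lambda r: r[2])


# Synthetic stand-in for an embedded clickstream dataset (assumption).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
members = build_ensemble(X)
config, labels, chi = best_configuration(X, members)
print("selected (k, seed):", config, "CHI:", round(chi, 1))
```

Because the consensus labels serve only as a key, the returned partition is always one of the ensemble's own runs, so its hyperparameter configuration can be reused directly. CHI then compares the candidates selected at each consensus size.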
