Approaches for Improving the Performance of Fake News Detection in Bangla: Imbalance Handling and Model Stacking

Title

Approaches for Improving the Performance of Fake News Detection in Bangla: Imbalance Handling and Model Stacking

Authors

Hossain, Md Muzakker; Awosaf, Zahin; Prottoy, Md. Salman Hossan; Alvy, Abu Saleh Muhammod; Morol, Md. Kishor

Abstract

Imbalanced datasets can bias fake news detection. In this work, we present several strategies for resolving the imbalance issue in Bangla fake news detection, with a comparative assessment of the proposed methodologies. Additionally, we propose a technique for improving performance even when the dataset is imbalanced. We applied our proposed approaches to BanFakeNews, a dataset developed for detecting fake news in Bangla that comprises 50K instances but is significantly skewed, with 97% of instances belonging to the majority class. We obtained a 93.1% F1-score using data manipulation techniques such as SMOTE, and a 79.1% F1-score without data manipulation, using approaches such as Stacked Generalization. Without these techniques, the F1-score of the baseline models would have been 67.6%. We see this work as an important step towards paving the way for fake news detection in Bangla. By implementing these strategies, the obstacles posed by an imbalanced dataset can be removed and performance can be improved.
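To make the SMOTE idea mentioned in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling, not the authors' implementation. It assumes the news articles have already been turned into numeric feature vectors; the 2-D toy points, the `smote` helper, and its parameters `n_synthetic`, `k`, and `seed` are all hypothetical and chosen only for illustration. Each synthetic minority point is placed on the line segment between a randomly chosen minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Illustrative helper (hypothetical name and signature), pure standard library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data mimicking the 97% / 3% skew described in the abstract.
# The feature values are made up and carry no meaning.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(970)]
minority = [[data_rng.gauss(3.0, 0.5), data_rng.gauss(3.0, 0.5)] for _ in range(30)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data. A maintained library implementation (for example, the `SMOTE` class in imbalanced-learn) would usually be preferred over a hand-written helper like this one.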
