AMUSED: An Annotation Framework of Multi-modal Social Media Data

Paper Title

AMUSED: An Annotation Framework of Multi-modal Social Media Data

Author

Shahi, Gautam Kishore

Abstract

In this paper, we present a semi-automated framework called AMUSED for gathering multi-modal annotated data from multiple social media platforms. The framework is designed to mitigate the issues of collecting and annotating social media data by cohesively combining machine and human effort in the data collection process. Given a list of articles from professional news media or blogs, AMUSED detects links to social media posts in those articles and then downloads the content of each post from the respective social media platform to gather details about that specific post. The framework is capable of fetching annotated data from multiple platforms such as Twitter, YouTube, and Reddit. It aims to reduce the workload and the problems involved in annotating data from social media platforms. AMUSED can be applied in multiple application domains; as a use case, we have implemented the framework to collect COVID-19 misinformation data from different social media platforms.
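The link-detection step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the host-to-platform table `PLATFORM_HOSTS` and the helper names `platform_of` and `extract_social_links` are assumptions made for illustration, and the sketch only finds candidate post links in an article's HTML. Downloading the post content from each platform is left out because it depends on platform-specific APIs.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical mapping from host names to platform labels (an assumption,
# not taken from the paper).
PLATFORM_HOSTS = {
    "twitter.com": "Twitter",
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "reddit.com": "Reddit",
}


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def platform_of(url):
    """Return the platform label for a URL, or None if it is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    # Strip one common subdomain prefix so "www.youtube.com" matches "youtube.com".
    for prefix in ("www.", "m.", "mobile."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    return PLATFORM_HOSTS.get(host)


def extract_social_links(article_html):
    """Return (platform, url) pairs for social media links found in an article."""
    collector = _LinkCollector()
    collector.feed(article_html)
    found = []
    for href in collector.hrefs:
        platform = platform_of(href)
        if platform is not None:
            found.append((platform, href))
    return found
```

In a pipeline of the kind the abstract describes, each `(platform, url)` pair would then be passed to a platform-specific downloader, with a human reviewing the collected posts.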
