Title

An Enhanced Text Classification to Explore Health based Indian Government Policy Tweets

Authors

Dhiman, Aarzoo; Toshniwal, Durga

Abstract

Government-sponsored policy-making and scheme generation are among the means of protecting and promoting the social, economic, and personal development of citizens. Government evaluations of the effectiveness of these schemes provide only statistical information in the form of facts and figures, which does not capture in-depth knowledge of public perceptions, experiences, and views on the topic. In this research work, we propose an improved text classification framework that classifies Twitter data about different health-based government schemes. The proposed framework leverages the language representation models (LR models) BERT, ELMo, and USE (Universal Sentence Encoder). However, these LR models have limited real-time applicability because ample annotated data is scarce. To address this, we propose a novel text augmentation approach based on GloVe word embeddings and class-specific sentiments (named Mod-EDA), which boosts the performance of the text classification task by increasing the size of the labeled data. Furthermore, the trained model is used to identify citizens' level of engagement with these policies in different communities, such as middle-income and low-income groups.
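The abstract does not spell out how Mod-EDA works. As a rough, hypothetical illustration of the general idea only (EDA-style synonym replacement in which substitutes are drawn from nearest neighbours in a word-embedding space and filtered so they do not change sentiment polarity), the Python sketch below uses a tiny hand-made embedding table and sentiment lexicon in place of real GloVe vectors. Every name and value here (TOY_EMBEDDINGS, TOY_SENTIMENT, augment, expand_labeled_data) is an assumption for illustration, and the polarity filter is a simple stand-in for the paper's class-specific sentiment component, not the authors' implementation.

```python
import math
import random

# Hypothetical 3-d vectors standing in for pre-trained GloVe embeddings.
TOY_EMBEDDINGS = {
    "good": (0.9, 0.1, 0.0),
    "great": (0.85, 0.15, 0.05),
    "nice": (0.8, 0.2, 0.0),
    "bad": (-0.9, 0.1, 0.0),
    "poor": (-0.85, 0.2, 0.0),
    "scheme": (0.0, 0.9, 0.3),
    "policy": (0.05, 0.85, 0.35),
    "hospital": (0.1, 0.2, 0.9),
}

# Hypothetical sentiment lexicon: +1 positive, -1 negative; absent words are neutral (0).
TOY_SENTIMENT = {"good": 1, "great": 1, "nice": 1, "bad": -1, "poor": -1}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbours(word, k=2):
    """Return the k words closest to `word` in the toy embedding space."""
    if word not in TOY_EMBEDDINGS:
        return []
    target = TOY_EMBEDDINGS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in TOY_EMBEDDINGS.items() if other != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]


def sentiment_compatible(original, candidate):
    """Accept a substitute only if it has the same polarity as the original word."""
    return TOY_SENTIMENT.get(original, 0) == TOY_SENTIMENT.get(candidate, 0)


def augment(tokens, n_replacements=1, seed=0):
    """Replace up to n_replacements in-vocabulary tokens with polarity-preserving neighbours."""
    rng = random.Random(seed)
    out = list(tokens)
    positions = [i for i, t in enumerate(tokens) if t in TOY_EMBEDDINGS]
    rng.shuffle(positions)
    replaced = 0
    for i in positions:
        if replaced >= n_replacements:
            break
        candidates = [c for c in nearest_neighbours(tokens[i])
                      if sentiment_compatible(tokens[i], c)]
        if candidates:
            out[i] = rng.choice(candidates)
            replaced += 1
    return out


def expand_labeled_data(samples, copies=2):
    """Grow a labeled dataset: each (tokens, label) pair gains `copies` augmented variants."""
    expanded = list(samples)
    for idx, (tokens, label) in enumerate(samples):
        for c in range(copies):
            expanded.append((augment(tokens, seed=idx * copies + c), label))
    return expanded


if __name__ == "__main__":
    data = [(["the", "scheme", "is", "good"], "positive"),
            (["service", "is", "bad"], "negative")]
    for tokens, label in expand_labeled_data(data):
        print(label, " ".join(tokens))
```

In this toy setup "bad" has "poor" and "scheme" as its two nearest neighbours, but "scheme" is rejected by the polarity filter, so a negative tweet stays negative after augmentation. In practice the vectors would come from a pre-trained GloVe file and the sentiment check would follow whatever class-specific rule the paper actually defines.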
