On-Device Tag Generation for Unstructured Text

Paper Title

On-Device Tag Generation for Unstructured Text

Authors

Manish Chugani, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre

Abstract

With the overwhelming transition to smartphones, storing important information as unstructured text has become habitual for mobile device users. From grocery lists to drafts of emails and important speeches, users store a lot of data as unstructured text on their devices (e.g., in the Notes application), which leads to cluttered data. This clutter not only prevents users from navigating these applications efficiently but also keeps them from perceiving the relations that may exist across the data stored in them. This paper proposes a novel pipeline that uses world knowledge to generate a set of tags based on the keywords and concepts present in unstructured textual data. These tags can then be used to summarize, categorize, or search for the desired information, enhancing the user experience by giving users a holistic view of the kind of information stored as unstructured text. The proposed system uses an efficient on-device (mobile phone) CNN model together with a pruned ConceptNet resource to achieve this goal. The architecture also presents a novel ranking algorithm to extract the top n tags from any given text.
