Can images help recognize entities? A study of the role of images for Multimodal NER

Paper Title

Can images help recognize entities? A study of the role of images for Multimodal NER

Authors

Shuguang Chen, Gustavo Aguilar, Leonardo Neves, Thamar Solorio

Abstract

Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context. While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the ability of these models to leverage multimodal interactions remains poorly understood. In this work, we conduct in-depth analyses of existing multimodal fusion techniques from different perspectives and describe scenarios in which adding information from the image does not always boost performance. We also study the use of captions as a way to enrich the context for MNER. Experiments on three datasets from popular social platforms expose the bottleneck of existing multimodal models and identify the situations in which using captions is beneficial.
