Paper Title

BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Authors

Haowen Hou, Xiaopeng Yan, Yigeng Zhang, Fengzong Lian, Zhanhui Kang

Abstract

In the field of cross-modal retrieval, single encoder models tend to perform better than dual encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual encoder model called BagFormer that utilizes a cross-modal interaction mechanism to improve recall performance without sacrificing latency or throughput. BagFormer achieves this through the use of bag-wise interactions, which allow for the transformation of text to a more appropriate granularity and the incorporation of entity knowledge into the model. Our experiments demonstrate that BagFormer achieves results comparable to state-of-the-art single encoder models on cross-modal retrieval tasks, while also offering efficient training and inference with 20.72 times lower latency and 25.74 times higher throughput.
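The abstract does not spell out how bag-wise interaction is computed, but the idea of matching at bag granularity rather than token granularity can be sketched as a late-interaction score. The following is a minimal, hypothetical illustration (not the paper's actual implementation): text tokens are grouped into "bags" (e.g. entity spans), each bag is pooled into a single embedding, and the text-image score sums each bag's best cosine similarity over the image's patch embeddings.

```python
# Hypothetical sketch of bag-wise late interaction; the grouping,
# pooling, and scoring choices here are assumptions for illustration,
# not details taken from the BagFormer paper.
import numpy as np

def bag_pool(token_embs: np.ndarray, bags: list) -> np.ndarray:
    """Mean-pool token embeddings within each bag, then L2-normalize."""
    pooled = np.stack([token_embs[idx].mean(axis=0) for idx in bags])
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

def bag_wise_score(bag_embs: np.ndarray, patch_embs: np.ndarray) -> float:
    """Sum, over text bags, of the max similarity to any image patch."""
    patch_embs = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    sim = bag_embs @ patch_embs.T          # (num_bags, num_patches)
    return float(sim.max(axis=1).sum())    # late interaction at bag granularity

# Toy usage: 5 text tokens grouped into 2 entity bags, 4 image patches.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
patches = rng.normal(size=(4, 8))
bags = [[0, 1], [2, 3, 4]]                 # token indices per bag
score = bag_wise_score(bag_pool(tokens, bags), patches)
```

Because bag embeddings for a caption and patch embeddings for an image can each be precomputed by their own encoder, this kind of interaction preserves the dual-encoder property that makes indexing and fast retrieval possible, which is the latency/throughput advantage the abstract claims.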
