Paper Title

Improving One-stage Visual Grounding by Recursive Sub-query Construction

Authors

Zhengyuan Yang, Tianlang Chen, Liwei Wang, Jiebo Luo

Abstract

We improve one-stage visual grounding by addressing current limitations on grounding long and complex queries. Existing one-stage methods encode the entire language query as a single sentence embedding vector, e.g., taking the embedding from BERT or the hidden state from an LSTM. This single-vector representation is prone to overlooking the detailed descriptions in the query. To address this query modeling deficiency, we propose a recursive sub-query construction framework, which reasons between the image and the query for multiple rounds and reduces the referring ambiguity step by step. We show that our new one-stage method obtains 5.0%, 4.5%, 7.5%, and 12.8% absolute improvements over the state-of-the-art one-stage baseline on ReferItGame, RefCOCO, RefCOCO+, and RefCOCOg, respectively. In particular, the superior performance on longer and more complex queries validates the effectiveness of our query modeling.
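The multi-round idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `recursive_grounding`, the dot-product scoring, the `penalty` term that discourages re-attending to the same words, and the tiny hand-made vectors are all assumptions chosen for illustration. It only shows the general pattern of building a different word-level sub-query in each round and accumulating evidence over candidate image regions, instead of scoring regions once against a single sentence vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def recursive_grounding(word_vecs, region_vecs, rounds=3, penalty=5.0):
    """Toy multi-round grounding (illustrative assumption, not the paper's model).

    word_vecs:   one embedding per query word
    region_vecs: one feature vector per candidate image region
    Returns the final distribution over regions and the per-round word attention.
    """
    n_regions = len(region_vecs)
    region_logits = [0.0] * n_regions
    region_probs = [1.0 / n_regions] * n_regions  # start with no preference
    attended = [0.0] * len(word_vecs)             # how much each word was used so far
    history = []

    for _ in range(rounds):
        # Summarize the current visual belief as one context vector.
        context = weighted_sum(region_probs, region_vecs)
        # Score words by relevance to the context, minus a penalty for
        # words that earlier rounds already attended to.
        word_scores = [dot(w, context) - penalty * a
                       for w, a in zip(word_vecs, attended)]
        word_attn = softmax(word_scores)
        # The sub-query for this round is an attention-weighted word mixture.
        sub_query = weighted_sum(word_attn, word_vecs)
        # Accumulate evidence for each region and renormalize.
        region_logits = [l + dot(r, sub_query)
                         for l, r in zip(region_logits, region_vecs)]
        region_probs = softmax(region_logits)
        attended = [a + w for a, w in zip(attended, word_attn)]
        history.append(word_attn)

    return region_probs, history


if __name__ == "__main__":
    # Two query "words" and three candidate regions (made-up 2-D features).
    words = [[1.0, 0.0], [0.0, 1.0]]
    regions = [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
    probs, history = recursive_grounding(words, regions)
    print("region probabilities:", probs)
    for i, attn in enumerate(history, start=1):
        print(f"round {i} word attention:", attn)
```

In this sketch the penalty term shifts attention toward words that earlier rounds used less, so each round contributes a different sub-query. A real model would learn the attention and the visual-text fusion rather than use fixed dot products.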
