Paper Title

Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage

Authors

Ashish V. Thapliyal, Radu Soricut

Abstract

Cross-modal language generation tasks such as image captioning are directly hurt in their ability to support non-English languages by the trend of data-hungry models combined with the lack of non-English annotations. We investigate potential solutions for combining existing language-generation annotations in English with translation capabilities, in order to create solutions at web scale in both domain and language coverage. We describe an approach called Pivot-Language Generation Stabilization (PLuGS), which leverages, directly at training time, both existing English annotations (gold data) and their machine-translated versions (silver data); at run time, it first generates an English caption and then a corresponding target-language caption. We show that PLuGS models outperform other candidate solutions in evaluations performed over 5 different target languages, on a large-domain test set using images from the Open Images dataset. Furthermore, we find an interesting effect: the English captions generated by the PLuGS models are better than the captions generated by the original, monolingual English model.
