Paper Title

Multi-modal Feature Fusion with Feature Attention for VATEX Captioning Challenge 2020

Authors

Ke Lin, Zhuoxin Gan, Liwei Wang

Abstract

This report describes our model for the VATEX Captioning Challenge 2020. First, to gather information from multiple domains, we extract motion, appearance, semantic, and audio features. We then design a feature attention module that attends to the different features during decoding. We apply two types of decoders, top-down and X-LAN, and ensemble these models to obtain the final result. The proposed method outperforms the official baseline by a significant margin. We achieve 76.0 CIDEr and 50.0 CIDEr on the English and Chinese private test sets, respectively, and rank 2nd on both the English and Chinese private test leaderboards.
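
The abstract does not give the form of the feature attention module, so the following is only a minimal sketch of the general idea: at a decoding step, the decoder state scores each modality feature, and the scores are normalized into weights used to fuse the features. The scaled dot-product scoring, the function name `feature_attention`, the toy vectors, and the assumption that all features are already projected to the decoder's dimension are illustrative choices, not details taken from the report.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(hidden, features):
    """Fuse modality features for one decoding step (hypothetical sketch).

    hidden:   decoder state, a list of floats of length d
    features: dict mapping modality name -> list of floats of length d
              (assumes every modality was already projected to size d)
    Returns (weights per modality, fused feature vector).
    """
    names = list(features)
    scale = math.sqrt(len(hidden))
    # Assumed scoring rule: scaled dot product between state and feature.
    scores = [
        sum(h * f for h, f in zip(hidden, features[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the modality features, one dimension at a time.
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(len(hidden))
    ]
    return dict(zip(names, weights)), fused


# Toy inputs: the decoder state is aligned with the motion feature.
hidden = [1.0, 0.0, 0.0, 0.0]
features = {
    "motion": [2.0, 0.0, 0.0, 0.0],
    "appearance": [0.0, 1.0, 0.0, 0.0],
    "semantic": [0.0, 0.0, 1.0, 0.0],
    "audio": [0.0, 0.0, 0.0, 1.0],
}
weights, fused = feature_attention(hidden, features)
print(weights)
print(fused)
```

In this toy setup the motion feature receives the largest weight because it is the only one aligned with the decoder state. A trained model would learn the projections and the scoring function rather than use fixed vectors.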
