Paper Title

Better Sign Language Translation with STMC-Transformer

Paper Authors

Kayo Yin, Jesse Read

Paper Abstract

Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. This paper focuses on the translation system and introduces the STMC-Transformer which improves on the current state-of-the-art by over 5 and 7 BLEU respectively on gloss-to-text and video-to-text translation of the PHOENIX-Weather 2014T dataset. On the ASLG-PC12 corpus, we report an increase of over 16 BLEU. We also demonstrate the problem in current methods that rely on gloss supervision. The video-to-text translation of our STMC-Transformer outperforms translation of GT glosses. This contradicts previous claims that GT gloss translation acts as an upper bound for SLT performance and reveals that glosses are an inefficient representation of sign language. For future SLT research, we therefore suggest an end-to-end training of the recognition and translation models, or using a different sign language annotation scheme.
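
To make the gloss-to-text stage described in the abstract concrete, the sketch below wires a vanilla Transformer encoder-decoder over gloss token sequences. This is a minimal illustration in PyTorch, not the paper's exact STMC-Transformer: the class name GlossToTextTransformer, vocabulary sizes, and model dimensions are illustrative assumptions.

```python
# Minimal sketch of the gloss-to-text translation stage: a standard
# Transformer encoder-decoder that maps sign-language gloss tokens to a
# spoken-language sentence. Hyperparameters are placeholders, not the
# paper's reported STMC-Transformer configuration.
import math
import torch
import torch.nn as nn


class GlossToTextTransformer(nn.Module):  # hypothetical class name
    def __init__(self, gloss_vocab, text_vocab, d_model=512, nhead=8,
                 num_layers=2, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.d_model = d_model
        self.src_embed = nn.Embedding(gloss_vocab, d_model)  # gloss tokens
        self.tgt_embed = nn.Embedding(text_vocab, d_model)   # spoken-language tokens
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, text_vocab)

    def forward(self, gloss_ids, text_ids):
        # Scale embeddings as in the original Transformer; positional
        # encodings are omitted to keep the sketch short.
        src = self.src_embed(gloss_ids) * math.sqrt(self.d_model)
        tgt = self.tgt_embed(text_ids) * math.sqrt(self.d_model)
        # Causal mask so each target position only attends to earlier tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(text_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, tgt_len, text_vocab) logits


if __name__ == "__main__":
    # Toy batch: 2 gloss sequences of length 6, target sentences of length 8.
    model = GlossToTextTransformer(gloss_vocab=1000, text_vocab=3000)
    glosses = torch.randint(0, 1000, (2, 6))
    words = torch.randint(0, 3000, (2, 8))
    logits = model(glosses, words)
    print(logits.shape)  # torch.Size([2, 8, 3000])
```

In the full video-to-text setting, the gloss token input would instead come from the SLR front end (the STMC network's predictions), which is precisely the pipeline whose gloss bottleneck the paper questions.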
