Paper Title


How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

Authors

Duarte, Amanda, Palaskar, Shruti, Ventura, Lucas, Ghadiyaram, Deepti, DeHaan, Kenneth, Metze, Florian, Torres, Jordi, Giro-i-Nieto, Xavier

Abstract


One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets. Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was further recorded in the Panoptic studio enabling detailed 3D pose estimation. To evaluate the potential of How2Sign for real-world impact, we conduct a study with ASL signers and show that synthesized videos using our dataset can indeed be understood. The study further gives insights on challenges that computer vision should address in order to make progress in this field. Dataset website: http://how2sign.github.io/
