A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning

Paper title

A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning

Authors

Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira, Stefan Wermter

Abstract

Large datasets as required for deep learning of lip reading do not exist in many languages. In this paper we present the dataset GLips (German Lips) consisting of 250,000 publicly available videos of the faces of speakers of the Hessian Parliament, which was processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English language LRW (Lip Reading in the Wild) dataset, with each video encoding one word of interest in a context of 1.16 seconds duration, which yields compatibility for studying transfer learning between both datasets. By training a deep neural network, we investigate whether lip reading has language-independent features, so that datasets of different languages can be used to improve lip reading models. We demonstrate learning from scratch and show that transfer learning from LRW to GLips and vice versa improves learning speed and performance, in particular for the validation set.
