Paper Title
Prosodic Alignment for Off-Screen Automatic Dubbing
Paper Authors
Abstract
The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence. This entails isochrony, i.e., translating the original speech while also matching its prosodic structure into phrases and pauses, especially when the speaker's mouth is visible. In previous work, we introduced a prosodic alignment model to address isochrone, or on-screen, dubbing. In this work, we extend the prosodic alignment model to also address off-screen dubbing, which requires less stringent synchronization constraints. We conduct experiments on four dubbing directions - English to French, Italian, German, and Spanish - on a publicly available collection of TED Talks and on publicly available YouTube videos. Empirical results show that, compared to our previous work, the extended prosodic alignment model provides a significantly better subjective viewing experience on videos in which on-screen and off-screen automatic dubbing is applied to sentences with the speaker's mouth visible and not visible, respectively.
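The isochrony constraint described above can be illustrated with a minimal sketch. This is not the paper's actual prosodic alignment model; it only shows, under assumed per-segment durations and a hypothetical relative tolerance, how a strict on-screen check might be relaxed for off-screen sentences:

```python
# Illustrative sketch only (not the authors' model): check whether the
# dubbed speech matches the original phrase/pause structure, with a
# duration tolerance that can be relaxed for off-screen sentences.

def isochrony_ok(src_durations, dub_durations, tolerance):
    """Return True if every dubbed segment deviates from the matching
    source segment by at most `tolerance` (relative to the source)."""
    if len(src_durations) != len(dub_durations):
        return False  # the phrase/pause segmentation itself must match
    return all(
        abs(dub - src) <= tolerance * src
        for src, dub in zip(src_durations, dub_durations)
    )

# Hypothetical segment durations in seconds (phrase, pause, phrase).
src = [1.8, 0.4, 2.1]
dub = [1.9, 0.5, 2.4]

print(isochrony_ok(src, dub, tolerance=0.10))  # strict, on-screen
print(isochrony_ok(src, dub, tolerance=0.25))  # relaxed, off-screen
```

With these made-up numbers, the strict check fails on the pause segment while the relaxed check passes, mirroring the idea that off-screen dubbing tolerates looser synchronization.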