Paper Title
Talking-head Generation with Rhythmic Head Motion
Paper Authors
Paper Abstract
When people deliver a speech, they naturally move their heads, and this rhythmic head motion conveys prosodic information. However, generating a lip-synced video while moving the head naturally is challenging. While remarkably successful, existing works either generate still talking-face videos or rely on landmark/video frames as sparse/dense mapping guidance to generate head movements, which leads to unrealistic or uncontrollable video synthesis. To overcome these limitations, we propose a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module. By modeling head motion and facial expressions explicitly, manipulating the 3D animation carefully, and embedding reference images dynamically, our approach achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements. Extensive experiments on several standard benchmarks demonstrate that our method achieves significantly better results than state-of-the-art methods in both quantitative and qualitative comparisons. The code is available at https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion.
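The abstract names three components: a 3D-aware generator, a hybrid embedding module that embeds reference images dynamically, and a non-linear composition module that blends the manipulated 3D animation into the final frame. As a rough illustration of the latter two, here is a minimal PyTorch sketch; the attention-pooled reference embedding and the learned soft-mask blend are plausible readings of the abstract, not the authors' released implementation (their code is at the repository above), and all module names, layer choices, and shapes are our assumptions.

```python
import torch
import torch.nn as nn


class HybridEmbedding(nn.Module):
    """Dynamically embed a variable number of reference images.

    Hypothetical sketch: the abstract only says reference images are
    "embedded dynamically"; attention-weighted pooling over per-image
    features is one plausible reading.
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # one feature vector per image
        )
        self.attn = nn.Linear(feat_dim, 1)

    def forward(self, refs: torch.Tensor) -> torch.Tensor:
        # refs: (B, N, 3, H, W), N reference images per sample
        b, n = refs.shape[:2]
        feats = self.encoder(refs.flatten(0, 1)).flatten(1)  # (B*N, D)
        feats = feats.view(b, n, -1)                         # (B, N, D)
        weights = torch.softmax(self.attn(feats), dim=1)     # (B, N, 1)
        return (weights * feats).sum(dim=1)                  # (B, D)


class NonLinearComposition(nn.Module):
    """Blend the rendered 3D-animation frame with a synthesized frame.

    Hypothetical sketch: a small CNN predicts a per-pixel soft mask,
    one common form of non-linear composition; the abstract does not
    specify the operator.
    """

    def __init__(self):
        super().__init__()
        self.mask_net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rendered: torch.Tensor,
                generated: torch.Tensor) -> torch.Tensor:
        # rendered, generated: (B, 3, H, W)
        m = self.mask_net(torch.cat([rendered, generated], dim=1))
        return m * rendered + (1 - m) * generated


if __name__ == "__main__":
    refs = torch.randn(2, 4, 3, 64, 64)          # 4 reference images each
    emb = HybridEmbedding()(refs)                # (2, 256) identity code
    out = NonLinearComposition()(torch.randn(2, 3, 64, 64),
                                 torch.randn(2, 3, 64, 64))
    print(emb.shape, out.shape)                  # (2, 256), (2, 3, 64, 64)
```

Under these assumptions, the pooled embedding would condition the generator on subject appearance regardless of how many reference images are given, and the soft mask would let the network keep the 3D-rendered face region while synthesizing hair, torso, and background, which is consistent with the controllable head motion the abstract claims.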