Animating Through Warping: an Efficient Method for High-Quality Facial Expression Animation

Paper title

Animating Through Warping: an Efficient Method for High-Quality Facial Expression Animation

Authors

Zili Yi, Qiang Tang, Vishnu Sanjay Ramiya Srinivasan, Zhan Xu

Abstract

Advances in deep neural networks have considerably improved the art of animating a still image without operating in the 3D domain. However, prior methods can only animate small images (typically no larger than 512x512) due to memory limitations, the difficulty of training, and the lack of high-resolution (HD) training datasets, which significantly reduces their potential for applications in movie production and interactive systems. Motivated by the idea that HD images can be generated by adding high-frequency residuals to low-resolution results produced by a neural network, we propose a novel framework known as Animating Through Warping (ATW) to enable efficient animation of HD images. Specifically, the proposed framework consists of two modules: a novel two-stage neural-network generator and a novel post-processing module known as ResWarp. It only requires the generator to be trained on small images and can perform inference on an image of any size. During inference, an HD input image is decomposed into a low-resolution component (128x128) and its corresponding high-frequency residuals. The generator predicts the low-resolution result as well as the motion field that warps the input face to the desired status (e.g., expression categories or action units). Finally, the ResWarp module warps the residuals based on the motion field and adds the warped residuals to the naively up-sampled low-resolution results to generate the final HD results. Experiments show the effectiveness and efficiency of our method in generating high-resolution animations. Our proposed framework successfully animates a 4K facial image, which has never been achieved by prior neural models. In addition, our method generally guarantees the temporal coherence of the generated animations. Source code will be made publicly available.
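The inference pipeline described in the abstract (decompose, predict at low resolution, warp the residuals, add them back) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the neural generator is not modeled, so `low_result` and `low_flow` are hypothetical stand-ins for its outputs, and the box-filter down-sampling, nearest-neighbour up-sampling, nearest-neighbour backward warp, integer scale factor `k`, and single-channel images are all simplifying assumptions made here for illustration.

```python
import numpy as np

def downsample(img, k):
    """Box-average an (H, W) array by an integer factor k (assumed filter)."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(img, k):
    """Naive nearest-neighbour up-sampling by an integer factor k."""
    return np.repeat(np.repeat(img, k, axis=0), k, axis=1)

def decompose(hd, k):
    """Split an HD image into a low-resolution component and its residual."""
    low = downsample(hd, k)
    residual = hd - upsample(low, k)  # high-frequency detail lost by down-sampling
    return low, residual

def warp(img, flow):
    """Backward warp: out[y, x] = img[y + flow[y, x, 1], x + flow[y, x, 0]]."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

def reswarp(low_result, residual, low_flow, k):
    """Warp the HD residual with the up-scaled motion field and add it back."""
    # Up-sample the motion field and scale displacements to HD pixel units.
    hd_flow = np.stack(
        [upsample(low_flow[..., 0], k), upsample(low_flow[..., 1], k)], axis=-1
    ) * k
    return upsample(low_result, k) + warp(residual, hd_flow)

# Usage: a 512x512 input with k = 4 gives the 128x128 component from the abstract.
rng = np.random.default_rng(0)
hd = rng.random((512, 512))
low, residual = decompose(hd, 4)
# Stand-ins for the generator outputs: unchanged image, zero motion.
low_result, low_flow = low, np.zeros((128, 128, 2))
hd_result = reswarp(low_result, residual, low_flow, 4)
```

With an unchanged low-resolution result and a zero motion field, the sketch reconstructs the HD input up to floating-point error, which is a quick sanity check that the decomposition and recombination are consistent. A real system would likely use smoother resampling and interpolation than the nearest-neighbour choices assumed here.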
