The impact of removing head movements on audio-visual speech enhancement

Paper title

The impact of removing head movements on audio-visual speech enhancement

Authors

Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar

Abstract

This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although they are a common conversational feature, head movements have been ignored by past and recent studies. They challenge today's learning-based methods because they often degrade the performance of models that are trained on clean, frontal, and steady face images. To alleviate this problem, we propose to use robust face frontalization (RFF) in combination with an AVSE method based on a variational auto-encoder (VAE) model. We briefly describe the basic ingredients of the proposed pipeline and perform experiments with a recently released audio-visual dataset. In light of these experiments, and based on three standard metrics, namely STOI, PESQ and SI-SDR, we conclude that RFF improves the performance of AVSE by a considerable margin.
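
Of the three metrics named in the abstract, SI-SDR (scale-invariant signal-to-distortion ratio) has a short closed-form definition. The abstract gives no formula, so the following is only a minimal sketch based on the commonly used definition. The function name `si_sdr` and the mean-removal step are assumptions made for illustration, not the authors' evaluation code.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Assumption: remove the mean of each signal before comparing them.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference signal has zero energy")

    # Project the estimate onto the reference to obtain the scaled target.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    # Whatever the projection does not explain is treated as distortion.
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the enhanced signal does not change the score. This is what separates SI-SDR from a plain SDR.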
