Paper Title

Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Audio-Visual Hearing Aids

Authors

Passos, Leandro A., Papa, João Paulo, Hussain, Amir, Adeel, Ahsan

Abstract

Despite the recent success of machine learning algorithms, most models face drawbacks when considering more complex tasks requiring interaction between different sources, such as multimodal input data and logical time sequences. The biological brain, on the other hand, is highly adept in this sense, able to automatically manage and integrate such streams of information. In this context, this work draws inspiration from recent discoveries in brain cortical circuits to propose a more biologically plausible self-supervised machine learning approach. This combines multimodal information using intra-layer modulations together with Canonical Correlation Analysis (CCA), and a memory mechanism to keep track of temporal data, the overall approach termed Canonical Cortical Graph Neural Networks. This is shown to outperform recent state-of-the-art models in terms of clean audio reconstruction and energy efficiency for a benchmark audio-visual speech dataset. The enhanced performance is demonstrated through a reduced and smoother neuron firing rate distribution, suggesting that the proposed model is amenable for speech enhancement in future audio-visual hearing aid devices.
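The multimodal fusion step named in the abstract builds on Canonical Correlation Analysis. As background only, the sketch below implements classical linear CCA in NumPy, finding projections of two views that maximize their correlation; it is not the authors' graph-based, intra-layer-modulated variant, and the function name and ridge constant are illustrative assumptions.

```python
import numpy as np

def cca(X, Y, k=1):
    """Classical CCA sketch: returns projection matrices for X and Y
    and the top-k canonical correlations. Not the paper's model."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; small ridge (assumed value) for numerical stability.
    Cxx = X.T @ X / n + 1e-8 * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + 1e-8 * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition (C symmetric PSD).
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Canonical correlations are singular values of the whitened cross-covariance.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(M)
    A = inv_sqrt(Cxx) @ U[:, :k]   # projection for the first view
    B = inv_sqrt(Cyy) @ Vt[:k].T   # projection for the second view
    return A, B, s[:k]
```

On two views sharing a common latent signal (e.g. synchronized audio and visual features), the leading canonical correlation approaches 1, which is the property the paper's fusion objective exploits.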
