Paper Title

Analysis of memory in LSTM-RNNs for source separation

Paper Authors

Zegers, Jeroen, Van hamme, Hugo

Paper Abstract

Long short-term memory recurrent neural networks (LSTM-RNNs) are considered state-of-the-art in many speech processing tasks. The recurrence in the network, in principle, allows any input to be remembered for an indefinite time, a feature very useful for sequential data like speech. However, very little is known about which information is actually stored in the LSTM and for how long. We address this problem by using a memory reset approach which allows us to evaluate network performance as a function of the allowed memory time span. We apply this approach to the task of multi-speaker source separation, but it can be used for any task using RNNs. We find a strong performance effect of short-term (shorter than 100 milliseconds) linguistic processes. Only speaker characteristics are kept in the memory for longer than 400 milliseconds. Furthermore, we confirm that performance-wise it is sufficient to implement longer memory in deeper layers. Finally, in a bidirectional model, the backward model contributes slightly more to the separation performance than the forward model.
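A minimal sketch of how such a memory reset could be realised (an illustrative assumption, not the authors' exact scheme): the recurrent state of a PyTorch nn.LSTM is re-initialised to zeros every reset_every frames, so nothing older than the current chunk can influence the output; sweeping reset_every and measuring separation performance then probes how much memory the task actually needs. The function run_with_memory_reset and its parameters are hypothetical names chosen for this example.

import torch
import torch.nn as nn

def run_with_memory_reset(lstm: nn.LSTM, features: torch.Tensor, reset_every: int) -> torch.Tensor:
    # features: (batch, time, feat_dim); returns (batch, time, hidden_size).
    outputs = []
    for start in range(0, features.size(1), reset_every):
        chunk = features[:, start:start + reset_every, :]
        # No state is carried over between chunks, so the LSTM starts from a
        # zero state here: its memory span is capped at reset_every frames.
        out, _ = lstm(chunk)
        outputs.append(out)
    return torch.cat(outputs, dim=1)

# Example: a 2-layer LSTM over 100 frames with memory capped at 10 frames.
lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(4, 100, 40)
y = run_with_memory_reset(lstm, x, reset_every=10)
print(y.shape)  # torch.Size([4, 100, 128])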
