Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Paper Title

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Authors

Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

Abstract

Neural sequence-to-sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as extracting multiple sequential sources from a mixture sequence. We extend the standard sequence-to-sequence model to a conditional multi-sequence model, which explicitly models the relevance between multiple output sequences with the probabilistic chain rule. Based on this extension, our model can conditionally infer output sequences one-by-one by making use of both input and previously-estimated contextual output sequences. This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer the variable number of output sequences. We take speech data as a primary test field to evaluate our methods since the observed speech data is often composed of multiple sources due to the nature of the superposition principle of sound waves. Experiments on several different tasks including speech separation and multi-speaker speech recognition show that our conditional multi-sequence models lead to consistent improvements over the conventional non-conditional models.
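The inference procedure the abstract describes (emit output sequences one at a time, condition each step on the input and on the sequences already produced, and stop when a criterion fires) can be illustrated with a small toy loop. This is a minimal sketch under stated assumptions, not the authors' neural model: the names `conditional_chain_decode` and `toy_step` are hypothetical, the "mixture" is just a list of labelled tokens standing in for a mixed signal, and the toy step assumes every token is unique so that already-extracted tokens can be recognised.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Toy stand-in for a mixed signal: (source label, token) pairs interleaved in time.
Mixture = Sequence[Tuple[str, str]]
Output = List[str]
# A step maps (mixture, previously produced outputs) to the next output,
# or to None when it decides that nothing is left to extract.
StepFn = Callable[[Mixture, List[Output]], Optional[Output]]


def conditional_chain_decode(
    mixture: Mixture, step: StepFn, max_outputs: int = 10
) -> List[Output]:
    """Produce outputs one by one, each conditioned on the earlier ones."""
    outputs: List[Output] = []
    for _ in range(max_outputs):
        nxt = step(mixture, outputs)
        if nxt is None:  # stop criterion: the step signals the end of the chain
            break
        outputs.append(nxt)
    return outputs


def toy_step(mixture: Mixture, previous: List[Output]) -> Optional[Output]:
    """Hypothetical step: return the tokens of the first unexplained source."""
    # Tokens already accounted for by earlier outputs (assumes unique tokens).
    explained = {tok for seq in previous for tok in seq}
    remaining = [(src, tok) for src, tok in mixture if tok not in explained]
    if not remaining:
        return None
    first_src = remaining[0][0]
    return [tok for src, tok in remaining if src == first_src]


mixture = [("A", "hello"), ("B", "good"), ("A", "world"), ("B", "morning")]
separated = conditional_chain_decode(mixture, toy_step)
```

Because the loop ends when the step returns `None` rather than after a fixed count, the number of outputs is not set in advance, which mirrors the variable-output property mentioned in the abstract. In a learned model the step would be a network and the stop decision a learned criterion; both are replaced here by simple rules for illustration.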
