Paper Title

MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

Authors

Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

Abstract

Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attention ASR system. Our method is then validated by two objective indicators: (1) Character Error Rates (CER) that measure the speech decoding performance of four ASR models trained on uncompressed, as well as MP3-compressed data sets and (2) Signal-to-Noise Ratio (SNR) estimated for both uncompressed and MP3-compressed AAEs that are reconstructed in the time domain by feature inversion. We found that MP3 compression applied to AAEs indeed reduces the CER when compared to uncompressed AAEs. Moreover, feature-inverted (reconstructed) AAEs had significantly higher SNRs after MP3 compression, indicating that AN was reduced. In contrast to AN, MP3 compression applied to utterances augmented with regular noise resulted in more transcription errors, giving further evidence that MP3 encoding is effective in diminishing only AN.
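
The two indicators named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: CER as the Levenshtein edit distance between reference and hypothesis transcripts divided by the reference length, and SNR as the ratio of signal power to the power of the difference between a clean and a perturbed waveform, in decibels. The function names `character_error_rate` and `snr_db` are hypothetical, and the paper's own SNR estimate (computed on feature-inverted audio) may differ in detail.

```python
import math


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein edit distance divided by the reference length (assumed CER definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


def snr_db(clean, perturbed) -> float:
    """SNR in dB, treating (perturbed - clean) as the noise (assumed definition)."""
    if len(clean) != len(perturbed):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    if noise_power == 0:
        return math.inf       # no noise at all
    if signal_power == 0:
        return -math.inf      # noise only, no signal
    return 10.0 * math.log10(signal_power / noise_power)
```

Under these definitions, a lower CER means fewer transcription errors and a higher SNR means less residual noise, which is the direction of the effect the abstract reports for MP3-compressed adversarial examples.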
