Title

Towards Neural Decoding of Imagined Speech based on Spoken Speech

Authors

Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Seong-Whan Lee

Abstract

Decoding imagined speech from human brain signals is a challenging and important problem that may enable human communication via brain signals. While imagined speech could serve as a paradigm for silent communication via brain signals, it is difficult to collect enough stable data to train a decoding model. Meanwhile, spoken speech data are relatively easy to obtain, suggesting the value of utilizing spoken-speech brain signals to decode imagined speech. In this paper, we performed a preliminary analysis to find out whether spoken speech electroencephalography (EEG) data can be utilized to decode imagined speech, by simply applying a pre-trained model trained on spoken-speech brain signals to imagined speech. While the classification performance of a model trained and validated solely on imagined speech data was 30.5%, the spoken-speech-based classifier transferred to imagined speech data achieved an average accuracy of 26.8%, which was not statistically significantly different from the imagined-speech-based classifier (p = 0.0983, chi-square = 4.64). For a more comprehensive analysis, we compared the result with a visual imagery dataset, which should naturally be less related to spoken speech than imagined speech is. Visual imagery showed a solely-trained performance of 31.8% and a transferred performance of 26.3%, a statistically significant difference (p = 0.022, chi-square = 7.64). Our results imply the potential of applying spoken speech to decode imagined speech, as well as their underlying common features.
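The abstract compares a solely-trained classifier against a transferred one using a chi-square test on their accuracies. A minimal sketch of such a comparison is shown below, using Pearson's chi-square test on a 2×2 contingency table of correct/incorrect counts. The trial counts here are hypothetical (chosen only to mirror the reported accuracies of 30.5% and 26.8%); the paper does not state the number of trials, so the resulting statistic will not match the reported chi-square = 4.64.

```python
import math

def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test comparing two classification accuracies
    via a 2x2 contingency table (correct vs. incorrect per classifier)."""
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    n = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 305/1000 correct trials for the imagined-speech-trained
# classifier vs. 268/1000 for the transferred spoken-speech classifier.
chi2, p = chi_square_2x2(305, 1000, 268, 1000)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With enough trials, a gap of a few percentage points becomes significant, which is why the same-sized drop is significant for visual imagery transfer in the paper but not for imagined speech only if the underlying counts differ.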
