Paper Title

Multi-Modal Detection of Alzheimer's Disease from Speech and Text

Paper Authors

Amish Mittal, Sourav Sahoo, Arnhav Datar, Juned Kadiwala, Hrithwik Shalu, Jimson Mathew

Paper Abstract

Reliable detection of the prodromal stages of Alzheimer's disease (AD) remains difficult even today because, unlike other neurocognitive impairments, there is no definitive in vivo diagnosis of AD. In this context, existing research has shown that patients often develop language impairment even in mild AD conditions. We propose a multimodal deep learning method that utilizes speech and the corresponding transcript simultaneously to detect AD. For audio signals, the proposed audio-based network, a convolutional neural network (CNN) based model, predicts the diagnosis for multiple speech segments, which are combined for the final prediction. Similarly, we use contextual embeddings extracted from BERT concatenated with CNN-generated embeddings to classify the transcript. The individual predictions of the two models are then combined to make the final classification. We also perform experiments to analyze model performance when transcripts generated by an Automated Speech Recognition (ASR) system are used in the text-based model instead of manual transcriptions. The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the DementiaBank Pitt Corpus.
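The abstract describes a two-stage late-fusion scheme: per-segment audio predictions are combined into one audio-level prediction, which is then combined with the text model's prediction. The exact fusion rules are not specified in the abstract, so the sketch below assumes simple (weighted) averaging purely for illustration; all function names, weights, and thresholds are hypothetical.

```python
# Hedged sketch of the late-fusion pipeline from the abstract.
# Assumption: segment-level and modality-level fusion are both
# averages; the paper's actual combination rules may differ.

def fuse_segments(segment_probs):
    """Combine per-segment AD probabilities from the audio CNN
    into a single recording-level probability (assumed: mean)."""
    return sum(segment_probs) / len(segment_probs)

def fuse_modalities(audio_prob, text_prob, w_audio=0.5):
    """Late fusion of the audio-based and text-based model outputs
    (assumed: weighted average; w_audio is a hypothetical weight)."""
    return w_audio * audio_prob + (1.0 - w_audio) * text_prob

def classify(audio_segment_probs, text_prob, threshold=0.5):
    """Final AD / non-AD decision from both modalities."""
    audio_prob = fuse_segments(audio_segment_probs)
    return fuse_modalities(audio_prob, text_prob) >= threshold
```

A simple averaging fusion like this keeps each modality's model independent, which matches the abstract's description of the two models producing individual predictions that are then combined.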
