Paper title
Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech
Paper authors
Abstract
Alzheimer's Dementia (AD) is an incurable, debilitating, and progressive neurodegenerative condition that affects cognitive function. Early diagnosis is important because therapeutics can delay progression and give those diagnosed vital time. Models that analyse spontaneous speech could eventually provide an efficient diagnostic modality for earlier diagnosis of AD. The Alzheimer's Dementia Recognition through Spontaneous Speech task offers acoustically pre-processed and balanced datasets for the classification and prediction of AD and associated phenotypes through the modelling of spontaneous speech. We exclusively analyse the supplied textual transcripts of the spontaneous speech dataset, building and comparing numerous models for the classification of AD vs controls and the prediction of Mini-Mental State Examination (MMSE) scores. We rigorously train and evaluate Support Vector Machines (SVMs), Gradient Boosting Decision Trees (GBDTs), and Conditional Random Fields (CRFs) alongside deep learning Transformer-based models. Our top-performing models are a simple Term Frequency-Inverse Document Frequency (TF-IDF) vectoriser used as input to an SVM, and the pre-trained Transformer-based model 'DistilBERT' used as an embedding layer feeding simple linear models. We demonstrate test set scores of 0.81-0.82 across classification metrics and a root mean squared error (RMSE) of 4.58 for MMSE prediction.
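To make two of the quantities named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and of the RMSE metric in plain Python. It is an illustration only, not the authors' pipeline: the abstract does not state which TF-IDF variant, tokeniser, or normalisation was used, so the smoothed IDF formula, whitespace tokenisation, L2 normalisation, the function names `tfidf` and `rmse`, and the toy transcripts are all assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised.

    Assumed variant: raw term counts times a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    """
    # Assumption: lowercase + whitespace tokenisation.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy transcripts, invented for illustration (not from the dataset).
docs = [
    "the boy takes the cookie",
    "the water is overflowing",
    "um the the boy",
]
vectors = tfidf(docs)

# Hypothetical true and predicted MMSE scores, invented for illustration.
error = rmse([30, 20], [27, 24])
```

In this sketch a term that occurs in fewer documents (such as "cookie") receives a larger weight than an equally frequent term that occurs in more documents (such as "boy"). The resulting sparse vectors are the kind of input a linear classifier such as an SVM would consume, and `rmse` is the kind of score that would be reported for a regression target such as MMSE.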