Detecting Parkinson's Disease From an Online Speech-task

Paper Title

Detecting Parkinson's Disease From an Online Speech-task

Authors

Rahman, Wasifur; Lee, Sangwu; Islam, Md. Saiful; Antony, Victor Nikhil; Ratnu, Harshil; Ali, Mohammad Rafayet; Mamun, Abdullah Al; Wagner, Ellen; Jensen-Roberts, Stella; Little, Max A.; Dorsey, Ray; Hoque, Ehsan

Abstract

In this paper, we envision a web-based framework that lets anyone, anywhere in the world, record a short speech task, and that analyzes the recorded data to screen for Parkinson's disease (PD). We collected data from 726 unique participants (262 PD, 38% female; 464 non-PD, 65% female; average age: 61) from across the US and beyond. A small portion of the data was collected in a lab setting to compare quality. The participants were instructed to utter a popular pangram containing all the letters of the English alphabet: "the quick brown fox jumps over the lazy dog". We extracted both standard acoustic features (Mel Frequency Cepstral Coefficients (MFCC), jitter and shimmer variants) and deep learning based features from the speech data. Using these features, we trained several machine learning algorithms. We achieved 0.75 AUC (Area Under the Curve) in determining the presence of self-reported Parkinson's disease by modeling the standard acoustic features with XGBoost, a gradient-boosted decision tree model. Further analysis reveals that the widely used MFCC features and a subset of previously validated dysphonia features, designed for detecting Parkinson's from a verbal phonation task (pronouncing 'ahh'), contain the most distinct information. Our model performed equally well on data collected in a controlled lab environment and 'in the wild', across different gender and age groups. Using this tool, we can collect data from almost anyone, anywhere, with a video/audio enabled device, contributing to equity and access in neurological care.
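
The abstract names jitter, shimmer and AUC without defining them. As a minimal sketch, the Python below uses the common "local" definition of jitter and shimmer (mean absolute difference between consecutive cycle lengths or peak amplitudes, divided by their mean) and the rank-based definition of AUC. It is an illustration only, not the authors' pipeline: the function names and all numeric values are hypothetical, and the paper's actual feature variants may differ.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values, divided by the mean value.

    Applied to glottal cycle lengths this is local jitter; applied to
    per-cycle peak amplitudes it is local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical cycle lengths (seconds) and peak amplitudes for one recording.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]
jitter = local_perturbation(periods)
shimmer = local_perturbation(amplitudes)

# Hypothetical classifier scores; label 1 stands for self-reported PD.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.5, 0.7, 0.2]
auc = roc_auc(labels, scores)
```

Read this way, an AUC of 0.75 means that a randomly chosen PD recording receives a higher score than a randomly chosen non-PD recording about three times out of four.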
