Paper title
Do Not Sleep on Traditional Machine Learning: Simple and Interpretable Techniques Are Competitive to Deep Learning for Sleep Scoring
Paper authors
Paper abstract
Over the last few years, research in automatic sleep scoring has mainly focused on developing increasingly complex deep learning architectures. Recently, however, these approaches have achieved only marginal improvements, often at the expense of requiring more data and more expensive training procedures. Despite all these efforts and their satisfactory performance, automatic sleep staging solutions are not yet widely adopted in a clinical context. We argue that most deep learning solutions for sleep scoring are limited in their real-world applicability because they are hard to train, deploy, and reproduce. Moreover, these solutions lack interpretability and transparency, which are often key to increasing adoption rates. In this work, we revisit the problem of sleep stage classification using classical machine learning. Results show that competitive performance can be achieved with a conventional machine learning pipeline consisting of preprocessing, feature extraction, and a simple machine learning model. In particular, we analyze the performance of a linear model and a non-linear (gradient boosting) model. Our approach surpasses the state of the art (among methods that use the same data) on two public datasets: Sleep-EDF SC-20 (MF1 0.810) and Sleep-EDF ST (MF1 0.795), while achieving competitive results on Sleep-EDF SC-78 (MF1 0.775) and MASS SS3 (MF1 0.817). We show that, for the sleep stage scoring task, the expressiveness of an engineered feature vector is on par with the internally learned representations of deep learning models. This observation opens the door to clinical adoption, as a representative feature vector makes it possible to leverage both the interpretability and the successful track record of traditional machine learning models.
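The abstract reports results as MF1, that is, the macro-averaged F1 score: the unweighted mean of the per-class F1 scores, so a rare stage such as N1 counts as much as a frequent one such as N2. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `macro_f1`, the five stage labels, the toy hypnogram, and the convention of scoring a class that never occurs as 0 are all assumptions made for illustration.

```python
from collections import Counter

# Assumed label set (the abstract does not list the stages explicitly).
STAGES = ["W", "N1", "N2", "N3", "REM"]


def macro_f1(y_true, y_pred, labels=STAGES):
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct epoch for this stage
        else:
            fp[pred] += 1    # predicted stage was wrong
            fn[truth] += 1   # true stage was missed

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a class with no true or predicted epochs scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 10-epoch hypnogram, invented purely for illustration.
y_true = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
y_pred = ["W", "N1", "N1", "N2", "N2", "N3", "N3", "N3", "REM", "N2"]

print(f"MF1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy example 7 of 10 epochs are correct (accuracy 0.700), while the macro F1 is slightly lower because every stage contributes equally to the average regardless of how many epochs it covers.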