Paper title
A hypothesis-driven method based on machine learning for neuroimaging data analysis
Paper authors
Paper abstract
There remains an open question about the usefulness and interpretation of machine learning (MLE) approaches for discriminating spatial patterns in brain images between samples or activation states. In the last few decades, these approaches have limited their operation to feature extraction and linear classification tasks for between-group inference. In this context, statistical inference is assessed by randomly permuting image labels or by using random-effects models that account for between-subject variability. These multivariate MLE-based statistical pipelines, while potentially more effective at detecting activations than hypothesis-driven methods, have lost the mathematical elegance, ease of interpretation, and spatial localization of the ubiquitous General Linear Model (GLM). Recently, the estimation of the conventional GLM has been shown to be connected to a univariate classification task when the design matrix is expressed as a binary indicator matrix. In this paper, we explore the complete connection between the univariate GLM and MLE \emph{regressions}. To this end, we derive a refined statistical test with the GLM based on the parameters obtained by a linear Support Vector Regression (SVR) in the \emph{inverse} problem (SVR-iGLM). Subsequently, random field theory (RFT) is employed to assess statistical significance following a conventional GLM benchmark. Experimental results demonstrate how parameter estimations derived from each model (mainly GLM and SVR) result in different experimental design estimates that are significantly related to the predefined functional task. Moreover, using real data from a multisite initiative, the proposed MLE-based inference demonstrates statistical power and control of false positives, outperforming the regular GLM.
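The abstract notes that the conventional GLM is tied to a two-class problem when the design matrix is a binary indicator matrix. The sketch below illustrates only that textbook special case for a single voxel: with an indicator design, the ordinary least-squares estimates are the group means, and the t-statistic for the contrast [1, -1] coincides with the pooled two-sample t-statistic. It is not the SVR-iGLM procedure from the paper. The function names and the toy signal values are assumptions made for illustration.

```python
import math


def glm_indicator_fit(y, labels):
    """OLS fit of y = X b + e with a two-column binary indicator design.

    Returns the estimates b and the t-statistic for the contrast [1, -1].
    """
    # Indicator design: column 0 marks group 0, column 1 marks group 1.
    X = [[1.0, 0.0] if g == 0 else [0.0, 1.0] for g in labels]
    # For an indicator design, X^T X is diagonal and holds the group sizes.
    counts = [sum(row[j] for row in X) for j in range(2)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(2)]
    beta = [xty[j] / counts[j] for j in range(2)]
    # Residual variance with n - 2 degrees of freedom.
    fitted = [row[0] * beta[0] + row[1] * beta[1] for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / (len(y) - 2)
    # Var(c^T b) = sigma2 * c^T (X^T X)^{-1} c for c = [1, -1].
    t = (beta[0] - beta[1]) / math.sqrt(sigma2 * (1 / counts[0] + 1 / counts[1]))
    return beta, t


def pooled_two_sample_t(a, b):
    """Classical two-sample t-statistic with pooled variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))


# Hypothetical signal of one voxel across eight scans (toy values, not real data).
y = [2.1, 1.9, 2.4, 2.0, 1.2, 1.0, 1.5, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

beta, t_glm = glm_indicator_fit(y, labels)
t_pooled = pooled_two_sample_t(y[:4], y[4:])
print(beta, t_glm, t_pooled)
```

The two t-statistics agree up to floating-point error, which is the sense in which a GLM with an indicator design reduces to a group comparison. A regression-based variant, such as one built on a linear SVR, would replace the least-squares estimates with parameters from a different loss function.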