Paper Title

Prediction of Model Generalizability for Unseen Data: Methodology and Case Study in Brain Metastases Detection in T1-Weighted Contrast-Enhanced 3D MRI

Authors

Dikici, Engin; Nguyen, Xuan; Takacs, Noah; Prevedello, Luciano M.

Abstract

A medical AI system's generalizability describes the consistency of its performance on data acquired in varying geographic, historical, and methodologic settings. Previous literature on this topic has mostly focused on "how" to achieve high generalizability, with limited success. Instead, we aim to understand "when" generalizability is achieved: our study presents a medical AI system that could estimate its generalizability status for unseen data on the fly. We introduce a latent space mapping (LSM) approach that uses a Fréchet distance loss to force the underlying training data distribution into a multivariate normal distribution. During deployment, the LSM distribution of a given test dataset is processed to detect its deviation from the forced distribution; hence, the AI system could predict its generalizability status for any previously unseen dataset. If low model generalizability is detected, the user is informed by a warning message. While the approach is applicable to most classification deep neural networks, we demonstrate its application to a brain metastases (BM) detector for T1-weighted contrast-enhanced (T1c) 3D MRI. The BM detection model was trained using 175 internally acquired T1c studies and tested using (1) 42 internally acquired exams and (2) 72 externally acquired exams from the publicly distributed Brain Mets dataset provided by the Stanford University School of Medicine. Generalizability scores, false positive (FP) rates, and sensitivities of the BM detector were computed for the test datasets. The model predicted its generalizability to be low for 31% of the testing data. It produced ~13.5 FPs at 76.1% BM detection sensitivity for the low-generalizability group and ~10.5 FPs at 89.2% BM detection sensitivity for the high-generalizability group. The results suggest that the proposed formulation enables a model to predict its generalizability for unseen data.
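The abstract does not give the loss or the generalizability score in closed form, so the following is only a minimal sketch of the underlying idea, not the authors' implementation. It assumes the forced target distribution is the standard normal N(0, I) and uses the standard closed-form Fréchet distance between two Gaussians, d² = ||μ1 - μ2||² + Tr(Σ1 + Σ2 - 2(Σ1Σ2)^(1/2)). The function names, the percentile-based threshold, and the batch-level warning flag are hypothetical illustrations of "detecting deviation from the forced distribution".

```python
import numpy as np


def sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values caused by round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, cov1, mu2, cov2) -> float:
    """Squared Fréchet distance between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2))."""
    root1 = sqrtm_psd(cov1)
    # Tr((cov1 cov2)^(1/2)) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix root1 @ cov2 @ root1.
    cross_eigs = np.linalg.eigvalsh(root1 @ cov2 @ root1)
    trace_sqrt = np.sqrt(np.clip(cross_eigs, 0.0, None)).sum()
    diff = np.asarray(mu1) - np.asarray(mu2)
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * trace_sqrt)


def frechet_to_standard_normal(latents: np.ndarray) -> float:
    """Distance between a Gaussian fitted to latent vectors (one per row)
    and the assumed target N(0, I)."""
    dim = latents.shape[1]
    mu = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)
    return frechet_distance(mu, cov, np.zeros(dim), np.eye(dim))


def calibrate_threshold(val_latents, batch_size, n_batches=200, percentile=95.0, seed=0):
    """Hypothetical calibration: the given percentile of distances computed on
    random subsets of in-distribution validation latents."""
    rng = np.random.default_rng(seed)
    scores = [
        frechet_to_standard_normal(
            val_latents[rng.choice(len(val_latents), size=batch_size, replace=False)]
        )
        for _ in range(n_batches)
    ]
    return float(np.percentile(scores, percentile))


def is_low_generalizability(test_latents, threshold) -> bool:
    """Hypothetical warning flag: True if the test latents deviate beyond the threshold."""
    return frechet_to_standard_normal(test_latents) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    dim = 8
    val = rng.standard_normal((2000, dim))                # stand-in for training-like latents
    in_dist = rng.standard_normal((200, dim))             # unseen but similar data
    shifted = 1.5 * rng.standard_normal((200, dim)) + 2.0  # simulated domain shift
    thr = calibrate_threshold(val, batch_size=200)
    print(f"threshold={thr:.3f}")
    print("in-distribution flagged:", is_low_generalizability(in_dist, thr))
    print("shifted flagged:", is_low_generalizability(shifted, thr))  # expected: True
```

For the shifted batch the distance is about 8 * 2² + 8 * (1.5 - 1)² = 34, far above a threshold calibrated on N(0, I) samples. This NumPy version only inspects distances; used as a training loss, the same expression would have to be written in a differentiable framework and evaluated on mini-batches of latent vectors. The abstract does not state whether the paper scores individual studies or whole datasets.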
