Paper Title
A Bayesian Hierarchical Network for Combining Heterogeneous Data Sources in Medical Diagnoses
Paper Authors
Paper Abstract
Computer-Aided Diagnosis has shown stellar performance in providing accurate medical diagnoses across multiple testing modalities (medical images, electrophysiological signals, etc.). While this field has typically focused on fully harvesting the signal provided by a single (and generally extremely reliable) modality, fewer efforts have utilized imprecise data lacking reliable ground-truth labels. In this unsupervised, noisy setting, making the diagnosis robust and quantifying its uncertainty become paramount, thus posing a new challenge: how can we combine multiple sources of information, often themselves with vastly varying levels of precision and uncertainty, to provide a diagnosis estimate with confidence bounds? Motivated by a concrete application in antibody testing, we devise a Stochastic Expectation-Maximization algorithm that allows the principled integration of heterogeneous, and potentially unreliable, data types. Our Bayesian formalism is essential in (a) flexibly combining these heterogeneous data sources and their corresponding levels of uncertainty, (b) quantifying the degree of confidence associated with a given diagnosis, and (c) dealing with the missing values that typically plague medical data. We quantify the potential of this approach on simulated data, and showcase its practicality by deploying it on a real COVID-19 immunity study.
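To make the abstract's core idea concrete, here is a minimal sketch of a generic Stochastic EM (SEM) loop for a two-class latent diagnosis model. It is not the paper's Bayesian hierarchical network. The modelling choices are our own illustrative assumptions, not the authors': a binary latent status per subject, conditionally independent Gaussian measurements from each data source, and missing-at-random values encoded as NaN. The function name `stochastic_em` and its parameters are hypothetical.

```python
import numpy as np


def normal_logpdf(x, mu, sigma):
    """Elementwise log-density of N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def stochastic_em(X, n_iter=200, burn_in=50, seed=0):
    """Hypothetical SEM sketch for a two-class latent model with missing entries.

    X: (n, K) array; X[i, k] is source k's measurement for subject i (NaN if missing).
    Returns the post-burn-in average of P(z_i = 1 | x_i), the per-iteration
    posteriors, and the final parameters (prevalence, means, std devs).
    """
    rng = np.random.default_rng(seed)
    n, K = X.shape
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)  # placeholder values, masked out of the likelihood

    # Initialise class 0 / class 1 at the lower / upper quartile of each source.
    pi = 0.5
    mu = np.zeros((2, K))
    sigma = np.ones((2, K))
    for k in range(K):
        col = X[observed[:, k], k]
        mu[0, k], mu[1, k] = np.quantile(col, 0.25), np.quantile(col, 0.75)
        sigma[:, k] = max(col.std(), 1e-3)

    history = []
    for it in range(n_iter):
        # E-step: log p(x_i | z_i = c), summing only over observed sources
        # (missing entries drop out, which assumes missing-at-random).
        loglik = np.stack(
            [np.where(observed, normal_logpdf(X0, mu[c], sigma[c]), 0.0).sum(axis=1)
             for c in (0, 1)],
            axis=1,
        )
        d = (np.log1p(-pi) + loglik[:, 0]) - (np.log(pi) + loglik[:, 1])
        post = np.exp(-np.logaddexp(0.0, d))  # stable 1 / (1 + e^d) = P(z_i = 1 | x_i)
        if it >= burn_in:
            history.append(post)

        # S-step: sample a hard label for every subject from its posterior.
        z = rng.random(n) < post

        # M-step: closed-form updates from the sampled, completed data.
        pi = float(np.clip(z.mean(), 1e-3, 1 - 1e-3))
        for c, mask in ((0, ~z), (1, z)):
            for k in range(K):
                vals = X[mask & observed[:, k], k]
                if vals.size >= 2:
                    mu[c, k] = vals.mean()
                    sigma[c, k] = max(vals.std(), 1e-3)

    history = np.array(history)
    return history.mean(axis=0), history, pi, mu, sigma


if __name__ == "__main__":
    # Simulated data: 30% "positive" subjects, 3 sources, 30% of entries missing.
    rng = np.random.default_rng(1)
    n, K = 500, 3
    truth = rng.random(n) < 0.3
    X = rng.normal(loc=np.where(truth[:, None], 4.0, 0.0), scale=1.0, size=(n, K))
    X[rng.random((n, K)) < 0.3] = np.nan
    p_mean, p_hist, pi_hat, mu_hat, sigma_hat = stochastic_em(X)
    acc = np.mean((p_mean > 0.5) == truth)
    print(f"estimated prevalence {pi_hat:.2f}, accuracy {acc:.2f}")
```

The stochastic step (sampling hard labels instead of keeping soft responsibilities) is what distinguishes SEM from classical EM. Averaging the post-burn-in posteriors, and inspecting their spread (e.g. `p_hist.std(axis=0)`), gives a rough per-subject indication of how stable each diagnosis is. Missing measurements simply drop out of the likelihood sum, which is only justified under the missing-at-random assumption.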