Paper Title
Bayesian Sampling Bias Correction: Training with the Right Loss Function
Paper Authors
Paper Abstract
We derive a family of loss functions to train models in the presence of sampling bias. Examples are when the prevalence of a pathology differs from its sampling rate in the training dataset, or when a machine learning practitioner rebalances their training dataset. Sampling bias causes large discrepancies between model performance in the lab and in more realistic settings. It is omnipresent in medical imaging applications, yet is often overlooked at training time or addressed on an ad-hoc basis. Our approach is based on Bayesian risk minimization. For arbitrary likelihood models we derive the associated bias-corrected loss for training, exhibiting a direct connection to information gain. The approach integrates seamlessly into the current paradigm of (deep) learning using stochastic backpropagation, and naturally with Bayesian models. We illustrate the methodology on case studies of lung nodule malignancy grading.
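The abstract does not reproduce the derivation, but the kind of correction it describes can be sketched as importance-weighting the per-sample negative log-likelihood by the ratio of real-world class prevalence to the class sampling rate in the (possibly rebalanced) training set. This is a minimal illustrative sketch, not the paper's exact Bayesian formulation; the function name and signature are hypothetical:

```python
import numpy as np

def bias_corrected_nll(probs, labels, prevalence, sampling_rate):
    """Mean negative log-likelihood, reweighted so each class contributes
    in proportion to its real-world prevalence rather than its rate in
    the training set.

    probs         : (n_samples, n_classes) predicted class probabilities
    labels        : (n_samples,) integer class labels
    prevalence    : (n_classes,) class frequencies in deployment
    sampling_rate : (n_classes,) class frequencies in the training set
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Importance weight per class: how under- or over-represented it is.
    w = np.asarray(prevalence, dtype=float) / np.asarray(sampling_rate, dtype=float)
    per_sample = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(w[labels] * per_sample))

# Example: a training set rebalanced to 50/50 benign/malignant, while the
# deployment prevalence of malignancy is only 1%. Malignant samples are
# down-weighted, benign samples up-weighted, relative to plain NLL.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1])
corrected = bias_corrected_nll(probs, labels,
                               prevalence=[0.99, 0.01],
                               sampling_rate=[0.5, 0.5])
```

With equal prevalence and sampling rate the weights are all 1 and the loss reduces to the ordinary mean negative log-likelihood, so the correction is a strict generalization of standard maximum-likelihood training.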