Paper title
COmic: Convolutional Kernel Networks for Interpretable End-to-End Learning on (Multi-)Omics Data
Authors
Abstract
Motivation: The size of available omics datasets has been increasing steadily in recent years as technology advances. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high-stakes scenarios such as healthcare, using a black-box model poses safety and security issues. Without an explanation of the molecular factors and phenotypes that affected a prediction, healthcare providers are left with no choice but to trust the models blindly. We propose a new type of artificial neural network, named Convolutional Omics Kernel Network (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundred thousand samples. Furthermore, COmic can be easily adapted to utilize multi-omics data.

Results: We evaluated the performance of COmic on six different breast cancer cohorts. Additionally, we trained COmic models on multi-omics data using the METABRIC cohort. Our models performed either better than or similarly to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens up the black box of neural networks and results in intrinsically interpretable models that eliminate the need for post-hoc explanation models.