Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data

Paper Title

Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data

Authors

Utku Ozbulak, Solha Kang, Jasper Zuallaert, Stephen Depuydt, Joris Vankerschaver

Abstract

Even though deep neural networks (DNNs) achieve state-of-the-art results for a number of problems involving genomic data, getting DNNs to explain their decision-making process has been a major challenge due to their black-box nature. One way to get DNNs to explain their reasoning for a prediction is via attribution methods, which are assumed to highlight the parts of the input that contribute the most to the prediction. Given the existence of numerous attribution methods and a lack of quantitative results on the fidelity of those methods, selection of an attribution method for sequence-based tasks has been mostly done qualitatively. In this work, we take a step towards identifying the most faithful attribution method by proposing a computational approach that utilizes point mutations. Providing quantitative results on seven popular attribution methods, we find Layerwise Relevance Propagation (LRP) to be the most appropriate one for translation initiation, with LRP identifying two important biological features for translation: the integrity of the Kozak sequence as well as the detrimental effects of premature stop codons.
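
The abstract does not spell out the mutation procedure, so the following is only a minimal sketch of one plausible reading: if an attribution method is faithful, point mutations at the positions it ranks highest should change the model output more than mutations at positions it ranks lowest. Everything here is an assumption made for illustration: `toy_model` is a hypothetical linear stand-in for a trained DNN, `toy_attribution` is a simple input-times-weight score rather than LRP, and the sequence and random weights are made up.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = [NUCLEOTIDES.index(base) for base in seq]
    return np.eye(4)[idx]

def toy_model(x, weights):
    """Hypothetical stand-in for a trained DNN: linear score through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-np.sum(x * weights)))

def toy_attribution(x, weights):
    """Per-position attribution for the toy model (input times weight), not LRP."""
    return np.sum(x * weights, axis=1)

def mutation_effect(seq, pos, weights):
    """Mean absolute output change over the three point mutations at `pos`."""
    base_score = toy_model(one_hot(seq), weights)
    changes = []
    for alt in NUCLEOTIDES:
        if alt == seq[pos]:
            continue  # skip the original nucleotide, it is not a mutation
        mutated = seq[:pos] + alt + seq[pos + 1:]
        changes.append(abs(toy_model(one_hot(mutated), weights) - base_score))
    return float(np.mean(changes))

# Illustrative input only: a short sequence around an ATG start codon.
rng = np.random.default_rng(0)
seq = "GCCACCATGGCG"
weights = rng.normal(size=(len(seq), 4))  # assumed weights, not a trained model

attr = np.abs(toy_attribution(one_hot(seq), weights))
top_pos = int(np.argmax(attr))  # position the attribution ranks highest
low_pos = int(np.argmin(attr))  # position the attribution ranks lowest

effect_top = mutation_effect(seq, top_pos, weights)
effect_low = mutation_effect(seq, low_pos, weights)
print(f"top position {top_pos}: mean output change {effect_top:.4f}")
print(f"low position {low_pos}: mean output change {effect_low:.4f}")
```

Under this reading, comparing attribution methods would amount to repeating the check over many sequences and asking which method's top-ranked positions most consistently produce the larger output change; the paper's actual scoring scheme may differ.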
