Towards Structured Prediction in Bioinformatics with Deep Learning

Paper title

Towards Structured Prediction in Bioinformatics with Deep Learning

Author

Li, Yu

Abstract

Using machine learning, especially deep learning, to facilitate biological research is a fascinating research direction. However, beyond standard classification and regression problems, bioinformatics often requires predicting more complex structured targets, such as 2D images and 3D molecular structures. Such tasks are referred to as structured prediction. Because most original bioinformatics problems have complex output objects, structured prediction is more complicated than traditional classification but has much broader applications. These problems have properties, such as problem-specific constraints and dependencies within the label space, that can make the straightforward application of existing deep learning models lead to unsatisfactory results. Here, we argue that two ideas can help resolve structured prediction problems in bioinformatics. First, we can combine deep learning with classic algorithms, such as probabilistic graphical models, that model the problem structure explicitly. Second, we can design problem-specific deep learning architectures or methods that take the structured label space and problem constraints into account, either explicitly or implicitly. We demonstrate these ideas with six projects from four bioinformatics subfields: sequencing analysis, structure prediction, function annotation, and network analysis. The structured outputs cover 1D signals, 2D images, 3D structures, hierarchical labeling, and heterogeneous networks. With the help of these ideas, all of our methods achieve state-of-the-art (SOTA) performance on the corresponding problems. The success of these projects motivates us to extend our work to other challenging but important problems, such as health-care problems, which can directly benefit people's health and wellness.
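As a rough illustration of the first idea, per-position scores (standing in for the outputs of a neural network) can be decoded jointly with a linear-chain model instead of label by label. The sketch below is not the thesis's implementation: the function name `viterbi_decode`, the two-label setup, and all score values are hypothetical, and Viterbi decoding is used here only as one familiar example of a graphical-model component.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence and its total score.

    emissions[t][k]: score of label k at position t (a stand-in for
    per-position neural network outputs).
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(emissions[0])
    # best[k]: best score of any path ending in label k at the current position
    best = list(emissions[0])
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for k in range(n_labels):
            # Pick the previous label that maximises the path score into k.
            prev = max(range(n_labels), key=lambda j: best[j] + transitions[j][k])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][k] + scores[k])
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical scores for a 4-position 1D signal with two labels (0 and 1).
emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0], [0.0, 2.0]]
# Hypothetical transitions that penalise switching labels between neighbours.
transitions = [[0.0, -1.0], [-1.0, 0.0]]

# Independent per-position prediction ignores dependencies between labels.
independent = [max(range(2), key=lambda k: row[k]) for row in emissions]
# Joint decoding accounts for the dependency structure in the label space.
structured, score = viterbi_decode(emissions, transitions)

print(independent)  # [0, 1, 0, 1]
print(structured)   # [0, 0, 0, 1]
```

In this toy case the weakly supported label flip at the second position is removed once neighbouring labels are scored together, which is the kind of label-space dependency the abstract refers to.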
