Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification

Paper title

Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification

Authors

Miguel Romero, Felipe Kenji Nakano, Jorge Finke, Camilo Rocha, Celine Vens

Abstract

The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from this growing body of genomic data, numerous studies have focused on identifying associations between genes and functions. While these studies have shown great promise, annotating genes with functions remains an open challenge. In this work, we present a method to detect missing annotations in hierarchical multi-label classification datasets. The method exploits the class hierarchy by computing, for each instance, aggregated probabilities over the paths of classes from the leaves to the root. It is presented in the context of predicting missing gene function annotations, where these aggregated probabilities are further used to select a set of annotations to be verified through in vivo experiments. Experiments on Oryza sativa Japonica, a variety of rice, show that incorporating the class hierarchy into the method often improves predictive performance, and that the proposed method yields superior results compared to competitor methods from the literature.
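
The path-aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how probabilities are aggregated, so the arithmetic mean used here is an assumption, as is the simplification of the class hierarchy to a tree with a single parent per class (real gene function hierarchies may be directed acyclic graphs). The function names, the toy hierarchy, and the probability values are all hypothetical.

```python
from statistics import fmean


def leaf_to_root_paths(parent):
    """Map each leaf class to its path [leaf, ..., root].

    `parent` maps every class to its parent class; the root maps to None.
    Assumption: the hierarchy is a tree (one parent per class).
    """
    internal = {p for p in parent.values() if p is not None}
    paths = {}
    for cls in parent:
        if cls in internal:
            continue  # not a leaf
        path, node = [], cls
        while node is not None:
            path.append(node)
            node = parent[node]
        paths[cls] = path
    return paths


def aggregate_path_scores(probs, parent, aggregate=fmean):
    """Aggregate per-class probabilities of one instance along each path.

    Assumption: the mean is used as the aggregation function; any other
    function of a list of floats (min, product, ...) can be passed instead.
    """
    return {
        leaf: aggregate([probs[c] for c in path])
        for leaf, path in leaf_to_root_paths(parent).items()
    }


def rank_missing_candidates(probs, parent, annotated, top_k=1):
    """Rank leaf classes the instance is NOT annotated with by path score."""
    scores = aggregate_path_scores(probs, parent)
    candidates = [(leaf, s) for leaf, s in scores.items() if leaf not in annotated]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]


# Hypothetical toy hierarchy and classifier outputs for a single gene.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
probs = {"root": 1.0, "A": 0.8, "B": 0.25, "A1": 0.75, "A2": 0.25}

scores = aggregate_path_scores(probs, parent)
best = rank_missing_candidates(probs, parent, annotated={"A2"}, top_k=1)
```

In this toy setting, the highest-ranked unannotated path would be the candidate annotation put forward for experimental verification. Swapping `aggregate` for `min` or a product gives a stricter score that penalizes any low-probability class along the path.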
