Paper Title

iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models

Authors

Li, Jiahao; Wu, Zhourun; Lin, Wenhao; Luo, Jiawei; Zhang, Jun; Chen, Qingcai; Chen, Junjie

Abstract

Motivation: Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences.

Results: In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts contextual information of k-mers of different scales, related to their positions, via a multi-head attention mechanism. We first evaluate the performance of k-mers of different scales, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show that our model outperforms state-of-the-art methods. We further illustrate the interpretability of iEnhancer-ELM. In a case study, we discover 30 enhancer motifs via a 3-mer-based model, 12 of which are verified by STREME and JASPAR, demonstrating that our model has the potential to reveal the biological mechanisms of enhancers.

Availability and implementation: The models and associated code are available at https://github.com/chen-bioinfo/iEnhancer-ELM

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics Advances online.
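To make the multi-scale k-mer tokenization mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the iEnhancer-ELM repository: the function names `kmer_tokenize` and `multiscale_tokens`, the stride-1 overlapping window, and the default scales `(3, 4, 5, 6)` are all assumptions made for illustration, and the authors' actual preprocessing may differ.

```python
from typing import Dict, List, Sequence


def kmer_tokenize(seq: str, k: int) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1).

    Hypothetical helper: the overlapping, stride-1 scheme is an
    assumption, not something stated in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence of length n yields n - k + 1 tokens (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multiscale_tokens(seq: str, ks: Sequence[int] = (3, 4, 5, 6)) -> Dict[int, List[str]]:
    """Tokenize the same sequence at several k-mer scales.

    The default scales are illustrative only.
    """
    return {k: kmer_tokenize(seq, k) for k in ks}


if __name__ == "__main__":
    for k, tokens in multiscale_tokens("ACGTAC").items():
        print(k, tokens)
```

Each scale produces its own token sequence, and a token's index is its position in the DNA sequence, which is the positional information a BERT-like model can then attend over. One model per scale could be trained on these token lists and the per-scale predictions combined, though how the authors ensemble them is not specified in the abstract.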
