Paper Title
Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning
Paper Authors
Paper Abstract
Self-supervised representation learning of biological sequence embeddings alleviates computational resource constraints on downstream tasks while circumventing expensive experimental label acquisition. However, existing methods mostly borrow directly from large language models designed for NLP, rather than being designed with bioinformatics philosophies in mind. Recently, contrastive mutual information maximization methods have achieved state-of-the-art representations for ImageNet. In this perspective piece, we discuss how viewing evolution as natural sequence augmentation and maximizing information across phylogenetic "noisy channels" is a biologically and theoretically desirable objective for pretraining encoders. We first review the current contrastive learning literature, then provide an illustrative example showing that contrastive learning with evolutionary augmentation can serve as a representation learning objective that maximizes the mutual information between biological sequences and their conserved function, and finally outline the rationale for this approach.
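To make the contrastive objective in the abstract concrete, the following is a minimal sketch of an InfoNCE-style loss, one common form of contrastive mutual information maximization. It is not taken from the paper. It assumes that each anchor embedding is paired with the embedding of an evolutionarily related sequence (for example, a homolog from the same family), and that the other sequences in the batch act as negatives. The function names `info_nce` and `mi_lower_bound`, the temperature value, and the use of cosine similarity are illustrative assumptions.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of paired embeddings (illustrative sketch).

    anchors[i] and positives[i] are assumed to embed two evolutionarily
    related sequences; positives[j] with j != i serve as negatives.
    """
    def normalize(v):
        # Scale to unit length; fall back to 1.0 for an all-zero vector.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy of picking the true (evolutionary) partner.
        total += log_z - logits[i]
    return total / len(a)


def mi_lower_bound(loss, batch_size):
    """Standard InfoNCE bound: I(X; Y) >= log(N) - loss."""
    return math.log(batch_size) - loss
```

Under these assumptions, minimizing the loss raises a lower bound on the mutual information between the two views, and that bound cannot exceed the logarithm of the batch size. In practice the embeddings would come from a trained sequence encoder rather than hand-written vectors.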