Paper title
Peak Detection On Data Independent Acquisition Mass Spectrometry Data With Semisupervised Convolutional Transformers
Paper authors
Paper abstract
Methods based on Liquid Chromatography coupled to Mass Spectrometry (LC-MS) are commonly used for high-throughput, quantitative measurements of the proteome (i.e., the set of all proteins in a sample at a given time). Targeted LC-MS produces data in the form of a two-dimensional time series spectrum, with the mass-to-charge ratio (m/z) of analytes on one axis and the chromatographic retention time on the other. The elution of a peptide of interest produces highly specific patterns across multiple fragment ion traces (extracted ion chromatograms, or XICs). In this paper, we formulate the detection of these elution peaks as a multivariate time series segmentation problem and propose a novel approach based on the Transformer architecture. We augment Transformers, which capture long-range dependencies with a global view, with Convolutional Neural Networks (CNNs), which capture the local context important to the task at hand; the result is a Transformer with Convolutional Self-Attention. We further train this model in a semisupervised manner by adapting state-of-the-art semisupervised image classification techniques to multi-channel time series data. Experiments on a representative LC-MS dataset, benchmarked against manual annotations, show encouraging performance: the method outperforms baseline neural network architectures and is competitive with the current state of the art in automated peak detection.
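To make the idea of convolutional self-attention on multi-channel traces more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single attention head, no positional encoding, random untrained weights, and a plain sigmoid head that scores each retention-time step as peak or background. All function names, parameter names, and sizes (`conv1d_same`, `conv_self_attention`, `segment`, `init_params`, six channels, kernel size 5) are hypothetical choices for illustration. The only difference from standard self-attention here is that queries and keys come from a convolution over neighbouring time steps instead of a pointwise projection.

```python
import numpy as np


def conv1d_same(x, w):
    """Cross-correlate x with w along time, zero-padded so length is kept.

    x: (T, C) multi-channel trace (e.g. C fragment-ion XICs).
    w: (k, C, D) kernel of width k.
    Returns an array of shape (T, D).
    """
    T = x.shape[0]
    k = w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    # windows[i] holds the input shifted by i steps: shape (k, T, C).
    windows = np.stack([xp[i:i + T] for i in range(k)])
    return np.einsum("ktc,kcd->td", windows, w)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_self_attention(x, w_q, w_k, w_v):
    """Single-head attention with convolutional queries and keys."""
    q = conv1d_same(x, w_q)   # (T, D), sees local context
    k = conv1d_same(x, w_k)   # (T, D), sees local context
    v = x @ w_v               # (T, D), ordinary pointwise projection
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def segment(x, params):
    """Return a per-time-step peak probability of shape (T,)."""
    h, _ = conv_self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    logits = h @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logits))


def init_params(rng, n_channels, d_model, kernel_size):
    """Random (untrained) weights; the scale 0.1 is an arbitrary assumption."""
    return {
        "w_q": 0.1 * rng.normal(size=(kernel_size, n_channels, d_model)),
        "w_k": 0.1 * rng.normal(size=(kernel_size, n_channels, d_model)),
        "w_v": 0.1 * rng.normal(size=(n_channels, d_model)),
        "w_out": 0.1 * rng.normal(size=(d_model,)),
        "b_out": 0.0,
    }


rng = np.random.default_rng(0)
xics = rng.normal(size=(64, 6))  # synthetic stand-in: 64 time steps, 6 traces
params = init_params(rng, n_channels=6, d_model=16, kernel_size=5)
probs = segment(xics, params)
print(probs.shape)  # (64,)
```

With a kernel width of 1 the convolution reduces to the usual pointwise query/key projection, so the kernel width is the knob that controls how much local peak shape each attention score can take into account. A real model would stack several such layers, train the weights, and threshold or post-process `probs` into peak boundaries.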