Paper title

Technology dictates algorithms: Recent developments in read alignment

Authors

Alser, Mohammed; Rotman, Jeremy; Taraszka, Kodi; Shi, Huwenbo; Baykal, Pelin Icer; Yang, Harry Taegyun; Xue, Victor; Knyazev, Sergey; Singer, Benjamin D.; Balliu, Brunilda; Koslicki, David; Skums, Pavel; Zelikovsky, Alex; Alkan, Can; Mutlu, Onur; Mangul, Serghei

Abstract

Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences, or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step in most genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations that drive various human diseases and complex traits, and identifying the species present in metagenomic samples. The read alignment problem is extremely challenging because of the large size of the analyzed datasets and the numerous technological limitations of sequencing platforms. Researchers have therefore developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in step with technological advances, leading to today's diverse array of bioinformatics tools. Our review surveys the algorithmic foundations and methodologies of 107 alignment methods for both short and long reads, published between 1988 and 2020. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate how these underlying algorithms affect aligner speed and efficiency. We separately discuss the unique advantages and limitations that longer read lengths bring to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole-transcriptome, adaptive immune repertoire, and human microbiome studies.
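The abstract does not spell out a specific algorithm. Still, the core task it describes, placing a read at the reference position it matches, can be illustrated with a toy seed-and-verify sketch. This sketch makes several assumptions:

- Sequencing errors are substitutions only, with no insertions or deletions.
- A plain Python dict serves as the k-mer index.
- The function names `build_kmer_index` and `align_read` are hypothetical.

This is not the method of any particular aligner. Real tools typically use compressed indexes such as the FM-index (used by BWA and Bowtie2) and gapped dynamic-programming extension.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to its start positions (hypothetical helper)."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int) -> list[tuple[int, int]]:
    """Toy seed-and-verify alignment (hypothetical helper; substitutions only, no indels).

    Returns sorted (start_position, mismatch_count) pairs for every placement
    of the read whose Hamming distance to the reference is <= max_mismatches.
    """
    candidates: set[int] = set()
    # Seeding: take non-overlapping k-mers from the read and look them up exactly.
    # By the pigeonhole principle, if the read has fewer mismatches than there are
    # disjoint seeds, at least one seed matches exactly and the true locus is found.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset  # implied start of the whole read in the reference
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Verification: count substitutions between the read and each candidate window.
    hits: list[tuple[int, int]] = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCGATAGGCTTACG"
    idx = build_kmer_index(ref, k=4)
    # Read with one substitution (C -> T) relative to ref[8:19].
    print(align_read("TAGCTGATAGG", ref, idx, k=4, max_mismatches=1))  # [(8, 1)]
```

Here the first seed, `TAGC`, hits the reference exactly, while the second seed contains the substitution and finds nothing. The verification step then confirms a single mismatch at position 8. Splitting the work into cheap exact lookups followed by a more expensive comparison on few candidates mirrors, in miniature, the seed-and-extend trade-off between speed and sensitivity that the review examines.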