Paper title
Phage family classification under Caudoviricetes: a review of current tools using the latest ICTV classification framework
Authors
Abstract
Bacteriophages, viruses that infect bacteria, are the most ubiquitous and diverse entities in the biosphere. Accumulating evidence reveals their important roles in shaping the structure of various microbiomes. Thanks to (viral) metagenomic sequencing, a large number of new bacteriophages have been discovered. However, because a standard, automated virus classification pipeline is lacking, the taxonomic characterization of new viruses lags seriously behind the sequencing efforts. In particular, several large phage families from the previous classification system have been removed in the latest version of the ICTV taxonomy. A comprehensive review and comparison of taxonomic classification tools under the new standard is therefore needed to establish the state of the art. In this work, we retrained four recently published tools on newly labeled databases and tested them. We demonstrated their utility and evaluated them on multiple datasets, including RefSeq, short contigs, simulated metagenomic datasets, and low-similarity datasets. This study provides a comprehensive review of phage family classification in different scenarios and practical guidance for choosing an appropriate taxonomic classification pipeline. To the best of our knowledge, this is the first review conducted under the new ICTV classification framework. The results show that the new family classification framework overall leads to better-conserved groups and thus makes family-level classification more feasible.