Paper title
Distributed Computing in a Pandemic: A Review of Technologies Available for Tackling COVID-19
Paper authors
Paper abstract
The current COVID-19 global pandemic, caused by the SARS-CoV-2 betacoronavirus, has resulted in over a million deaths and is having a grave socio-economic impact, so there is an urgent need to find solutions to key research challenges. Much of this COVID-19 research depends on distributed computing. In this article, I review distributed architectures (various types of clusters, grids and clouds) that can be leveraged to perform these research tasks at scale, at high throughput and with a high degree of parallelism, and that can also be used to work collaboratively. High-performance computing (HPC) clusters will be used to carry out much of this work. Several big data processing tasks used in reducing the spread of SARS-CoV-2 require high-throughput approaches and a variety of tools, which Hadoop and Spark offer, even on commodity hardware. Extremely large-scale COVID-19 research has also utilised some of the world's fastest supercomputers, such as IBM's SUMMIT, used for ensemble-docking high-throughput screening against SARS-CoV-2 targets for drug repurposing and for high-throughput gene analysis, and Sentinel, an XPE-Cray based system used to explore natural products. Grid computing has facilitated the formation of the world's first Exascale grid computer. This has accelerated COVID-19 research in molecular dynamics simulations of SARS-CoV-2 spike protein interactions through massively parallel computation, performed with over 1 million volunteer computing devices using the Folding@home platform. Both grids and clouds can also be used for international collaboration by enabling access to important datasets and providing services that allow researchers to focus on research rather than on time-consuming data-management tasks.