Paper Title
Design and Evaluation of a Simple Data Interface for Efficient Data Transfer Across Diverse Storage
Authors
Abstract
Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a pluggable data access architecture for diverse, distributed storage. By hiding low-level storage system details, this abstraction permits a managed data transfer service (Globus in our case) to interact with a large and easily extended set of storage systems. Equally important, it supports third-party transfers: that is, direct data transfers from source to destination that are initiated by a third-party client but do not engage that third party in the data path. The abstraction also enables management of transfers for performance optimization, error handling, and end-to-end integrity. We present the Connector design, describe implementations for different storage services, evaluate tradeoffs inherent in managed vs. direct transfers, motivate recommended deployment options, and propose a performance-model-based method that allows for easy characterization of performance in different contexts without exhaustive benchmarking.
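To make the idea of a pluggable data access interface concrete, the following minimal Python sketch shows one way such an abstraction could look. It is an illustration under stated assumptions, not the paper's actual API: the names `Connector`, `InMemoryConnector`, `checksum`, and `transfer` are hypothetical, the in-memory backend merely stands in for a real file system or object store, and the copy runs in a single process, so it shows the storage-agnostic interface and an end-to-end integrity check rather than a true third-party transfer between remote endpoints.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class Connector(ABC):
    """Hypothetical storage-agnostic interface (an assumption, not the paper's API)."""

    @abstractmethod
    def read(self, path: str, chunk_size: int) -> Iterator[bytes]:
        """Yield the object's content in chunks of at most chunk_size bytes."""

    @abstractmethod
    def write(self, path: str, chunks: Iterator[bytes]) -> None:
        """Store the concatenation of the given chunks under path."""


class InMemoryConnector(Connector):
    """Toy backend standing in for a parallel file system or an object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def read(self, path: str, chunk_size: int) -> Iterator[bytes]:
        data = self.objects[path]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    def write(self, path: str, chunks: Iterator[bytes]) -> None:
        self.objects[path] = b"".join(chunks)


def checksum(conn: Connector, path: str, chunk_size: int = 4) -> str:
    """Compute a SHA-256 digest by streaming the object through the interface."""
    digest = hashlib.sha256()
    for chunk in conn.read(path, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def transfer(src: Connector, src_path: str,
             dst: Connector, dst_path: str, chunk_size: int = 4) -> str:
    """Copy src_path to dst_path, then verify integrity by comparing digests."""
    dst.write(dst_path, src.read(src_path, chunk_size))
    expected = checksum(src, src_path, chunk_size)
    actual = checksum(dst, dst_path, chunk_size)
    if expected != actual:
        raise IOError("integrity check failed after transfer")
    return actual


if __name__ == "__main__":
    source, destination = InMemoryConnector(), InMemoryConnector()
    source.objects["a.dat"] = b"hello connector"
    transfer(source, "a.dat", destination, "b.dat")
    print(destination.objects["b.dat"])
```

Because `transfer` depends only on the abstract `read` and `write` methods, a new storage system could in principle be supported by adding another `Connector` subclass without changing the transfer logic, which is the property the abstract attributes to the pluggable design.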