Paper title
AstroCatR: a Mechanism and Tool for Efficient Time Series Reconstruction of Large-Scale Astronomical Catalogues
Paper authors
Paper abstract
Time series data of celestial objects are commonly used in time-domain astronomy to study valuable and unexpected objects such as extrasolar planets and supernovae. Because data volumes are growing rapidly, traditional manual methods are becoming infeasible for continuously analyzing accumulated observation data. To meet this demand, we designed and implemented a dedicated tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series datasets. To support high-performance parallel processing of large-scale datasets, AstroCatR uses an extract-transform-load (ETL) preprocessing module to create sky zone files and balance the workload. The matching module uses an overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or transformed into other formats as needed. In addition, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3 times faster than methods that use relational database management systems to match massive catalogues.
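The matching idea summarized in the abstract (sky zones, an overlapped index, an in-memory reference table) can be illustrated with a minimal Python sketch. This is not AstroCatR's implementation: the zone height, the matching radius, the tuple layouts, and all function names below are assumptions chosen for illustration. Reference objects near a zone border are indexed under both neighboring zones, so each detection only needs to search its own zone.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 1.0             # assumed zone height in declination
MATCH_RADIUS_DEG = 2.0 / 3600.0   # assumed matching radius (2 arcsec)


def zone_id(dec_deg):
    """Map a declination in degrees to an integer zone number."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_reference_index(reference):
    """Build an in-memory table: zone -> list of (obj_id, ra, dec).

    An object within MATCH_RADIUS_DEG of a zone border is stored in
    both adjacent zones (the overlap), so border matches are not lost.
    """
    index = defaultdict(list)
    for obj_id, ra, dec in reference:
        low = zone_id(max(-90.0, dec - MATCH_RADIUS_DEG))
        high = zone_id(min(90.0, dec + MATCH_RADIUS_DEG))
        for zone in {low, high}:
            index[zone].append((obj_id, ra, dec))
    return index


def match(ra, dec, index):
    """Return the id of the nearest reference object within the radius, or None."""
    best_id, best_sep = None, MATCH_RADIUS_DEG
    for obj_id, ref_ra, ref_dec in index.get(zone_id(dec), []):
        sep = angular_sep_deg(ra, dec, ref_ra, ref_dec)
        if sep <= best_sep:
            best_id, best_sep = obj_id, sep
    return best_id


def reconstruct_time_series(detections, index):
    """Group (time, ra, dec, mag) detections into per-object series sorted by time."""
    series = defaultdict(list)
    for time, ra, dec, mag in detections:
        obj_id = match(ra, dec, index)
        if obj_id is not None:
            series[obj_id].append((time, mag))
    return {obj_id: sorted(points) for obj_id, points in series.items()}


# Hypothetical data: object "A" sits just below the border between two zones.
reference = [("A", 10.0, 0.9999), ("B", 50.0, -20.0)]
detections = [
    (2.0, 10.0, 1.0001, 15.2),    # falls in the zone above "A"
    (1.0, 10.0, 0.9998, 15.1),
    (1.5, 50.0001, -20.0, 12.0),
    (3.0, 200.0, 45.0, 18.0),     # no reference object nearby
]
index = build_reference_index(reference)
result = reconstruct_time_series(detections, index)
```

In this toy data the first detection lies in a different zone from object "A" but is still matched, because "A" was also indexed under the neighboring zone. Partitioning by zone is also what would let a real pipeline process zone files in parallel; this sketch runs serially and ignores right-ascension partitioning for brevity.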