Paper title
MANTRA: A Machine Learning reference lightcurve dataset for astronomical transient event recognition
Paper authors
Paper abstract
We introduce MANTRA, an annotated dataset of 4869 transient and 71207 non-transient object lightcurves built from the Catalina Real Time Transient Survey. We provide public access to this dataset as a plain text file to facilitate standardized quantitative comparison of astronomical transient event recognition algorithms. Some of the classes included in the dataset are: supernovae, cataclysmic variables, active galactic nuclei, high proper motion stars, blazars and flares. As an example of the tasks that can be performed on the dataset, we experiment with multiple data pre-processing methods, feature selection techniques and popular machine learning algorithms (Support Vector Machines, Random Forests and Neural Networks). We assess quantitative performance in two classification tasks: binary (transient/non-transient) and eight-class classification. The best performing algorithm in both tasks is the Random Forest Classifier. It achieves an F1-score of 96.25% in the binary classification and 52.79% in the eight-class classification. For the eight-class classification, non-transients (96.83%) is the class with the highest F1-score, while the lowest corresponds to high proper motion stars (16.79%); for supernovae it achieves a value of 54.57%, close to the average across classes. The next release of MANTRA includes images and benchmarks with deep learning models.
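The abstract reports F1-scores per class and compares one of them to the average across classes. As a rough illustration of how such numbers relate, the sketch below computes a one-vs-rest F1-score for each class and their unweighted mean in plain Python. The function names (`per_class_f1`, `macro_f1`), the toy labels, and the choice of an unweighted (macro) average are assumptions made for this example; the abstract does not state which averaging scheme or evaluation code the authors used.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} with each class scored one-vs-rest.

    F1 = 2*TP / (2*TP + FP + FN); a class with no true positives,
    false positives or false negatives is scored 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (an assumed averaging choice)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels, invented for illustration only (not taken from MANTRA).
y_true = ["SN", "SN", "CV", "CV",
          "non-transient", "non-transient", "non-transient", "non-transient"]
y_pred = ["SN", "CV", "CV", "CV",
          "non-transient", "non-transient", "non-transient", "SN"]

scores = per_class_f1(y_true, y_pred)
average = macro_f1(y_true, y_pred)
for label, value in scores.items():
    print(f"{label}: {value:.2%}")
print(f"macro average: {average:.2%}")
```

With an unweighted average, a populous and easy class such as the non-transients cannot mask weak performance on rare classes. This is one plausible way for a high binary score and a much lower eight-class score to coexist on the same data.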