Paper title

Machine learning technique for morphological classification of galaxies from the SDSS. III. Image-based inference of detailed features

Authors

Khramtsov, V., Vavilova, I. B., Dobrycheva, D. V., Vasylenko, M. Yu., Melnyk, O. V., Elyiv, A. A., Akhmetov, V. S., Dmytrenko, A. M.

Abstract

This paper continues our series of works on the applicability of various machine learning methods to the morphological classification of galaxies (Vavilova et al., 2021, 2022). We used a sample of 315,776 SDSS DR9 galaxies with absolute stellar magnitudes of -24m < Mr < -19.4m at redshifts 0.003 < z < 0.1 as the target data set for a CNN classifier based on DenseNet-201. Because this sample overlaps closely with the Galaxy Zoo 2 (GZ2) sample, we use the GZ2 annotations as the training data set to classify galaxies into 34 detailed features. Galaxies in the GZ2 training data set differ markedly in their visual parameters from galaxies without known morphological parameters; we applied novel procedures that allowed us, for the first time, to eliminate this difference for smaller and fainter SDSS galaxies. We describe in detail the adversarial validation technique and how we arrived at the optimal train-test split of galaxies from the training data set. We also found optimal galaxy image transformations that increase the classifier's generalization ability. This can be considered another way to reduce the effect of human bias for those galaxy images that received a poor vote classification in the GZ project. Such an approach, akin to auto-immunization, in which a CNN classifier trained on very good images is able to retrain on bad images from the same homogeneous sample, can be considered co-planar to other methods of combating human bias. The accuracy of the CNN classifier ranges from 83.3 to 99.4 percent across 32 features. As a result, we assigned, for the first time, a detailed morphological classification to more than 140K low-redshift galaxies, especially at the fainter end. We highlight the typical problem points of CNN-based galaxy image classification from the astronomical point of view. The catalogs will be available through VizieR.
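The adversarial validation idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline, which works on galaxy images with a CNN; it is a toy stand-in that assumes a single hypothetical scalar feature per galaxy (here called an apparent magnitude) and a hand-written one-feature logistic regression. All function names are illustrative. A classifier is trained to tell "training" galaxies from "target" galaxies: a held-out AUC near 0.5 suggests the two samples are hard to distinguish, while an AUC well above 0.5 signals a distribution shift such as the one between GZ2-annotated galaxies and fainter, unannotated ones.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def adversarial_auc(train_feature, target_feature, seed=0):
    """Label training rows 0 and target rows 1, fit on one half of the
    shuffled pool, and return the AUC measured on the other half."""
    rng = random.Random(seed)
    data = [(x, 0) for x in train_feature] + [(x, 1) for x in target_feature]
    rng.shuffle(data)
    # Standardise the feature so gradient descent behaves well.
    mean = sum(x for x, _ in data) / len(data)
    std = math.sqrt(sum((x - mean) ** 2 for x, _ in data) / len(data)) or 1.0
    data = [((x - mean) / std, y) for x, y in data]
    half = len(data) // 2
    fit_part, held_out = data[:half], data[half:]
    w, b = fit_logistic([x for x, _ in fit_part], [y for _, y in fit_part])
    scores = [w * x + b for x, _ in held_out]
    return auc(scores, [y for _, y in held_out])


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical magnitudes: the target sample is fainter on average.
    annotated = [rng.gauss(16.0, 1.0) for _ in range(300)]
    unannotated = [rng.gauss(17.5, 1.0) for _ in range(300)]
    print("adversarial AUC:", round(adversarial_auc(annotated, unannotated), 3))
```

When the two samples come from the same distribution, the held-out AUC scatters around 0.5 and may fall slightly below it, so the distance from 0.5 is the quantity to watch. In practice, the classifier's scores can also be used to pick training galaxies that most resemble the target sample when building a validation split; whether the paper does exactly this is not stated in the abstract.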
