Paper title
Photometric selection and redshifts for quasars in the Kilo-Degree Survey Data Release 4
Paper authors
Paper abstract
We present a catalog of quasars and corresponding redshifts in the Kilo-Degree Survey (KiDS) Data Release 4. We trained machine learning (ML) models, using optical ugri and near-infrared ZYJHK_s bands, on objects known from Sloan Digital Sky Survey (SDSS) spectroscopy. We define inference subsets from the 45 million objects of the KiDS photometric data limited to 9-band detections. We show that projections of the high-dimensional feature space can be successfully used to investigate the estimations. The model creation employs two test subsets, one randomly selected and one containing the faintest objects, which allows us to fit the bias versus variance trade-off. We tested three ML models: random forest (RF), XGBoost (XGB), and artificial neural network (ANN). We find that XGB is the most robust model for classification, while ANN performs best for combined classification and redshift estimation. The inference results are tested using number counts, Gaia parallaxes, and other quasar catalogs. Based on these tests, we derived the minimum classification probability that provides the best purity versus completeness trade-off: p(QSO_cand) > 0.9 for r < 22 and p(QSO_cand) > 0.98 for 22 < r < 23.5. We find 158,000 quasar candidates in the safe inference subset (r < 22) and an additional 185,000 candidates in the reliable extrapolation regime (22 < r < 23.5). Test-data purity equals 97% and completeness is 94%; the latter drops by 3% in the extrapolation to data one magnitude fainter than the training set. The photometric redshifts were modeled with Gaussian uncertainties. The redshift error (mean and scatter) equals 0.01 +/- 0.1 in the safe subset and -0.0004 +/- 0.2 in the extrapolation, in a redshift range of 0.14 < z < 3.63. The success of our extrapolation challenges the way that models are optimized and applied at the faint data end. The catalog is ready for cosmology and active galactic nucleus (AGN) studies.
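The magnitude-dependent probability cuts and the purity and completeness figures quoted in the abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the array arguments `r_mag` and `p_qso`, and the toy inputs are all hypothetical, and the code assumes only that each object carries an r-band magnitude and a quasar classification probability.

```python
import numpy as np

def select_qso_candidates(r_mag, p_qso):
    """Apply the two cuts quoted in the abstract (hypothetical helper).

    Returns two boolean masks: the 'safe' subset (r < 22, p > 0.9) and
    the 'extrapolation' subset (22 < r < 23.5, p > 0.98).
    """
    r_mag = np.asarray(r_mag, dtype=float)
    p_qso = np.asarray(p_qso, dtype=float)
    safe = (r_mag < 22.0) & (p_qso > 0.9)
    # Strict inequalities follow the abstract as written.
    extrap = (r_mag > 22.0) & (r_mag < 23.5) & (p_qso > 0.98)
    return safe, extrap

def purity_completeness(is_qso_true, is_selected):
    """Purity = TP / selected; completeness = TP / true quasars."""
    is_qso_true = np.asarray(is_qso_true, dtype=bool)
    is_selected = np.asarray(is_selected, dtype=bool)
    true_positives = np.sum(is_qso_true & is_selected)
    purity = true_positives / np.sum(is_selected)
    completeness = true_positives / np.sum(is_qso_true)
    return purity, completeness

# Toy inputs, invented purely for illustration.
r_mag = [21.0, 21.5, 22.5, 22.5, 24.0]
p_qso = [0.95, 0.85, 0.99, 0.95, 0.999]
safe, extrap = select_qso_candidates(r_mag, p_qso)
```

In this toy example only the first object passes the safe cut and only the third passes the stricter extrapolation cut; the last object is excluded regardless of its probability because it is fainter than r = 23.5.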