Title
A machine learning classifier for LOFAR radio galaxy cross-matching techniques
Authors
Abstract
New-generation radio telescopes like LOFAR are conducting extensive sky surveys, detecting millions of sources. To maximise the scientific value of these surveys, radio source components must be properly associated into physical sources before being cross-matched with their optical/infrared counterparts. In this paper, we use machine learning to identify those radio sources for which either source association is required or statistical cross-matching to optical/infrared catalogues is unreliable. We train a binary classifier using manual annotations from the LOFAR Two-metre Sky Survey (LoTSS). We find that, compared to a classification model based on just the radio source parameters, the addition of features of the nearest-neighbour radio sources, the potential optical host galaxy, and the radio source composition in terms of Gaussian components, all improve model performance. Our best model, a gradient boosting classifier, achieves an accuracy of 95 per cent on a balanced dataset and 96 per cent on the whole (unbalanced) sample after optimising the classification threshold. Unsurprisingly, the classifier performs best on small, unresolved radio sources, reaching almost 99 per cent accuracy for sources smaller than 15 arcsec, but still achieves 70 per cent accuracy on resolved sources. It flags 68 per cent more sources than required as needing visual inspection, but this is still fewer than the manually-developed decision tree used in LoTSS, while also having a lower rate of wrongly accepted sources for statistical analysis. The results have an immediate practical application for cross-matching the next LoTSS data releases and can be generalised to other radio surveys.
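The abstract mentions optimising the classification threshold but does not say which criterion was used. As a rough illustration only, the sketch below assumes the goal is to maximise plain accuracy. The scores and labels are made-up toy values, not LoTSS data, and the names `accuracy_at` and `best_threshold` are hypothetical helpers, not part of the paper's code.

```python
from typing import Sequence, Tuple


def accuracy_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of sources classified correctly when score >= threshold means 'flag'."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Scan every distinct score as a candidate threshold; return (threshold, accuracy).

    Assumption: accuracy is the quantity being optimised. max() returns the
    first maximum it sees, so ties resolve to the lowest threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]  # inf = flag nothing
    best = max(candidates, key=lambda t: accuracy_at(t, scores, labels))
    return best, accuracy_at(best, scores, labels)


# Toy classifier outputs (invented): probability that a source needs
# visual inspection, with 1 = needs inspection, 0 = safe to cross-match.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

threshold, acc = best_threshold(scores, labels)
default_acc = accuracy_at(0.5, scores, labels)
print(f"tuned threshold={threshold:.2f} accuracy={acc:.2f}; default 0.5 accuracy={default_acc:.2f}")
```

On an unbalanced sample, a criterion other than accuracy (for example, one that weights wrongly accepted sources more heavily) may be more appropriate. Swapping the `key` function in `best_threshold` is enough to try one.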