Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques

Title

Detection of extragalactic Ultra-Compact Dwarfs and Globular Clusters using Explainable AI techniques

Authors

Mohammadi, Mohammad; Mutatiina, Jarvin; Saifollahi, Teymoor; Bunte, Kerstin

Abstract

Compact stellar systems such as Ultra-Compact Dwarfs (UCDs) and Globular Clusters (GCs) around galaxies are known to trace the merger events that have formed these galaxies. Identifying such systems therefore makes it possible to study the mass assembly, formation and evolution of galaxies. However, in the absence of spectroscopic information, detecting UCDs/GCs from imaging data alone is very uncertain. Here, we aim to train a machine learning model to separate these objects from foreground stars and background galaxies using multi-wavelength imaging data of the Fornax galaxy cluster in six filters, namely u, g, r, i, J and Ks. The object classes are highly imbalanced, which is problematic for many automatic classification techniques. Hence, we employ Synthetic Minority Over-sampling to handle the imbalance of the training data. We then compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods identify UCDs/GCs with a precision and a recall of >93 percent and provide relevances that reflect the importance of each feature dimension (colors and angular sizes) for the classification. Both methods detect angular sizes as important markers for this classification problem. While the astronomical expectation is that the color indices u-i and i-Ks are the most important colors, our analysis shows that colors such as g-r are more informative, potentially because of their higher signal-to-noise ratio. Besides its excellent performance, the LGMLVQ method allows further interpretability by providing the feature importance for each individual class, class-wise representative samples, and the possibility of non-linear visualization of the data, as demonstrated in this contribution. We conclude that employing machine learning techniques to identify UCDs/GCs can lead to promising results.
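The abstract names Synthetic Minority Over-sampling as the remedy for class imbalance but does not spell out the mechanics. The sketch below illustrates the general idea only: new minority-class points are created by interpolating between an existing minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy feature values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with made-up values: [g-r color, angular size]
compact_systems = [[0.60, 1.10], [0.65, 1.20], [0.70, 1.15], [0.62, 1.30]]
new_points = smote(compact_systems, n_synthetic=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Over-sampling of this kind is normally applied to the training split only, which matches the abstract's statement that it handles the imbalance of the training data.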
