Paper title
Machine Learning Approaches to Solar-Flare Forecasting: Is Complex Better?
Paper authors
Paper abstract
Recently, there has been growing interest in the use of machine-learning methods for predicting solar flares. Initial efforts along these lines employed comparatively simple models, correlating features extracted from observations of sunspot active regions with known instances of flaring. Typically, these models have used physics-inspired features, carefully chosen by experts to capture the salient properties of such magnetic field structures. Over time, the sophistication and complexity of the models involved have grown. However, there has been little evolution in the choice of feature sets, and no systematic study of whether the additional model complexity is truly useful. Our goal is to address these issues. To that end, we compare the relative prediction performance of machine-learning-based flare-forecasting models with varying degrees of complexity. We also revisit the feature set design, using topological data analysis to extract shape-based features from magnetic field images of the active regions. Using hyperparameter training for fair comparison of different machine-learning models across different feature sets, we first show that simpler models with fewer free parameters \textit{generally perform better than more complicated models}, i.e., powerful machinery does not necessarily guarantee better prediction performance. Second, we find that \textit{abstract, shape-based features contain just as much useful information}, for the purposes of flare prediction, as the set of hand-crafted features developed by the solar-physics community over the years. Finally, we study the effects of dimensionality reduction, using principal component analysis, to show that streamlined feature sets, overall, perform just as well as the corresponding full-dimensional versions.
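The comparison protocol summarized in the abstract (models of differing complexity, each tuned over its own hyperparameters, evaluated on both full and PCA-reduced feature sets) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic stand-ins generated with scikit-learn rather than active-region features, and the two models, the hyperparameter grids, the choice of 5 principal components, and the balanced-accuracy metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for per-region feature vectors (NOT solar data).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical model zoo: one simple model, one more complex model,
# each paired with its own small hyperparameter grid.
models = {
    "logistic": (LogisticRegression(max_iter=1000),
                 {"clf__C": [0.1, 1.0, 10.0]}),
    "mlp": (MLPClassifier(max_iter=500, random_state=0),
            {"clf__hidden_layer_sizes": [(16,), (32, 16)],
             "clf__alpha": [1e-4, 1e-2]}),
}


def evaluate(clf, grid, n_components=None):
    """Tune one model by cross-validated grid search, then score it on held-out data."""
    steps = [("scale", StandardScaler())]
    if n_components is not None:
        # Optional dimensionality reduction before the classifier.
        steps.append(("pca", PCA(n_components=n_components)))
    steps.append(("clf", clf))
    search = GridSearchCV(Pipeline(steps), grid,
                          scoring="balanced_accuracy", cv=3)
    search.fit(X_train, y_train)
    return balanced_accuracy_score(y_test, search.predict(X_test))


# Every model is tuned and scored on the full and the reduced feature set.
results = {}
for name, (clf, grid) in models.items():
    for n_components in (None, 5):
        results[(name, n_components)] = evaluate(clf, grid, n_components)

for (name, n_components), score in results.items():
    label = "full" if n_components is None else f"pca-{n_components}"
    print(f"{name:9s} {label:6s} balanced accuracy = {score:.3f}")
```

Giving every model its own tuning pass under the same cross-validation scheme is what makes the comparison fair in the sense the abstract describes. Scores obtained on this synthetic data say nothing about the paper's findings.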