Paper Title
Inferring Convolutional Neural Networks' accuracies from their architectural characterizations
Paper Authors
Paper Abstract
Convolutional Neural Networks (CNNs) have shown strong promise for analyzing scientific data from many domains, including particle imaging detectors. However, the challenge of choosing the appropriate network architecture (depth, kernel shapes, activation functions, etc.) for specific applications and different data sets is still poorly understood. In this paper, we study the relationships between a CNN's architecture and its performance by proposing a systematic language that is useful for comparing different CNN architectures before training time. We characterize CNN architectures by different attributes and demonstrate that these attributes can be predictive of the networks' performance in two specific computer vision-based physics problems -- event vertex finding and hadron multiplicity classification in the MINERvA experiment at Fermi National Accelerator Laboratory. In doing so, we extract several architectural attributes from networks optimized for these physics problems, which are outputs of a model selection algorithm called Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL). We use machine learning models to predict, before training, whether a network can perform better than a certain threshold accuracy. These models perform 16-20% better than random guessing. Additionally, we find a coefficient of determination of 0.966 for an Ordinary Least Squares model in a regression of accuracy over a large population of networks.
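To make the two analyses in the abstract concrete, below is a minimal sketch, not the paper's actual pipeline, of (1) classifying whether a network will exceed a threshold accuracy from its architectural attributes and (2) an Ordinary Least Squares regression of accuracy on those attributes. The attribute names (depth, conv layer count, mean kernel size, log parameter count) and the synthetic data are illustrative assumptions; the paper extracts its attributes from MENNDL-generated architectures.

```python
# Sketch under stated assumptions: attributes and data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, r2_score

rng = np.random.default_rng(0)
n_networks = 500

# Hypothetical per-network architectural attributes.
X = np.column_stack([
    rng.integers(4, 30, n_networks),   # depth
    rng.integers(1, 15, n_networks),   # convolutional layer count
    rng.uniform(1, 9, n_networks),     # mean kernel size
    rng.uniform(4, 8, n_networks),     # log10(parameter count)
])

# Synthetic "trained accuracy": a linear function of the attributes plus noise,
# standing in for the measured accuracies of the trained networks.
w = np.array([0.004, 0.01, -0.005, 0.03])
y = 0.4 + X @ w + rng.normal(0, 0.02, n_networks)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (1) Classification: will a network beat a threshold accuracy, judged
# before training, from its architecture alone?
threshold = np.median(y_tr)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr > threshold)
print("classification accuracy:",
      accuracy_score(y_te > threshold, clf.predict(X_te)))

# (2) OLS regression of accuracy on architectural attributes; the paper
# reports R^2 = 0.966 for this kind of model on its population of networks.
ols = LinearRegression().fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, ols.predict(X_te)))
```

A baseline that always predicts the majority class would score about 50% here, so "16-20% better than random guessing" corresponds to classification accuracies in roughly the 66-70% range under this framing.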