Paper Title
Post-hoc Calibration of Neural Networks by g-Layers
Paper Authors
Paper Abstract
Calibration of neural networks is a critical aspect to consider when incorporating machine learning models into real-world decision-making systems, where the confidence of a decision is as important as the decision itself. In recent years, there has been a surge of research on neural network calibration, and the majority of the works can be categorized as post-hoc calibration methods, defined as methods that learn an additional function to calibrate an already trained base network. In this work, we aim to understand post-hoc calibration methods from a theoretical point of view. In particular, it is known that minimizing Negative Log-Likelihood (NLL) leads to a calibrated network on the training set if the global optimum is attained (Bishop, 1994). Nevertheless, it is not clear whether learning an additional function in a post-hoc manner leads to calibration in the theoretical sense. To this end, we prove that, even though the base network ($f$) does not reach the global optimum of NLL, by adding additional layers ($g$) and minimizing NLL with respect to the parameters of $g$ alone, one can obtain a calibrated network $g \circ f$. This not only provides a less stringent condition for obtaining a calibrated network but also gives a theoretical justification for post-hoc calibration methods. Our experiments on various image classification benchmarks confirm the theory.
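As an illustrative sketch of the post-hoc setting described above (not the paper's own $g$-layer construction), a minimal instance of $g$ is temperature scaling: the base network $f$ is frozen, and $g$ divides its logits by a single learned scalar $T$, fitted by minimizing NLL on held-out data. The function names and the numerical-gradient fitting loop below are hypothetical choices for the sketch, assuming NumPy only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels):
    """Mean negative log-likelihood of the true labels under softmax(logits)."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, lr=0.05, steps=1000):
    """Learn a scalar T so that g(z) = z / T minimizes NLL.

    The base network's logits are fixed; only T (the parameters of g)
    is optimized, mirroring the post-hoc setup. A central-difference
    numerical gradient keeps the sketch dependency-free.
    """
    T = 1.0
    eps = 1e-4
    for _ in range(steps):
        grad = (nll(logits / (T + eps), labels)
                - nll(logits / (T - eps), labels)) / (2 * eps)
        T -= lr * grad
    return T
```

On an overconfident base network (confidence above accuracy) the fitted $T$ exceeds 1, softening the softmax; the same NLL-on-frozen-logits recipe extends to richer choices of $g$ with more parameters.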