Paper Title
Consensus Function from an $L_p^q$-norm Regularization Term for its Use as Adaptive Activation Functions in Neural Networks
Paper Authors
Paper Abstract
The design of a neural network is usually carried out by defining the number of layers, the number of neurons per layer, their connections or synapses, and the activation function that they execute. The training process tries to optimize the weights assigned to those connections, together with the biases of the neurons, to better fit the training data. However, the activation functions are, in general, fixed during the design process and not modified during training, meaning that their behavior is unrelated to the training data set. In this paper, we propose the definition and use of an implicit, parametric, non-linear activation function that adapts its shape during the training process. This enlarges the space of parameters to optimize within the network, but it allows greater flexibility and generalizes the concept of neural networks. Furthermore, it simplifies the architectural design, since the same activation function definition can be employed in every neuron, letting the training process optimize its parameters and, thus, its behavior. Our proposed activation function is derived from the consensus variable that arises when optimizing a linear underdetermined problem with an $L_p^q$ regularization term via the Alternating Direction Method of Multipliers (ADMM). We refer to neural networks that use this type of activation function as $pq$-networks. Preliminary results show that neural networks with this type of adaptive activation function reduce the error in regression and classification examples, compared to equivalent regular feedforward neural networks with fixed activation functions.
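To make the idea concrete, here is a minimal sketch of the kind of activation the abstract describes. In a standard ADMM consensus splitting of $\min_x \|x\|_p^q$ subject to $Ax = b$ (a generic formulation, not necessarily the paper's exact one), the consensus-variable update is the proximal operator of the regularization term; for the special case $p = q = 1$ this proximal operator is the soft-thresholding map $\mathrm{prox}_{t\|\cdot\|_1}(x) = \mathrm{sign}(x)\,\max(|x| - t, 0)$, which already behaves like a shrinkage-type activation. The PyTorch module below implements this shrinkage with a trainable threshold so its shape adapts during training. The module name, the softplus reparametrization, and the fixed choice $p = q = 1$ are assumptions made for this illustration; the paper's actual $pq$-network activation for general $p$ and $q$ is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThresholdActivation(nn.Module):
    """Illustrative adaptive activation (assumption: p = q = 1).

    Implements the soft-thresholding map, i.e. the proximal operator that
    appears as the consensus update in ADMM for an l1-regularized problem,
    with a trainable threshold optimized jointly with the network weights.
    """

    def __init__(self, init_threshold: float = 0.5):
        super().__init__()
        # Unconstrained raw parameter; softplus keeps the threshold positive.
        self._raw = nn.Parameter(torch.tensor(init_threshold))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = F.softplus(self._raw)
        # prox of t * |.| : sign(x) * max(|x| - t, 0)
        return torch.sign(x) * torch.clamp(x.abs() - t, min=0.0)

# Hypothetical usage: the same adaptive activation definition in every
# hidden layer, its threshold trained together with weights and biases.
model = nn.Sequential(
    nn.Linear(4, 16),
    SoftThresholdActivation(),
    nn.Linear(16, 1),
)

Because the threshold enters the loss through an ordinary differentiable parameter, standard backpropagation adjusts the activation's shape alongside the weights, which is the mechanism the abstract attributes to its adaptive activations.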