Paper Title

Train simultaneously, generalize better: Stability of gradient-based minimax learners

Paper Authors

Farzan Farnia, Asuman Ozdaglar

Paper Abstract

The success of minimax learning problems of generative adversarial networks (GANs) has been observed to depend on the minimax optimization algorithm used for their training. This dependence is commonly attributed to the convergence speed and robustness properties of the underlying optimization algorithm. In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model. To this end, we analyze the generalization properties of standard gradient descent ascent (GDA) and proximal point method (PPM) algorithms through the lens of algorithmic stability under both convex concave and non-convex non-concave minimax settings. While the GDA algorithm is not guaranteed to have a vanishing excess risk in convex concave problems, we show the PPM algorithm enjoys a bounded excess risk in the same setup. For non-convex non-concave problems, we compare the generalization performance of stochastic GDA and GDmax algorithms where the latter fully solves the maximization subproblem at every iteration. Our generalization analysis suggests the superiority of GDA provided that the minimization and maximization subproblems are solved simultaneously with similar learning rates. We discuss several numerical results indicating the role of optimization algorithms in the generalization of the learned minimax models.
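For reference, the sketch below writes out standard forms of the GDA, PPM, and GDmax updates for a minimax objective f(x, y); the notation (learning rates α and β, proximal parameter η) is generic and illustrative rather than the paper's exact formulation or assumptions.

% Standard GDA, PPM, and GDmax updates for min_x max_y f(x, y)
% (generic notation; step sizes and inner stopping rules are illustrative)
\begin{align*}
\text{GDA:}   \quad & x_{t+1} = x_t - \alpha \nabla_x f(x_t, y_t), \qquad y_{t+1} = y_t + \beta \nabla_y f(x_t, y_t) \\
\text{PPM:}   \quad & (x_{t+1}, y_{t+1}) = \operatorname*{arg\,min}_{x}\; \operatorname*{arg\,max}_{y} \Big\{ f(x, y) + \tfrac{1}{2\eta}\lVert x - x_t\rVert^2 - \tfrac{1}{2\eta}\lVert y - y_t\rVert^2 \Big\} \\
\text{GDmax:} \quad & y_t \approx \operatorname*{arg\,max}_{y} f(x_t, y), \qquad x_{t+1} = x_t - \alpha \nabla_x f(x_t, y_t)
\end{align*}

The abstract's comparison concerns the last two: stochastic GDA updates both variables simultaneously with comparable learning rates, while GDmax (approximately) solves the inner maximization before each descent step.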
