Paper Title
Geometry-Aware Gradient Algorithms for Neural Architecture Search
Paper Authors
Liam Li, Mikhail Khodak, Maria-Florina Balcan, Ameet Talwalkar
Paper Abstract
Recent state-of-the-art methods for neural architecture search (NAS) exploit gradient-based optimization by relaxing the problem into continuous optimization over architectures and shared weights, a noisy process that remains poorly understood. We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing, reducing the design of NAS methods to devising optimizers and regularizers that can quickly obtain high-quality solutions to this problem. Invoking the theory of mirror descent, we present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters, leading to simple yet novel algorithms that enjoy fast convergence guarantees and achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision. Notably, we exceed the best published results for both CIFAR and ImageNet on both the DARTS search space and NAS-Bench-201; on the latter we achieve near-oracle-optimal performance on CIFAR-10 and CIFAR-100. Together, our theory and experiments demonstrate a principled way to co-design optimizers and continuous relaxations of discrete NAS search spaces.
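To make the geometry-aware idea in the abstract concrete, the sketch below runs mirror descent with the entropic mirror map (the exponentiated-gradient update) on simplex-constrained architecture weights. This is a minimal toy illustration under our own assumptions, not the paper's implementation: the single four-operation edge, the synthetic gradients, and the function name are invented for exposition, and in real NAS with weight-sharing the gradients would come from backpropagation through the shared-weight network.

```python
import numpy as np

def exponentiated_gradient_step(theta, grad, lr):
    """One mirror-descent step with the entropic mirror map:
    a multiplicative update followed by renormalization, which
    keeps theta on the probability simplex at every iteration."""
    theta = theta * np.exp(-lr * grad)
    return theta / theta.sum()

# Toy setup: mixture weights over 4 candidate operations on one edge.
# The gradients are synthetic (operation 0 has the lowest expected loss);
# in NAS they would be loss gradients w.r.t. the architecture parameters.
rng = np.random.default_rng(0)
theta = np.full(4, 0.25)  # uniform initialization on the simplex
for step in range(200):
    grad = rng.normal(size=4) + np.array([0.0, 0.5, 1.0, 1.5])
    theta = exponentiated_gradient_step(theta, grad, lr=0.1)

print(theta)  # mass concentrates on operation 0: near-one-hot, i.e. sparse
```

The multiplicative form of the update is what drives the iterates toward sparse, near-one-hot architecture parameters, in contrast to additive gradient steps on softmax logits, and this sparsity is the "underlying structure" that the abstract's framework is designed to exploit.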