Paper Title
Multi-objective Neural Architecture Search via Non-stationary Policy Gradient
Paper Authors
Paper Abstract
Multi-objective Neural Architecture Search (NAS) aims to discover novel architectures in the presence of multiple conflicting objectives. Despite recent progress, the problem of approximating the full Pareto front accurately and efficiently remains challenging. In this work, we explore the novel reinforcement learning (RL) based paradigm of non-stationary policy gradient (NPG). NPG utilizes a non-stationary reward function and encourages continuous adaptation of the policy to capture the entire Pareto front efficiently. We introduce two novel reward functions with elements from the dominant paradigms of scalarization and evolution. To handle non-stationarity, we propose a new exploration scheme using cosine temperature decay with warm restarts. For fast and accurate architecture evaluation, we introduce a novel pre-trained shared model that we continuously fine-tune throughout training. Our extensive experimental study on various datasets shows that our framework approximates the full Pareto front well at low computational cost. Moreover, our discovered cells achieve superior predictive performance compared to other multi-objective NAS methods, as well as single-objective NAS methods at similar network sizes. Our work demonstrates the potential of NPG as a simple, efficient, and effective paradigm for multi-objective NAS.
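The exploration scheme mentioned above, cosine temperature decay with warm restarts, can be illustrated with a minimal sketch. The schedule below follows the standard SGDR-style cosine annealing shape; the function name, cycle length, and temperature bounds are hypothetical placeholders, since the abstract does not specify the paper's exact values.

```python
import math

def cosine_temperature_with_restarts(step, period=100, t_max=5.0, t_min=0.5):
    """Softmax temperature schedule (illustrative): within each cycle of
    `period` steps, the temperature decays from t_max to t_min along a
    cosine curve, then jumps back to t_max (a 'warm restart') so the
    policy re-explores as the non-stationary reward shifts."""
    phase = (step % period) / period  # position within the current cycle, in [0, 1)
    return t_min + 0.5 * (t_max - t_min) * (1.0 + math.cos(math.pi * phase))
```

For example, with the defaults the temperature starts at 5.0, reaches the midpoint value 2.75 halfway through a cycle, approaches 0.5 near the cycle's end, and restarts at 5.0 on step 100. A higher temperature flattens the policy's action distribution (more exploration); the periodic restarts keep exploration alive each time the reward function changes.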