Paper Title
DrNAS: Dirichlet Neural Architecture Search
Authors
Abstract
This paper proposes a novel differentiable architecture search method by formulating it as a distribution learning problem. We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution. With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with a gradient-based optimizer in an end-to-end manner. This formulation improves generalization ability and induces stochasticity that naturally encourages exploration of the search space. Furthermore, to alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme that enables searching directly on large-scale tasks, eliminating the gap between the search and evaluation phases. Extensive experiments demonstrate the effectiveness of our method. Specifically, we obtain a test error of 2.46% on CIFAR-10 and 23.7% on ImageNet under the mobile setting. On NAS-Bench-201, we also achieve state-of-the-art results on all three datasets and provide insights for the effective design of neural architecture search algorithms.
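The core idea described above — modeling the architecture mixing weights as a Dirichlet random variable and optimizing its concentration parameters through pathwise (reparameterized) gradients — can be illustrated with a minimal sketch. This is not the authors' implementation: the number of operations, the softplus parameterization, and the stand-in loss are all illustrative assumptions; it only shows that PyTorch's `Dirichlet.rsample` lets gradients flow from a sampled weight vector back to the learnable concentration parameters.

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: one edge of a supernet with a few candidate operations.
num_ops = 8

# Learnable parameters; softplus keeps the Dirichlet concentration positive.
beta = torch.zeros(num_ops, requires_grad=True)
opt = torch.optim.Adam([beta], lr=0.1)

for _ in range(100):
    concentration = torch.nn.functional.softplus(beta) + 1e-4
    dist = torch.distributions.Dirichlet(concentration)
    # rsample uses implicit reparameterization, so the pathwise derivative
    # flows through the sampled mixing weights to the concentration.
    weights = dist.rsample()
    # Stand-in for the supernet validation loss (assumed): reward weight
    # on operation 0, mimicking a loss that prefers one operation.
    loss = -torch.log(weights[0] + 1e-8)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because each sampled `weights` vector lies on the simplex, the search remains stochastic: different samples explore different operation mixtures while the concentration parameters gradually sharpen toward the preferred operation.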