RARTS: An Efficient First-Order Relaxed Architecture Search Method

Title

RARTS: An Efficient First-Order Relaxed Architecture Search Method

Authors

Fanghui Xue, Yingyong Qi, Jack Xin

Abstract

Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on solving a bilevel optimization problem. Despite its success in many architecture search tasks, concerns remain about the accuracy of first-order DARTS and the efficiency of second-order DARTS. In this paper, we formulate a single-level alternative and a relaxed architecture search (RARTS) method that uses the whole dataset in architecture learning via both data and network splitting, without involving the mixed second derivatives of the corresponding loss functions that second-order DARTS requires. In our formulation of network splitting, two networks with different but related weights cooperate in the search for a shared architecture. The advantage of RARTS over DARTS is justified by a convergence theorem and an analytically solvable model. Moreover, as the experimental results show, RARTS outperforms DARTS and its variants in both accuracy and search efficiency. For the task of searching the topological architecture, i.e., the edges and the operations, RARTS achieves higher accuracy than second-order DARTS on CIFAR-10 while reducing the computational cost by 60%. RARTS continues to outperform DARTS upon transfer to ImageNet and is on par with recent DARTS variants, even though our innovation lies purely in the training algorithm and does not modify the search space. For the task of searching width, i.e., the number of channels in convolutional layers, RARTS also outperforms traditional network pruning benchmarks. Further experiments on public architecture search benchmarks such as NATS-Bench also support the advantage of RARTS.
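
The abstract contrasts first-order DARTS, which ignores how the network weights depend on the architecture, with second-order DARTS, which needs a mixed second derivative of the training loss. The Python sketch below illustrates that difference on made-up scalar losses (L_train = 0.5 (w - alpha)^2 and L_val = 0.5 (w - 1)^2 + 0.05 alpha^2, both assumptions chosen for illustration). It follows the one-step unrolled architecture gradient and the central finite-difference approximation of the original DARTS paper, except that it uses a fixed eps = 0.01 instead of DARTS's gradient-normalized step. It then adds a hypothetical relaxed, single-level update in the spirit of the network splitting described above: weights w fit the training loss, weights y fit the validation loss, an assumed penalty beta/2 (w - y)^2 couples them, and the architecture parameter alpha is shared. That penalty, the function names, and the update order are assumptions; this is not the RARTS algorithm from the paper.

```python
# Toy illustration only: the scalar losses and the relaxed update below are
# hypothetical and are NOT taken from the RARTS paper.


def grad_train_w(w, alpha):
    # L_train(w, alpha) = 0.5 * (w - alpha)**2, derivative w.r.t. w
    return w - alpha


def grad_train_alpha(w, alpha):
    # derivative of L_train w.r.t. alpha
    return alpha - w


def grad_val_w(w, alpha):
    # L_val(w, alpha) = 0.5 * (w - 1)**2 + 0.05 * alpha**2, derivative w.r.t. w
    return w - 1.0


def grad_val_alpha(w, alpha):
    # derivative of L_val w.r.t. alpha
    return 0.1 * alpha


def darts_alpha_grad(w, alpha, xi, second_order, eps=1e-2):
    """Architecture gradient after one unrolled inner step w' = w - xi * dL_train/dw."""
    w_prime = w - xi * grad_train_w(w, alpha)
    g = grad_val_alpha(w_prime, alpha)
    if not second_order:
        # First-order DARTS: treat w' as if it did not depend on alpha.
        return g
    # Second-order DARTS adds -xi * (d^2 L_train / dalpha dw) * dL_val/dw'.
    # The mixed-derivative product is estimated by a central finite difference
    # around w, perturbed along v = dL_val/dw' (fixed eps, an assumption).
    v = grad_val_w(w_prime, alpha)
    w_plus, w_minus = w + eps * v, w - eps * v
    mixed = (grad_train_alpha(w_plus, alpha) - grad_train_alpha(w_minus, alpha)) / (2 * eps)
    return g - xi * mixed


def relaxed_split_step(w, y, alpha, lr, beta):
    """HYPOTHETICAL relaxed update using first derivatives only.

    Descends F(w, y, alpha) = L_train(w, alpha) + L_val(y, alpha)
    + beta / 2 * (w - y)**2 block by block: w, then y (using the new w),
    then the shared alpha (using the new w and y).
    """
    w = w - lr * (grad_train_w(w, alpha) + beta * (w - y))
    y = y - lr * (grad_val_w(y, alpha) + beta * (y - w))
    alpha = alpha - lr * (grad_train_alpha(w, alpha) + grad_val_alpha(y, alpha))
    return w, y, alpha


if __name__ == "__main__":
    w0, alpha0, xi = 0.0, 0.5, 0.1
    print("first-order :", darts_alpha_grad(w0, alpha0, xi, second_order=False))
    print("second-order:", darts_alpha_grad(w0, alpha0, xi, second_order=True))

    w, y, alpha = 0.0, 0.0, 0.5
    for _ in range(2000):
        w, y, alpha = relaxed_split_step(w, y, alpha, lr=0.1, beta=1.0)
    print("relaxed stationary point:", w, y, alpha)
```

With w = 0, alpha = 0.5 and xi = 0.1, the first-order estimate is 0.05 while the second-order estimate is about -0.045, so on this toy problem dropping the mixed term even flips the sign of the architecture update. The hypothetical relaxed iteration converges to approximately w = 0.846, y = 0.923, alpha = 0.769, a stationary point of the assumed coupled objective, reached without any second derivatives.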
