Paper Title

ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Paper Authors

Yibo Yang, Hongyang Li, Shan You, Fei Wang, Chen Qian, Zhouchen Lin

Abstract

Neural architecture search (NAS) aims to produce the optimal sparse solution from a high-dimensional space spanned by all candidate connections. Current gradient-based NAS methods commonly ignore the constraint of sparsity in the search phase, but project the optimized solution onto a sparse one by post-processing. As a result, the dense super-net for search is inefficient to train and has a gap with the projected architecture for evaluation. In this paper, we formulate neural architecture search as a sparse coding problem. We perform the differentiable search on a compressed lower-dimensional space that has the same validation loss as the original sparse solution space, and recover an architecture by solving the sparse coding problem. The differentiable search and architecture recovery are optimized in an alternate manner. By doing so, our network for search at each update satisfies the sparsity constraint and is efficient to train. In order to also eliminate the depth and width gap between the network in search and the target-net in evaluation, we further propose a method to search and evaluate in one stage under the target-net settings. When training finishes, architecture variables are absorbed into network weights. Thus we get the searched architecture and optimized parameters in a single run. In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search. Our one-stage method produces state-of-the-art performances on both CIFAR-10 and ImageNet at the cost of only evaluation time.
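
The abstract describes recovering an architecture by solving a sparse coding problem, and the method's name points to ISTA (the Iterative Shrinkage-Thresholding Algorithm) as a natural solver for that recovery step. For reference only, below is a minimal NumPy sketch of generic ISTA for min_z 0.5*||b - Az||^2 + lam*||z||_1; the dictionary A, the regularization weight lam, and the iteration budget are illustrative assumptions and do not reflect the paper's actual search-space compression or hyperparameters.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam=0.01, n_iters=500):
    """Recover a sparse code z minimizing 0.5*||b - A z||^2 + lam*||z||_1 via ISTA."""
    L = np.linalg.norm(A, ord=2) ** 2          # Lipschitz constant of the smooth term's gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ z - b)               # gradient of 0.5*||b - A z||^2
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Toy example (illustrative only): recover a sparse vector from a compressed measurement.
rng = np.random.default_rng(0)
n, m, k = 20, 50, 3                            # measurement dim, code dim, true sparsity
A = rng.standard_normal((n, m)) / np.sqrt(n)   # random compression/dictionary matrix
z_true = np.zeros(m)
z_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
b = A @ z_true
z_hat = ista(A, b)
print("recovered support:", np.flatnonzero(np.abs(z_hat) > 1e-3))
```

In the paper's setting, the sparse solution plays the role of the architecture variables, while the differentiable search operates in the compressed space; the sketch above only illustrates the generic l1-regularized recovery subproblem.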
