Paper title

Semi-discrete optimization through semi-discrete optimal transport: a framework for neural architecture search

Authors

Nicolas Garcia Trillos, Javier Morales

Abstract

In this paper we introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport. Our primary motivation is in the field of deep learning, and specifically in the task of neural architecture search. With this aim in mind, we discuss the geometric and theoretical motivation for new techniques for neural architecture search (in a companion paper we show that algorithms inspired by our framework are competitive with contemporaneous methods). We introduce a Riemannian-like metric on the space of probability measures over a semi-discrete space $\mathbb{R}^d \times \mathcal{G}$, where $\mathcal{G}$ is a finite weighted graph. With this Riemannian structure in hand, we derive formal expressions for the gradient flow of a relative entropy functional, as well as second-order dynamics for the optimization of said energy. Then, with the aim of providing a rigorous motivation for the gradient flow equations derived formally, we also consider an iterative procedure known as the minimizing movement scheme (i.e., the implicit Euler scheme, or JKO scheme) and apply it to the relative entropy with respect to a suitable cost function. For some specific choices of metric and cost, we rigorously show that the minimizing movement scheme of the relative entropy functional converges to the gradient flow process provided by the formal Riemannian structure. This flow coincides with a system of reaction-diffusion equations on $\mathbb{R}^d$.
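
For readers unfamiliar with the minimizing movement scheme mentioned in the abstract, its generic template can be sketched as follows. This is the standard textbook form and not necessarily the paper's exact formulation: the time step $\tau$, the iterates $\mu^k_\tau$, the energy $\mathcal{E}$, and the cost $C_\tau$ are notation assumed here for illustration only.

```latex
% Generic minimizing movement (implicit Euler / JKO) step with time step \tau > 0.
% \mathcal{E} : energy functional (in the abstract, a relative entropy).
% C_\tau      : transport-type cost between probability measures (assumed notation).
\mu^{k+1}_\tau \in \operatorname*{arg\,min}_{\mu \in \mathcal{P}(\mathbb{R}^d \times \mathcal{G})}
  \left\{ \mathcal{E}(\mu) + C_\tau\bigl(\mu, \mu^k_\tau\bigr) \right\},
\qquad k = 0, 1, 2, \dots
% Classical JKO case on \mathbb{R}^d alone:
%   C_\tau(\mu, \nu) = \frac{1}{2\tau} W_2^2(\mu, \nu),
% with W_2 the 2-Wasserstein distance.
```

In this template, convergence of the scheme means that the time-interpolated iterates approach the gradient flow of $\mathcal{E}$ as $\tau \to 0$, which is the type of result the abstract states for specific choices of metric and cost.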
