Paper Title

Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

Paper Authors

Zuowei Shen, Haizhao Yang, Shijun Zhang

Paper Abstract

A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation error $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$, where $\alpha\in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d})N^{-\sqrt{L}}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\to 0$ is moderate (e.g., $\omega_f(r) \lesssim r^{\alpha}$ for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ within the modulus of continuity.
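To make the quantities in the abstract concrete, the following minimal NumPy sketch evaluates the prescribed width $\max\{d,\,5N+13\}$, depth $64dL+3$, and the Hölder error bound $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$ for sample hyper-parameters, and runs a forward pass through a small stack of neurons that each apply either floor or ReLU. The function names (floor_relu_forward, error_bound) and the random placeholder weights are illustrative assumptions only; the theorem's approximant uses explicitly constructed weights and the full prescribed depth.

```python
import numpy as np

def floor_relu_forward(x, weights, biases, use_floor_mask):
    """Forward pass through a Floor-ReLU network: each neuron applies either
    floor or ReLU to its pre-activation, as indicated by use_floor_mask."""
    h = x
    for W, b, mask in zip(weights, biases, use_floor_mask):
        z = W @ h + b
        h = np.where(mask, np.floor(z), np.maximum(z, 0.0))
    return h

def error_bound(d, N, L, alpha=1.0, lam=1.0):
    """Hölder approximation error bound 3 * lam * d**(alpha/2) * N**(-alpha*sqrt(L))."""
    return 3.0 * lam * d ** (alpha / 2) * N ** (-alpha * np.sqrt(L))

if __name__ == "__main__":
    d, N, L = 8, 4, 9                        # input dimension and hyper-parameters
    width = max(d, 5 * N + 13)               # width prescribed in the theorem
    depth = 64 * d * L + 3                   # depth prescribed in the theorem
    print(f"width={width}, depth={depth}, bound={error_bound(d, N, L):.3e}")

    # Placeholder random weights just to exercise the forward pass; the paper's
    # construction specifies these weights explicitly (hypothetical example).
    rng = np.random.default_rng(0)
    dims = [d] + [width] * 3 + [1]           # a short stack, not the full depth
    weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
    biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
    masks = [rng.random(dims[i + 1]) < 0.5 for i in range(len(dims) - 1)]
    y = floor_relu_forward(rng.random(d), weights, biases, masks)
    print("output:", y)
```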
