Paper Title

Learning with tree tensor networks: complexity estimates and model selection

Authors

Bertrand Michel, Anthony Nouy

Abstract

Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in computational and data science. They correspond to sum-product neural networks with a sparse connectivity associated with a dimension tree and widths given by a tuple of tensor ranks. The approximation power of these models has been proved to be (near to) optimal for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree and ranks should be selected carefully to balance estimation and approximation errors. We propose and analyze a complexity-based model selection method for tree tensor networks in an empirical risk minimization framework and we analyze its performance over a wide range of smoothness classes. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected (à la Barron, Birgé, Massart) by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class and derived from estimates of the metric entropy of tree tensor networks. This choice of penalty yields a risk bound for the selected predictor. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that our strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces (with isotropic, anisotropic or mixed dominating smoothness) and analytic functions. We discuss the role of sparsity of the tensor network for obtaining optimal performance in several regimes. In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy.
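The selection rule described in the abstract — choose the model class minimizing a penalized empirical risk, with a penalty proportional to the complexity of the class and an amplitude calibrated by the slope heuristics — can be illustrated with a minimal sketch. The function names, the linear form of the penalty, and the toy risk/complexity values below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def select_model(emp_risks, complexities, n_obs, lam):
    """Select the model minimizing the penalized empirical risk
    R_m + lam * C_m / n, a penalty linear in the complexity C_m
    (a simplifying assumption for illustration)."""
    emp_risks = np.asarray(emp_risks, dtype=float)
    complexities = np.asarray(complexities, dtype=float)
    penalized = emp_risks + lam * complexities / n_obs
    return int(np.argmin(penalized))

def slope_heuristic_lambda(emp_risks, complexities, n_obs):
    """Calibrate the penalty amplitude with a slope heuristic:
    for large complexities the empirical risk decreases roughly
    linearly in C_m / n with slope -s; take lam = 2 * s."""
    emp_risks = np.asarray(emp_risks, dtype=float)
    x = np.asarray(complexities, dtype=float) / n_obs
    # fit a line on the largest-complexity half of the models
    idx = np.argsort(x)[len(x) // 2:]
    slope, _ = np.polyfit(x[idx], emp_risks[idx], 1)
    return -2.0 * slope

# Toy example with hypothetical empirical risks and complexities
# (e.g. numbers of parameters of tree tensor networks of growing rank).
n = 1000
complexities = np.array([10, 20, 40, 80, 160, 320])
emp_risks = np.array([0.50, 0.30, 0.22, 0.20, 0.19, 0.185])

lam = slope_heuristic_lambda(emp_risks, complexities, n)
m_star = select_model(emp_risks, complexities, n, lam)
print(f"calibrated lambda = {lam:.4f}, selected model index = {m_star}")
```

In this toy run the heuristic doubles the fitted slope of the risk-versus-complexity curve and the penalized criterion picks an intermediate model, balancing the estimation and approximation errors the abstract refers to.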
