Paper Title

Towards a Unified Quadrature Framework for Large-Scale Kernel Machines

Authors

Fanghui Liu, Xiaolin Huang, Yudong Chen, Johan A. K. Suykens

Abstract

In this paper, we develop a quadrature framework for large-scale kernel machines via a numerical integration representation. Because the integration domain and measure of typical kernels, e.g., Gaussian kernels and arc-cosine kernels, are fully symmetric, we leverage deterministic fully symmetric interpolatory rules to efficiently compute quadrature nodes and associated weights for kernel approximation. The developed interpolatory rules reduce the number of nodes needed while retaining high approximation accuracy. Further, we randomize the above deterministic rules using classical Monte Carlo sampling and control variates techniques, with two merits: 1) The proposed stochastic rules allow the dimension of the feature mapping to vary flexibly, so that we can control the discrepancy between the original and approximate kernels by tuning the dimension. 2) Our stochastic rules have nice statistical properties: unbiasedness and variance reduction, with a fast convergence rate. In addition, we elucidate the relationship between our deterministic/stochastic interpolatory rules and current quadrature rules for kernel approximation, including sparse grids quadrature and stochastic spherical-radial rules, thereby unifying these methods under our framework. Experimental results on several benchmark datasets show that our methods compare favorably with other representative kernel-approximation-based methods.
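To make the "numerical integration representation" concrete, the sketch below uses the standard identity that a Gaussian kernel equals the expectation of cos(w^T (x - y)) with w drawn from N(0, I / sigma^2), and approximates that integral in two ways: plain Monte Carlo sampling, and a deterministic one-dimensional Gauss-Hermite rule. This is an illustration only and not the paper's method: the fully symmetric interpolatory rules and the control variates scheme are not implemented here, and the function names, the `sigma` parameter, and the sample sizes are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))


def monte_carlo_kernel(x, y, num_samples, sigma=1.0, seed=0):
    """Monte Carlo estimate of E[cos(w^T (x - y))], w ~ N(0, I / sigma^2).

    Hypothetical helper for illustration; x and y are 1-D arrays.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_samples, x.shape[0])) / sigma
    return np.mean(np.cos(w @ (x - y)))


def gauss_hermite_kernel_1d(x, y, num_nodes, sigma=1.0):
    """Deterministic 1-D quadrature estimate of the same expectation.

    hermgauss returns nodes t_i and weights a_i such that
    integral exp(-t^2) f(t) dt is approximated by sum_i a_i f(t_i).
    Substituting w = sqrt(2) t / sigma turns the Gaussian expectation
    into that form, up to a factor 1 / sqrt(pi).
    """
    t, a = np.polynomial.hermite.hermgauss(num_nodes)
    w = np.sqrt(2.0) * t / sigma
    return np.sum(a * np.cos(w * (x - y))) / np.sqrt(np.pi)


if __name__ == "__main__":
    x_vec = np.array([0.3, -0.2, 0.5])
    y_vec = np.array([-0.1, 0.4, 0.2])
    print("exact      :", gaussian_kernel(x_vec, y_vec))
    print("Monte Carlo:", monte_carlo_kernel(x_vec, y_vec, num_samples=20000))

    x1, y1 = 0.3, -0.5
    print("exact 1-D     :", gaussian_kernel(x1, y1))
    print("Gauss-Hermite :", gauss_hermite_kernel_1d(x1, y1, num_nodes=10))
```

In this toy setting the deterministic rule reaches high accuracy with only ten nodes in one dimension, while the Monte Carlo estimate needs many samples for a modest error. The abstract's concern is the corresponding trade-off between node count and accuracy in higher dimensions, which this sketch does not address.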
