Paper Title
Sketching Datasets for Large-Scale Learning (long version)
Paper Authors
Paper Abstract
This article considers "compressive learning," an approach to large-scale machine learning in which datasets are massively compressed before learning (e.g., clustering, classification, or regression) is performed. In particular, a "sketch" is first constructed by computing carefully chosen nonlinear random features (e.g., random Fourier features) and averaging them over the whole dataset. Parameters are then learned from the sketch, without access to the original dataset. This article surveys the current state of the art in compressive learning, including the main concepts and algorithms, their connections with established signal-processing methods, existing theoretical guarantees on both information preservation and privacy preservation, and important open problems.
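The sketch construction described above can be illustrated with a minimal numerical example: draw random frequencies, evaluate random Fourier features at each data point, and average them over the dataset to obtain a single fixed-size vector. This is only a sketch under stated assumptions (Gaussian frequency distribution, synthetic data, illustrative sizes), not the specific implementation from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 2, 50            # number of samples, data dimension, sketch size (illustrative)
X = rng.normal(size=(n, d))      # toy synthetic dataset (assumption, for demonstration only)

# Random frequencies for the random Fourier features; a Gaussian
# frequency distribution is one common choice (an assumption here).
Omega = rng.normal(size=(d, m))

# The sketch is the empirical average of the nonlinear random features
# exp(i * Omega^T x) over the whole dataset: a single vector of size m,
# regardless of the number of samples n.
sketch = np.exp(1j * (X @ Omega)).mean(axis=0)

print(sketch.shape)              # the dataset is compressed to m complex numbers
```

Note that the sketch size m is independent of n: once computed, learning operates on these m averaged features alone, which is what makes the approach attractive for large-scale data.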