Paper Title
Learning-Based Data Storage [Vision] (Technical Report)
Paper Authors
Paper Abstract
Deep neural networks (DNNs) and their variants have been used extensively in a wide range of real-world applications, such as image classification, face and speech recognition, and fraud detection. Beyond these machine learning tasks, DNNs, as artificial networks that emulate the way brain cells function, can also store non-linear relationships between input and output data, which suggests that data can be stored in DNNs. We envision a new data storage paradigm, "DNN-as-a-Database", in which data are encoded in well-trained machine learning models. Compared with conventional data storage, which records data directly in raw formats, learning-based structures (e.g., DNNs) implicitly encode input-output data pairs and compute (materialize) the actual output data, at different resolutions, only when input data are provided. This new paradigm can greatly enhance data security by allowing flexible data privacy settings at different levels, can achieve low space consumption and fast computation when accelerated by new hardware (e.g., diffractive neural networks and AI chips), and can be generalized to distributed DNN-based storage and computing. In this paper, we propose this novel concept of learning-based data storage, which uses a learning structure called the learning-based memory unit (LMU) to store, organize, and retrieve data. As a case study, we use DNNs as the engine of the LMU and study the data capacity and accuracy of DNN-based data storage. Our preliminary experimental results demonstrate the feasibility of learning-based data storage: the DNN storage achieves high (100%) accuracy. We explore and design effective solutions that use DNN-based data storage to manage and query relational tables. We also discuss how to generalize our solutions to other data types (e.g., graphs) and to other environments, such as distributed DNN storage and computing.
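The idea of encoding input-output pairs in a trained model and materializing a value only when its key is supplied can be illustrated with a very small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy table, the names `TABLE`, `train`, and `lookup`, and the use of a single softmax layer over one-hot keys as a stand-in for the DNN engine of an LMU. It shows retrieval by inference only, not the capacity, space, or privacy behavior the abstract refers to.

```python
import math

# Hypothetical toy table: record id -> category label (not from the paper).
TABLE = {0: "red", 1: "green", 2: "blue", 3: "green", 4: "red", 5: "blue"}
LABELS = sorted(set(TABLE.values()))  # ['blue', 'green', 'red']


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(table, labels, epochs=50, lr=0.5):
    """Fit one softmax layer over one-hot keys with plain gradient descent.

    With a one-hot input, the layer's output for key k is just row k of the
    weight matrix, so the rows can be stored and updated per key.
    """
    weights = {key: [0.0] * len(labels) for key in table}
    for _ in range(epochs):
        for key, value in table.items():
            probs = softmax(weights[key])
            target = labels.index(value)
            for j in range(len(labels)):
                # Cross-entropy gradient with respect to logit j: p_j - y_j.
                grad = probs[j] - (1.0 if j == target else 0.0)
                weights[key][j] -= lr * grad
    return weights


def lookup(weights, labels, key):
    """Materialize the stored value for `key` by running the model."""
    probs = softmax(weights[key])
    return labels[probs.index(max(probs))]


weights = train(TABLE, LABELS)
hits = sum(lookup(weights, LABELS, k) == v for k, v in TABLE.items())
accuracy = hits / len(TABLE)
print(accuracy)  # 1.0: every stored pair is recovered
```

After training, the values exist only as weights, and `lookup(weights, LABELS, 2)` recomputes the label for key 2 on demand. This toy keeps one weight row per key, so it saves no space over the raw table. A network whose parameters are shared across keys would be needed to study the capacity and accuracy trade-offs the abstract mentions.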