Paper title
Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark
Paper authors
Paper abstract
Wavelet neural networks (WNNs) have been applied in many fields to solve both regression and classification problems. With the advent of big data, data is generated at a brisk pace and its nature may change dramatically over short time intervals, so it must be analyzed as soon as it is generated. Because big data is all-pervasive, this poses computational challenges for data scientists. Therefore, in this paper, we build an efficient Scalable, Parallelized Wavelet Neural Network (SPWNN) that employs a parallel stochastic gradient descent (SGD) algorithm. SPWNN is designed and developed for both static and streaming environments in the horizontal parallelization framework, and is implemented with Morlet and Gaussian functions as activation functions. The study is conducted on big datasets such as gas sensor data, which has more than 4 million samples, and medical research data, which has more than 10,000 features and is therefore high-dimensional. The experimental analysis indicates that, in the static environment, SPWNN with the Morlet activation function outperformed SPWNN with the Gaussian activation function on the classification datasets, whereas the opposite was observed for regression. In the streaming environment the pattern reversed: Gaussian outperformed Morlet on the classification datasets, and Morlet outperformed Gaussian on the regression datasets. Overall, the proposed SPWNN architecture achieved a speedup of 1.32-1.40.
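To make the two activation functions and the idea of row-wise (horizontal) data partitioning concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The wavelet formulas are forms commonly used in the WNN literature, and the fixed translations `CENTERS`, the single `DILATION`, the parameter-averaging step, and all function names are assumptions made for illustration. Only the output weights are trained, and the partitions are processed sequentially rather than on a Spark cluster.

```python
import math
import random

def morlet(t):
    # Morlet wavelet in a form often used for WNNs (assumed, not taken from the paper).
    return math.cos(1.75 * t) * math.exp(-0.5 * t * t)

def gaussian(t):
    # Gaussian activation in a form often used for WNNs (assumed, not taken from the paper).
    return math.exp(-t * t)

CENTERS = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical fixed translations, one per hidden node
DILATION = 1.0                         # hypothetical shared dilation

def hidden(x, act):
    # Hidden-layer outputs for a scalar input x.
    return [act((x - b) / DILATION) for b in CENTERS]

def predict(v, x, act):
    # Network output: weighted sum of the wavelet activations.
    return sum(vj * hj for vj, hj in zip(v, hidden(x, act)))

def local_sgd(rows, act, lr=0.05, epochs=20):
    # Plain SGD on squared error over one partition; only output weights v are updated.
    v = [0.0] * len(CENTERS)
    for _ in range(epochs):
        for x, y in rows:
            h = hidden(x, act)
            err = y - sum(vj * hj for vj, hj in zip(v, h))
            v = [vj + lr * err * hj for vj, hj in zip(v, h)]
    return v

def parallel_sgd(rows, act, n_parts=4):
    # Horizontal split: each partition holds a disjoint subset of the rows.
    parts = [rows[i::n_parts] for i in range(n_parts)]
    # On a cluster these local runs would execute concurrently; here they run in turn.
    models = [local_sgd(p, act) for p in parts]
    # Combine by averaging the per-partition weights (one common scheme, assumed here).
    return [sum(ws) / n_parts for ws in zip(*models)]

def mse(v, rows, act):
    # Mean squared error of weights v on the given rows.
    return sum((y - predict(v, x, act)) ** 2 for x, y in rows) / len(rows)

rng = random.Random(0)
# Toy regression data (synthetic, unrelated to the paper's datasets).
data = [(x, math.sin(x)) for x in (rng.uniform(-2.0, 2.0) for _ in range(200))]

for name, act in (("morlet", morlet), ("gaussian", gaussian)):
    v = parallel_sgd(data, act)
    print(name, "MSE before:", mse([0.0] * len(CENTERS), data, act),
          "MSE after:", mse(v, data, act))
```

The sketch fixes the translations and dilation so that the model stays linear in the trained weights, which keeps the averaging step easy to reason about. A full WNN would typically also update the dilation and translation parameters by gradient descent.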