Paper Title
NATSA: A Near-Data Processing Accelerator for Time Series Analysis
Authors
Abstract
Time series analysis is a key technique for extracting and predicting events in domains as diverse as epidemiology, genomics, neuroscience, environmental sciences, economics, and more. Matrix profile, the state-of-the-art algorithm to perform time series analysis, computes the most similar subsequence for a given query subsequence within a sliced time series. Matrix profile has low arithmetic intensity, but it typically operates on large amounts of time series data. In current computing systems, this data needs to be moved between the off-chip memory units and the on-chip computation units for performing matrix profile. This causes a major performance bottleneck as data movement is extremely costly in terms of both execution time and energy. In this work, we present NATSA, the first Near-Data Processing (NDP) accelerator for time series analysis. The key idea is to exploit modern 3D-stacked High Bandwidth Memory (HBM) to enable efficient and fast specialized matrix profile computation near memory, where time series data resides. NATSA provides three key benefits: 1) quickly computing the matrix profile for a wide range of applications by building specialized energy-efficient floating-point arithmetic processing units close to HBM, 2) improving the energy efficiency and execution time by reducing the need for data movement over slow and energy-hungry buses between the computation units and the memory units, and 3) analyzing time series data at scale by exploiting low-latency, high-bandwidth, and energy-efficient memory access provided by HBM. Our experimental evaluation shows that NATSA improves performance by up to 14.2x (9.9x on average) and reduces energy by up to 27.2x (19.4x on average) over the state-of-the-art multi-core implementation. NATSA also improves performance by 6.3x and reduces energy by 10.2x over a general-purpose NDP platform with 64 in-order cores.
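To make "the most similar subsequence for a given query subsequence" concrete, here is a minimal software sketch of a matrix profile computation in plain Python. It is not NATSA's hardware design. It assumes the commonly used z-normalized Euclidean distance and a trivial-match exclusion zone of ceil(m/4) samples. It walks the distance matrix diagonal by diagonal, updating each dot product incrementally in the style of STOMP/SCRIMP-like algorithms. The function names `matrix_profile`, `_window_stats` and `_znorm_dist`, and the convention for flat windows, are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def _window_stats(ts: List[float], m: int) -> Tuple[List[float], List[float]]:
    """Mean and population standard deviation of every length-m subsequence."""
    means, stds = [], []
    for i in range(len(ts) - m + 1):
        window = ts[i:i + m]
        mu = sum(window) / m
        means.append(mu)
        stds.append(math.sqrt(sum((x - mu) ** 2 for x in window) / m))
    return means, stds


def _znorm_dist(qt: float, m: int, mu_a: float, sd_a: float,
                mu_b: float, sd_b: float) -> float:
    """z-normalized Euclidean distance recovered from the dot product qt."""
    if sd_a == 0.0 or sd_b == 0.0:
        # Assumed (hypothetical) convention for flat windows: two flat windows
        # match exactly; a flat and a non-flat window are sqrt(m) apart.
        return 0.0 if sd_a == sd_b else math.sqrt(m)
    corr = (qt - m * mu_a * mu_b) / (m * sd_a * sd_b)  # Pearson correlation
    corr = max(-1.0, min(1.0, corr))  # clamp floating-point rounding
    return math.sqrt(2.0 * m * (1.0 - corr))


def matrix_profile(ts: List[float], m: int) -> Tuple[List[float], List[int]]:
    """For each length-m subsequence, return the distance to its most similar
    non-trivial match (profile) and that match's start position (index)."""
    n_sub = len(ts) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone: skip pairs with |i - j| <= excl
    if m < 2 or n_sub <= excl + 1:
        raise ValueError("series too short for this subsequence length")
    means, stds = _window_stats(ts, m)
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    # Diagonal k holds the subsequence pairs (i, i + k). Along a diagonal the
    # dot product is updated in O(1): drop the oldest sample pair, add the newest.
    for k in range(excl + 1, n_sub):
        qt = sum(ts[t] * ts[t + k] for t in range(m))  # dot product for pair (0, k)
        for i in range(n_sub - k):
            j = i + k
            if i > 0:
                qt += ts[i + m - 1] * ts[j + m - 1] - ts[i - 1] * ts[j - 1]
            d = _znorm_dist(qt, m, means[i], stds[i], means[j], stds[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index
```

Consider `matrix_profile([1, 3, 2, 5, 9, 4, 1, 3, 2, 5, 7, 0], 4)`. The window `[1, 3, 2, 5]` recurs exactly at position 6, so the call reports `index[0] == 6` with a profile value that is numerically zero. Each inner step loads only a few new samples and performs a handful of multiply-adds and one square root. This illustrates, roughly, why the abstract describes the computation as having low arithmetic intensity and being dominated by data movement on large series.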