Paper Title

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Authors

Trigg, Tyler, Meiners, Chad, Pisharody, Sandeep, Jananthan, Hayden, Jones, Michael, Michaleas, Adam, Davis, Timothy, Welch, Erik, Arcand, William, Bestor, David, Bergeron, William, Byun, Chansup, Gadepally, Vijay, Houle, Micheal, Hubbell, Matthew, Klein, Anna, Michaleas, Peter, Milechin, Lauren, Mullen, Julie, Prout, Andrew, Reuther, Albert, Rosa, Antonio, Samsi, Siddharth, Stetson, Doug, Yee, Charles, Kepner, Jeremy

Abstract

Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale tested on the MIT SuperCloud using a 50 trillion packet netflow corpus from several hundred sites collected over several months. The resulting compression is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).
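
The aggregates named in the abstract (source packets, source fan-out, unique links, destination fan-in, destination packets) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python dictionaries stand in for GraphBLAS hypersparse matrices, the function names are hypothetical, and the small integers are assumed to be already-anonymized endpoint identifiers. Summing two window matrices stands in for temporal aggregation over a subrange.

```python
from collections import defaultdict


def build_traffic_matrix(packets):
    """Count packets per (source, destination) pair.

    Only nonzero entries are stored, which is the property a
    hypersparse matrix provides for a very large address space.
    """
    matrix = defaultdict(int)
    for src, dst in packets:
        matrix[(src, dst)] += 1
    return dict(matrix)


def add_matrices(a, b):
    """Element-wise sum of two sparse matrices (temporal aggregation)."""
    total = dict(a)
    for link, count in b.items():
        total[link] = total.get(link, 0) + count
    return total


def aggregates(matrix):
    """Compute per-source and per-destination statistics of a traffic matrix."""
    source_packets = defaultdict(int)
    source_fan_out = defaultdict(int)
    destination_packets = defaultdict(int)
    destination_fan_in = defaultdict(int)
    for (src, dst), count in matrix.items():
        source_packets[src] += count        # row sums
        source_fan_out[src] += 1            # nonzeros per row
        destination_packets[dst] += count   # column sums
        destination_fan_in[dst] += 1        # nonzeros per column
    return {
        "unique_links": len(matrix),        # number of stored entries
        "source_packets": dict(source_packets),
        "source_fan_out": dict(source_fan_out),
        "destination_packets": dict(destination_packets),
        "destination_fan_in": dict(destination_fan_in),
    }


# Two hypothetical time windows of (source, destination) packet records.
window_1 = [(1, 9), (1, 9), (1, 8), (2, 9)]
window_2 = [(1, 9), (3, 7)]

combined = add_matrices(build_traffic_matrix(window_1),
                        build_traffic_matrix(window_2))
stats = aggregates(combined)
```

In this toy example source 1 sends 4 packets over 2 distinct links, and destination 9 receives 4 packets from 2 distinct sources. In a GraphBLAS setting the same quantities would correspond to row and column reductions of the traffic matrix rather than explicit Python loops.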
