Detecting Network-based Internet Censorship via Latent Feature Representation Learning

Paper Title

Detecting Network-based Internet Censorship via Latent Feature Representation Learning

Authors

Duncan, Shawn P.; Chen, Hui

Abstract

Internet censorship is a phenomenon of societal importance and attracts investigation from multiple disciplines. Several research groups, such as Censored Planet, have deployed large-scale Internet measurement platforms to collect network reachability data. However, existing studies generally rely on manually designed rules (i.e., censorship fingerprints) to detect network-based Internet censorship from the data. While this rule-based approach yields a high true positive detection rate, it suffers from several challenges: it requires human expertise, is laborious, and cannot detect any censorship not captured by the rules. Seeking to overcome these challenges, we design and evaluate two models for detecting network-based Internet censorship: a classification model based on latent feature representation learning and an image-based classification model. To infer latent feature representations from network reachability data, we propose a sequence-to-sequence autoencoder that captures the structure and the order of the data elements. To estimate the probability of censorship events from the inferred latent features, we rely on a densely connected multi-layer neural network model. Our image-based classification model encodes a network reachability data record as a gray-scale image and classifies the image as censored or not using a dense convolutional neural network. We compare and evaluate both approaches on data sets from Censored Planet via a hold-out evaluation. Both classification models are capable of detecting network-based Internet censorship, as we were able to identify instances of censorship not detected by the known fingerprints. Latent feature representations likely encode more nuances in the data, since the latent feature learning approach discovers a greater quantity, and a more diverse set, of new censorship instances.
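
The abstract does not say how a reachability record is turned into a gray-scale image, so the following is only an illustrative sketch of one plausible encoding, not the authors' method. It assumes a record is available as text, maps its UTF-8 bytes to pixel intensities (0 to 255), and truncates or zero-pads to a fixed grid. The function name `record_to_grayscale` and the `width` and `height` parameters are hypothetical.

```python
from typing import List


def record_to_grayscale(record: str, width: int = 16, height: int = 16) -> List[List[int]]:
    """Map a text record to a height x width grid of 0-255 pixel values.

    Assumption (not from the paper): each UTF-8 byte becomes one pixel,
    filled row by row; long records are truncated, short ones zero-padded.
    """
    size = width * height
    data = record.encode("utf-8")[:size]      # truncate records that are too long
    data = data + bytes(size - len(data))     # zero-pad records that are too short
    # Slice the flat byte string into rows of `width` pixels each.
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]


# Example: a short, made-up response line placed on an 8x4 grid.
image = record_to_grayscale("HTTP/1.1 403 Forbidden", width=8, height=4)
for row in image:
    print(row)
```

A fixed grid size keeps every record the same shape, which is what a convolutional classifier would typically expect as input; the choice of byte-level pixels here is purely for illustration.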
