Title

Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling

Authors

Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian

Abstract

This paper formulates a new task: extracting privacy parameters from privacy policies through the lens of Contextual Integrity (CI), an established social-theory framework for reasoning about privacy norms. Privacy policies are written by lawyers, are lengthy, and often contain incomplete and vague statements. We show that traditional NLP tasks, including recently proposed question-answering-based solutions, are insufficient for the privacy parameter extraction problem and yield poor precision and recall. We describe four types of conventional methods that can be partially adapted to the parameter extraction task, with varying degrees of success: Hidden Markov Models, fine-tuned BERT models, Dependency Type Parsing (DP), and Semantic Role Labeling (SRL). Based on a detailed evaluation across 36 real-world privacy policies of major enterprises, we demonstrate that a solution coupling syntactic DP with type-specific SRL tasks provides the highest accuracy for retrieving contextual privacy parameters from privacy statements. We also observe that incorporating domain-specific knowledge is critical to achieving high precision and recall, which motivates new NLP research on this important problem in the privacy domain.
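
In Contextual Integrity, an information flow is described by its sender, recipient, subject, attribute (information type), and transmission principle. The sketch below is not the authors' pipeline: it is a toy, rule-based illustration, assuming a hand-written Universal Dependencies-style parse, of how dependency relations might map onto these CI parameters and how exact-match precision and recall can be computed over the extracted (parameter, value) pairs. The names Token, extract_ci_params, and precision_recall, the example sentence, and the gold annotation are all hypothetical; a real system would obtain the parse from a parser and, per the abstract, would couple DP with SRL rather than rely on syntax alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    head: int  # index of the syntactic head; -1 marks the root
    dep: str   # Universal Dependencies-style relation label


# Hypothetical, hand-written dependency parse of:
# "We share your email address with advertisers if you consent"
SENTENCE = [
    Token("We", 1, "nsubj"),
    Token("share", -1, "root"),
    Token("your", 4, "nmod:poss"),
    Token("email", 4, "compound"),
    Token("address", 1, "obj"),
    Token("with", 6, "case"),
    Token("advertisers", 1, "obl"),
    Token("if", 9, "mark"),
    Token("you", 9, "nsubj"),
    Token("consent", 1, "advcl"),
]


def children(tokens, head, dep=None):
    """Indices of tokens attached to `head`, optionally filtered by relation."""
    return [i for i, t in enumerate(tokens) if t.head == head and (dep is None or t.dep == dep)]


def subtree(tokens, root):
    """Text of the full subtree rooted at `root`, in surface order."""
    keep = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in children(tokens, node):
            keep.add(child)
            frontier.append(child)
    return " ".join(tokens[i].text for i in sorted(keep))


def extract_ci_params(tokens):
    """Toy rules mapping dependency relations of the root verb to CI parameters."""
    root = next(i for i, t in enumerate(tokens) if t.head == -1)
    params = {}
    for i in children(tokens, root, "nsubj"):
        params["sender"] = tokens[i].text
    for i in children(tokens, root, "obj"):
        # Attribute = object head plus its compound/adjectival modifiers.
        span = sorted([i] + [c for c in children(tokens, i) if tokens[c].dep in ("compound", "amod")])
        params["attribute"] = " ".join(tokens[j].text for j in span)
        # Subject = possessor of the object, if any.
        for p in children(tokens, i, "nmod:poss"):
            params["subject"] = tokens[p].text
    for i in children(tokens, root, "obl"):
        # Recipient = oblique argument introduced by "to" or "with".
        if any(tokens[c].text.lower() in ("to", "with") for c in children(tokens, i, "case")):
            params["recipient"] = tokens[i].text
    for i in children(tokens, root, "advcl"):
        # Transmission principle = the whole adverbial (e.g. conditional) clause.
        params["transmission_principle"] = subtree(tokens, i)
    return params


def precision_recall(predicted, gold):
    """Exact-match precision and recall over (parameter, value) pairs."""
    pred, ref = set(predicted.items()), set(gold.items())
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = extract_ci_params(SENTENCE)
    # Hypothetical gold annotation; "subject" is normalized to "user".
    gold = {
        "sender": "We",
        "attribute": "email address",
        "subject": "user",
        "recipient": "advertisers",
        "transmission_principle": "if you consent",
    }
    print(predicted)
    print(precision_recall(predicted, gold))  # (0.8, 0.8)
```

Four of the five extracted pairs match the hypothetical gold annotation exactly; the subject ("your" versus the normalized "user") does not, so both scores drop to 0.8. Mapping pronouns to a canonical role is a small instance of the domain knowledge that a purely syntactic rule lacks.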