Paper Title
Differential Privacy and Natural Language Processing to Generate Contextually Similar Decoy Messages in Honey Encryption Scheme
Paper Authors
Paper Abstract
Honey Encryption is an approach for encrypting messages under low min-entropy keys, such as weak passwords, OTPs, PINs, and credit card numbers. The ciphertext, when decrypted with any incorrect key, yields a plausible-looking but bogus plaintext called a "honey message". However, the current techniques for producing decoy plaintexts do not fully model human language. A gibberish, random assortment of words is not enough to fool an attacker; it will not be acceptable or convincing, whether or not the attacker knows some information about the genuine source. In this paper, I focus on plaintexts that are non-numeric, informative messages. To fool the attacker into believing that a decoy message could actually come from a certain source, the decoys must capture the empirical and contextual properties of the language. That is, there should be no linguistic difference between real and fake messages, and the decoys must not reveal the structure of the real message. I employ natural language processing and generalized differential privacy to solve this problem. I focus mainly on machine learning methods such as keyword extraction, context classification, bag-of-words, word embeddings, and transformers for text processing to model privacy for text documents. I then prove the security of this approach with ε-differential privacy.
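The core behaviour described in the abstract, where every key decrypts to a plausible message, can be illustrated with a toy sketch. This is not the construction used in the paper: the fixed message list `MESSAGES`, the helper `key_to_offset`, and the use of SHA-256 with a salt are all assumptions made only for illustration. A realistic scheme would draw decoys from a language model rather than a small hand-written table.

```python
import hashlib

# Hypothetical message space (an assumption for this sketch).
# The real message and all possible decoys live in the same table.
MESSAGES = [
    "Meet me at the station at noon.",
    "The report is due on Friday.",
    "Please call the bank tomorrow.",
    "Dinner is moved to eight tonight.",
]


def key_to_offset(key: str, salt: bytes) -> int:
    """Map a low-entropy key (e.g. a PIN) to an index offset via SHA-256."""
    digest = hashlib.sha256(salt + key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % len(MESSAGES)


def encrypt(message: str, key: str, salt: bytes) -> int:
    """Encode the message as its table index, then shift it by the key offset."""
    seed = MESSAGES.index(message)
    return (seed + key_to_offset(key, salt)) % len(MESSAGES)


def decrypt(ciphertext: int, key: str, salt: bytes) -> str:
    """Undo the shift; any key, right or wrong, lands on some valid message."""
    seed = (ciphertext - key_to_offset(key, salt)) % len(MESSAGES)
    return MESSAGES[seed]


if __name__ == "__main__":
    salt = b"example-salt"
    ciphertext = encrypt(MESSAGES[1], "1234", salt)
    print(decrypt(ciphertext, "1234", salt))  # the real message
    print(decrypt(ciphertext, "0000", salt))  # some message from the table
```

With the correct key the original message is recovered. A wrong key still returns a well-formed sentence from the table, which may occasionally coincide with the real one in so small a space. An attacker who tries every PIN therefore cannot tell from the output alone which guess was right. The sketch leaves out the problem the abstract targets: making the decoys contextually similar to the real message without revealing its structure.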