Paper Title
Robust Fingerprinting of Genomic Databases
Authors
Abstract
Database fingerprinting is widely used to discourage unauthorized redistribution of data by providing a means to identify the source of a data leak. However, no existing fingerprinting scheme aims to provide liability guarantees when sharing genomic databases. We fill this gap by devising a vanilla fingerprinting scheme specifically for genomic databases. Moreover, malicious recipients of a genomic database may compromise the embedded fingerprint by launching correlation attacks that leverage the intrinsic correlations among genomic data (e.g., Mendel's law and linkage disequilibrium). We therefore augment the vanilla scheme with mitigation techniques to achieve robust fingerprinting of genomic databases against correlation attacks. We first show that correlation attacks against fingerprinting schemes for genomic databases are very powerful: they can distort more than half of the fingerprint bits while causing only a small utility loss (e.g., in database accuracy and in the consistency of SNP-phenotype associations measured via p-values). Next, we show experimentally that our proposed mitigation techniques effectively counter these attacks. We validate that the attacker can hardly compromise a large portion of the fingerprint bits even if it pays a higher cost in terms of degraded database utility. For example, with around 24% loss in accuracy and 20% loss in the consistency of SNP-phenotype associations, the attacker can distort only about 30% of the fingerprint bits, which is insufficient for it to avoid being accused. We also show that the proposed mitigation techniques preserve the utility of the shared genomic databases.
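To make the notion of "distorted fingerprint bits" concrete, the following is a minimal sketch of a generic keyed database fingerprint applied to a toy SNP matrix. It is not the scheme from the paper: the genotype coding (0, 1, 2), the parity-based marking rule, the HMAC-based cell selection, the parameter `gamma`, and the random-flip attack are all assumptions chosen for illustration. In particular, the attack here is a plain random perturbation, not a correlation attack.

```python
import hashlib
import hmac
import random


def _plan(key, row, col, gamma, n_bits):
    """Derive (selected, bit_index, mask_bit, tie_break) for one cell.

    Hypothetical helper: a keyed HMAC-SHA256 of the cell position acts as
    the pseudorandom source, so only the key holder can locate the marks.
    """
    digest = hmac.new(key, f"{row}|{col}".encode(), hashlib.sha256).digest()
    r = int.from_bytes(digest[:8], "big")
    selected = r % gamma == 0          # roughly 1 in gamma cells is marked
    r //= gamma
    bit_index = r % n_bits             # which fingerprint bit this cell carries
    r //= n_bits
    mask_bit = r % 2                   # hides the raw fingerprint bit
    tie_break = (r // 2) % 2           # picks genotype 0 or 2 for even parity
    return selected, bit_index, mask_bit, tie_break


def embed(db, fingerprint, key, gamma=4):
    """Return a copy of db whose selected cells encode the fingerprint.

    Assumed marking rule: the parity of the genotype (0/2 even, 1 odd)
    must equal fingerprint_bit XOR mask_bit.
    """
    marked = [row[:] for row in db]
    for i, row in enumerate(marked):
        for j, value in enumerate(row):
            selected, idx, mask, tie = _plan(key, i, j, gamma, len(fingerprint))
            if not selected:
                continue
            mark = fingerprint[idx] ^ mask
            if value % 2 != mark:
                row[j] = 1 if mark == 1 else (0 if tie == 0 else 2)
    return marked


def extract(db, key, n_bits, gamma=4):
    """Recover each fingerprint bit by majority vote over its marked cells."""
    votes = [[0, 0] for _ in range(n_bits)]
    for i, row in enumerate(db):
        for j, value in enumerate(row):
            selected, idx, mask, _ = _plan(key, i, j, gamma, n_bits)
            if selected:
                votes[idx][(value % 2) ^ mask] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]


def random_flip_attack(db, rate, seed=0):
    """Baseline attack (assumption): replace a fraction of cells at random."""
    rng = random.Random(seed)
    attacked = [row[:] for row in db]
    for row in attacked:
        for j, value in enumerate(row):
            if rng.random() < rate:
                row[j] = rng.choice([g for g in (0, 1, 2) if g != value])
    return attacked


def distorted_fraction(original, recovered):
    """Fraction of fingerprint bits that differ after extraction."""
    return sum(a != b for a, b in zip(original, recovered)) / len(original)


if __name__ == "__main__":
    rng = random.Random(1)
    db = [[rng.choice([0, 1, 2]) for _ in range(50)] for _ in range(200)]
    fp = [rng.randint(0, 1) for _ in range(32)]
    key = b"owner-secret"
    marked = embed(db, fp, key)
    attacked = random_flip_attack(marked, rate=0.3)
    print("distorted bits:", distorted_fraction(fp, extract(attacked, key, len(fp))))
```

Because each fingerprint bit is spread over many cells and recovered by majority vote, an attacker who perturbs cells blindly has to change a large share of the database to flip a bit. A correlation attack, as described in the abstract, instead targets the cells most likely to carry marks, which is why it can distort more bits at a lower utility cost.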