Title
Systematic Attack Surface Reduction For Deployed Sentiment Analysis Models
Authors
Abstract
This work proposes a structured approach to baselining a model, identifying attack vectors, and securing machine learning models after deployment. This method for securing each model post-deployment is called the BAD (Build, Attack, and Defend) Architecture. Two implementations of the BAD architecture are evaluated to quantify the adversarial life cycle of a black-box sentiment analysis system. As a challenging diagnostic, the Jigsaw Toxic Bias dataset is selected as the baseline in our performance tool. Each implementation of the architecture builds a baseline performance report, attacks a common weakness, and defends against the incoming attack. Importantly, each attack surface demonstrated in this work is detectable and preventable. The goal is to demonstrate a viable methodology for securing a machine learning model in a production setting.