Paper Title
Threat Assessment in Machine Learning based Systems
Paper Authors
Abstract
Machine learning is a field of artificial intelligence (AI) that is becoming essential for several critical systems, making it an attractive target for threat actors. Threat actors exploit different Tactics, Techniques, and Procedures (TTPs) against the confidentiality, integrity, and availability of Machine Learning (ML) systems. During the ML life cycle, they exploit adversarial TTPs to poison data and fool ML-based systems. In recent years, multiple security practices have been proposed for traditional systems, but they are not enough to cope with the nature of ML-based systems. In this paper, we conduct an empirical study of threats reported against ML-based systems with the aim of understanding and characterizing the nature of ML threats and identifying common mitigation strategies. The study is based on 89 real-world ML attack scenarios from MITRE's ATLAS database, the AI Incident Database, and the literature, and on 854 ML repositories from GitHub search and the Python Packaging Advisory database, selected based on their reputation. Attacks from the AI Incident Database and the literature are used to identify vulnerabilities and new types of threats that were not documented in ATLAS. Results show that convolutional neural networks were one of the most targeted models among the attack scenarios. ML repositories with the largest vulnerability prominence include TensorFlow, OpenCV, and Notebook. In this paper, we also report the most frequent vulnerabilities in the studied ML repositories, the most targeted ML phases and models, and the most used TTPs in ML phases and attack scenarios. This information is particularly important for red/blue teams to better conduct attacks/defenses, for practitioners to prevent threats during ML development, and for researchers to develop efficient defense mechanisms.