Objects of violence: synthetic data for practical ML in human rights investigations

Paper title

Objects of violence: synthetic data for practical ML in human rights investigations

Authors

Lachlan Kermode, Jan Freyberg, Alican Akturk, Robert Trafford, Denis Kochetkov, Rafael Pardinas, Eyal Weizman, Julien Cornebise

Abstract

We introduce a machine learning workflow to search for, identify, and meaningfully triage videos and images of munitions, weapons, and military equipment, even when limited training data exists for the object of interest. This workflow is designed to expedite the work of OSINT ("open source intelligence") researchers in human rights investigations. It consists of three components: automatic rendering and annotating of synthetic datasets that make up for a lack of training data; training image classifiers from combined sets of photographic and synthetic data; and mtriage, an open source software that orchestrates these classifiers' deployment to triage public domain media, and visualise predictions in a web interface. We show that synthetic data helps to train classifiers more effectively, and that certain approaches yield better results for different architectures. We then demonstrate our workflow in two real-world human rights investigations: the use of the Triple-Chaser tear gas grenade against civilians, and the verification of allegations of military presence in Ukraine in 2014.
