Title
Privacy Preserving Visual Question Answering
Authors
Abstract
We introduce a novel privacy-preserving methodology for performing Visual Question Answering on the edge. Our method constructs a symbolic representation of the visual scene, using a low-complexity computer vision model that jointly predicts classes, attributes, and predicates. This symbolic representation is non-differentiable, which means it cannot be used to recover the original image, thereby keeping the original image private. Our proposed hybrid solution uses a vision model that is more than 25 times smaller than the current state-of-the-art (SOTA) vision models, and 100 times smaller than end-to-end SOTA VQA models. We report a detailed error analysis and discuss the trade-offs of using a distilled vision model and a symbolic representation of the visual scene.
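To make the idea of a symbolic scene representation concrete, the sketch below shows one plausible encoding: objects with predicted classes and attributes, plus predicates relating pairs of objects. The paper does not specify its actual schema, so all field names, classes, and predicates here are hypothetical illustrations; the point is that only discrete symbols survive, with no pixel data from which the image could be reconstructed.

```python
# Hypothetical sketch of a symbolic scene representation (assumed schema,
# not the paper's actual format). Only discrete symbols are stored.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    obj_id: int
    cls: str                                             # predicted class, e.g. "dog"
    attributes: List[str] = field(default_factory=list)  # e.g. ["brown"]


@dataclass
class SymbolicScene:
    objects: List[SceneObject]
    # Predicates relate pairs of objects: (subject_id, predicate, object_id)
    predicates: List[Tuple[int, str, int]]

    def describe(self) -> List[str]:
        """Render the scene as human-readable facts (no pixels retained)."""
        names = {o.obj_id: o.cls for o in self.objects}
        facts = [f"{' '.join(o.attributes)} {o.cls}".strip() for o in self.objects]
        facts += [f"{names[s]} {p} {names[o]}" for s, p, o in self.predicates]
        return facts


scene = SymbolicScene(
    objects=[SceneObject(0, "dog", ["brown"]), SceneObject(1, "sofa", ["red"])],
    predicates=[(0, "sitting on", 1)],
)
print(scene.describe())  # ['brown dog', 'red sofa', 'dog sitting on sofa']
```

A downstream question-answering module could reason over such facts (e.g. answering "What color is the dog?" from "brown dog") without ever seeing the original image.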