Paper Title

Compositional Scene Representation Learning via Reconstruction: A Survey

Paper Authors

Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue

Abstract

Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.
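The surveyed methods differ widely in their scene models and inference procedures, but most share a common recipe: encode an image into several per-object latent representations ("slots"), decode each slot into an appearance and a mask, compose the components back into an image, and train with only a reconstruction objective on unlabeled data. The following is a minimal illustrative sketch of this recipe in PyTorch, not any particular method from the survey; the module name SlotAutoencoder, the MLP architecture, and all hyperparameters are assumptions made for brevity (real methods use convolutional encoders and iterative or attention-based slot grouping).

```python
# Minimal sketch (illustrative only, not a specific surveyed method) of
# reconstruction-based compositional scene representation learning:
# image -> K slot vectors -> per-slot RGB + mask -> composed reconstruction,
# trained with pixel reconstruction as the only supervision signal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotAutoencoder(nn.Module):
    def __init__(self, num_slots=4, slot_dim=32, image_size=32):
        super().__init__()
        self.num_slots = num_slots
        # Encoder: image -> K slot vectors (a flat MLP keeps the sketch short).
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * image_size * image_size, 256),
            nn.ReLU(),
            nn.Linear(256, num_slots * slot_dim),
        )
        # Decoder applied to each slot independently with shared weights,
        # producing 3 RGB channels + 1 unnormalized mask logit per pixel.
        self.decoder = nn.Sequential(
            nn.Linear(slot_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 4 * image_size * image_size),
        )

    def forward(self, images):
        batch, _, h, w = images.shape
        slots = self.encoder(images).view(batch, self.num_slots, -1)
        out = self.decoder(slots).view(batch, self.num_slots, 4, h, w)
        rgb, mask_logits = out[:, :, :3], out[:, :, 3:]
        masks = torch.softmax(mask_logits, dim=1)  # each pixel sums to 1 over slots
        recon = (masks * rgb).sum(dim=1)           # alpha-composite the slot images
        return recon, slots, masks


model = SlotAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 3, 32, 32)  # stand-in for an unlabeled image batch
recon, slots, masks = model(images)
optimizer.zero_grad()
loss = F.mse_loss(recon, images)   # reconstruction is the only supervision
loss.backward()
optimizer.step()
```

The softmax over the slot dimension makes the slots compete to explain each pixel, which is one common mechanism for encouraging the learned representations to decompose the scene into object-like components without any annotation.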
