Paper title
Tidying Deep Saliency Prediction Architectures
Paper authors
Paper abstract
Learning computational models of visual attention (saliency estimation) is an effort to bring machines and robots closer to human visual cognitive abilities. Data-driven approaches have dominated the field since the introduction of deep neural network architectures. In deep learning research, architecture design choices are often empirical and frequently lead to more complex models than necessary. This complexity, in turn, makes it harder to meet application requirements. In this paper, we identify four key components of saliency models: input features, multi-level integration, readout architecture, and loss functions. We review existing state-of-the-art models with respect to these four components and propose novel, simpler alternatives. As a result, we propose two novel end-to-end architectures, SimpleNet and MDNSal, which are neater, minimal, and more interpretable, and which achieve state-of-the-art performance on public saliency benchmarks. SimpleNet is an optimized encoder-decoder architecture that brings notable performance gains on the SALICON dataset (the largest saliency benchmark). MDNSal is a parametric model that directly predicts the parameters of a Gaussian mixture model (GMM) distribution and aims to make the prediction maps more interpretable. The proposed saliency models run inference at 25 fps, making them suitable for real-time applications. Code and pre-trained models are available at https://github.com/samyak0210/saliency.
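To make the parametric idea behind MDNSal concrete, the following is a minimal sketch of how a set of GMM parameters can be rendered as a saliency map. It is not the authors' implementation: the function name `gmm_saliency_map`, the diagonal-covariance form, the normalized [0, 1] image coordinates, and the example parameter values are all assumptions made for illustration. In the actual model a network would predict these parameters; here they are hard-coded.

```python
import numpy as np


def gmm_saliency_map(weights, means, sigmas, height, width):
    """Render a 2D Gaussian mixture as a saliency map that sums to 1.

    weights: (K,) non-negative mixture weights
    means:   (K, 2) component centres as (x, y) in normalized [0, 1] coordinates
    sigmas:  (K, 2) per-axis standard deviations (diagonal covariance, an assumption)
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1

    # Pixel centres expressed in normalized image coordinates.
    xs = (np.arange(width) + 0.5) / width
    ys = (np.arange(height) + 0.5) / height
    grid_x, grid_y = np.meshgrid(xs, ys)  # each has shape (height, width)

    saliency = np.zeros((height, width))
    for w, (mx, my), (sx, sy) in zip(weights, means, sigmas):
        zx = (grid_x - mx) / sx
        zy = (grid_y - my) / sy
        density = np.exp(-0.5 * (zx ** 2 + zy ** 2)) / (2.0 * np.pi * sx * sy)
        saliency += w * density

    # Normalize so the map is a discrete probability distribution over pixels.
    return saliency / saliency.sum()


# Hypothetical parameters: two components, the first one dominant.
smap = gmm_saliency_map(
    weights=[0.7, 0.3],
    means=[[0.30, 0.40], [0.75, 0.60]],
    sigmas=[[0.08, 0.08], [0.05, 0.10]],
    height=48,
    width=64,
)
row, col = np.unravel_index(smap.argmax(), smap.shape)
```

Because the whole map is described by a handful of weights, centres, and spreads, each predicted fixation cluster can be read off directly from the parameters, which is the kind of interpretability the abstract refers to.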