Paper Title

Probabilistic Future Prediction for Video Scene Understanding

Authors

Hu, Anthony, Cotter, Fergal, Mohan, Nikhil, Gurau, Corina, Kendall, Alex

Abstract

We present a novel deep learning architecture for probabilistic future prediction from video. We predict the future semantics, geometry and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle. This work is the first to jointly predict ego-motion, static scene, and the motion of dynamic agents in a probabilistic manner, which allows sampling consistent, highly probable futures from a compact latent space. Our model learns a representation from RGB video with a spatio-temporal convolutional module. The learned representation can be explicitly decoded to future semantic segmentation, depth, and optical flow, in addition to being an input to a learnt driving policy. To model the stochasticity of the future, we introduce a conditional variational approach which minimises the divergence between the present distribution (what could happen given what we have seen) and the future distribution (what we observe actually happens). During inference, diverse futures are generated by sampling from the present distribution.
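The core training idea in the abstract is a conditional variational objective: a "present" distribution conditioned only on observed frames is pulled towards a "future" distribution that additionally sees what actually happened, and at inference time diverse futures are sampled from the present distribution alone. A minimal sketch of that KL term for diagonal Gaussians, with all variable names illustrative and the network outputs stubbed with random values rather than the paper's actual architecture:

```python
import numpy as np

def gaussian_kl(mu_f, sigma_f, mu_p, sigma_p):
    """KL( N(mu_f, sigma_f^2) || N(mu_p, sigma_p^2) ) for diagonal Gaussians."""
    return np.sum(
        np.log(sigma_p / sigma_f)
        + (sigma_f**2 + (mu_f - mu_p)**2) / (2.0 * sigma_p**2)
        - 0.5
    )

rng = np.random.default_rng(0)
latent_dim = 16  # hypothetical size of the compact latent space

# Stand-ins for the outputs of two small networks (names are assumptions):
# the "present" network sees only past frames; the "future" network
# additionally sees the observed future during training.
mu_present, sigma_present = rng.normal(size=latent_dim), np.ones(latent_dim)
mu_future, sigma_future = rng.normal(size=latent_dim), 0.5 * np.ones(latent_dim)

# Training-time term: divergence between the future and present distributions.
kl_loss = gaussian_kl(mu_future, sigma_future, mu_present, sigma_present)

# Inference: a diverse future is generated by sampling from the
# present distribution only (reparameterised as mu + sigma * noise).
z = mu_present + sigma_present * rng.normal(size=latent_dim)
```

Each sample `z` would then be decoded into one consistent future (segmentation, depth, flow) by the shared decoders; the closed-form Gaussian KL here is one common instantiation, not necessarily the exact form used in the paper.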
