Paper Title
Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction
Paper Authors
Paper Abstract
We present a multi-scale predictive coding model for future video frame prediction. Drawing inspiration from the "Predictive Coding" theories in cognitive science, the model is updated by a combination of bottom-up and top-down information flows, which enhances the interaction between different network levels. However, traditional predictive coding models only predict what is happening hierarchically rather than predicting the future. To address this problem, our model employs a multi-scale (coarse-to-fine) approach, in which higher-level neurons generate coarser predictions (lower resolution) while lower-level neurons generate finer predictions (higher resolution). In terms of network architecture, we directly incorporate the encoder-decoder network within the LSTM module and share the final encoded high-level semantic information across different network levels. Compared with the traditional Encoder-LSTM-Decoder architecture, this enables comprehensive interaction between the current input and the historical states of the LSTM, thus learning more believable temporal and spatial dependencies. Furthermore, to tackle the instability of adversarial training and to mitigate the accumulation of prediction errors in long-term prediction, we propose several improvements to the training strategy. Our approach achieves good performance on datasets such as KTH, Moving MNIST, and Caltech Pedestrian. Code is available at https://github.com/Ling-CF/MSPN.
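To make the coarse-to-fine idea in the abstract concrete, the following is a minimal NumPy sketch (not the authors' implementation): a resolution pyramid is built bottom-up, the highest level forms a coarse low-resolution prediction, and each lower level refines it top-down at a higher resolution. The per-level "predictor" here is a trivial copy-last-frame stand-in for the encoder-decoder LSTM; all function names are hypothetical.

```python
import numpy as np

def downsample(frame):
    """2x average pooling (the coarser view seen by higher levels)."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(frame):
    """Nearest-neighbour 2x upsampling (the top-down pathway)."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine_predict(prev_frame, levels=2):
    """Hypothetical coarse-to-fine prediction: coarse guess first,
    then level-by-level refinement at increasing resolution."""
    # Bottom-up: build a resolution pyramid of the last observed frame.
    pyramid = [prev_frame]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))

    # Top-down: start from the coarsest-level prediction (here: just
    # the coarse copy of the previous frame, standing in for an LSTM).
    pred = pyramid[-1]
    for level in reversed(range(levels - 1)):
        top_down = upsample(pred)  # guidance from the coarser level
        # Finer-level correction: high-frequency detail the coarse
        # level cannot represent (stand-in for the finer predictor).
        residual = pyramid[level] - upsample(downsample(pyramid[level]))
        pred = top_down + residual
    return pred

frame = np.arange(16, dtype=float).reshape(4, 4)
prediction = coarse_to_fine_predict(frame, levels=2)
print(prediction.shape)  # (4, 4): refined back to full resolution
```

With the trivial copy-last-frame stand-in, the refinement exactly reconstructs the input frame; in the actual model, learned predictors at each level would replace the copy and residual steps.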