Paper Title

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

Paper Authors

Chan, David M., Vijayanarasimhan, Sudheendra, Ross, David A., Canny, John

Paper Abstract

Automatic video captioning aims to train models that generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to efficiently build a training set for video captioning tasks while reducing the need to manually label uninformative examples. In this work, we explore several active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy provides the most effective way to gather training sets for video captioning. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer-based and LSTM-based captioning models, and show that our novel strategy can achieve high performance while using up to 60% less training data than strong state-of-the-art baselines.
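
To make the idea of a cluster-regularized ensemble selection concrete, below is a minimal Python sketch. It assumes ensemble disagreement scores and pooled video features are already computed; the specific scoring, clustering choice (KMeans), and per-cluster cap are illustrative assumptions, not the paper's exact formulation.

```python
# Cluster-regularized ensemble ranking (illustrative sketch):
# rank unlabeled videos by ensemble disagreement, then cap how many
# selections may come from any one feature-space cluster so the
# acquired batch stays diverse.
import numpy as np
from sklearn.cluster import KMeans


def select_batch(features, disagreement, batch_size, n_clusters=50, per_cluster_cap=2):
    """Pick `batch_size` videos with high ensemble disagreement,
    taking at most `per_cluster_cap` from each cluster."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    order = np.argsort(-disagreement)          # most uncertain first
    taken_per_cluster = np.zeros(n_clusters, dtype=int)
    selected = []
    for idx in order:
        c = clusters[idx]
        if taken_per_cluster[c] < per_cluster_cap:
            selected.append(idx)
            taken_per_cluster[c] += 1
        if len(selected) == batch_size:
            break
    return selected


# Example: choose 16 clips from 1,000 unlabeled videos with random scores.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))           # e.g., pooled video features
scores = rng.random(1000)                      # ensemble disagreement per clip
batch = select_batch(feats, scores, batch_size=16)
```

The cluster cap plays the "regularization" role: a pure disagreement ranking tends to pick many near-duplicate hard examples, while the cap forces the labeling budget to spread across visually distinct regions of the dataset.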
