Fast Approximate Modelling of the Next Combination Result for Stopping the Text Recognition in a Video

Paper title

Fast Approximate Modelling of the Next Combination Result for Stopping the Text Recognition in a Video

Authors

Bulatov, Konstantin; Fedotova, Nadezhda; Arlazarov, Vladimir V.

Abstract

In this paper, we consider the task of stopping the video stream recognition process for a text field, in which each frame is recognized independently and the per-frame results are combined. The video stream recognition stopping problem is an under-researched topic in computer vision, but its relevance to building high-performance video recognition systems is clear. First, we describe an existing method for optimally stopping such a process based on modelling the next combined result. Then, we describe the approximations and assumptions that allowed us to build an optimized computation scheme and thus obtain a method with reduced computational complexity. The methods were evaluated on two tasks: document text field recognition and arbitrary text recognition in a video. The experimental comparison shows that the introduced approximations do not diminish the quality of the stopping method in terms of the achieved combined result precision, while dramatically reducing the time required to make the stopping decision. The results were consistent for both text recognition tasks.
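To make "modelling the next combined result" concrete, here is a minimal Python sketch of a stopping rule of this general type. It is not the authors' implementation. It relies on several simplifying assumptions. Each frame yields a string of the same length, and results are combined by per-position majority vote, which is a hypothetical stand-in for a real alignment-based combiner. Distance is normalized Levenshtein distance. The next frame result is modelled as a repeat of one of the results observed so far, plus a constant term `delta` that accounts for an unseen result. The names `combine`, `expected_next_distance`, `should_stop`, `delta` and `threshold` are illustrative only.

```python
from collections import Counter
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def combine(results: List[str]) -> str:
    """HYPOTHETICAL combiner: per-position majority vote over equal-length strings.
    Ties go to the character encountered first (Counter.most_common ordering)."""
    if not results:
        raise ValueError("need at least one frame result")
    length = len(results[0])
    if any(len(r) != length for r in results):
        raise ValueError("this sketch assumes equal-length frame results")
    return "".join(Counter(r[k] for r in results).most_common(1)[0][0]
                   for k in range(length))


def expected_next_distance(results: List[str], delta: float = 0.1) -> float:
    """Estimated distance between the current combined result and the next one.
    The next frame result is modelled as each previously seen result in turn,
    plus one extra term `delta` (assumed constant) for an unseen result."""
    n = len(results)
    current = combine(results)
    total = delta
    for r in results:
        # Naive approach: fully recombine n + 1 results for every candidate.
        total += normalized_distance(current, combine(results + [r]))
    return total / (n + 1)


def should_stop(results: List[str], threshold: float = 0.05,
                delta: float = 0.1) -> bool:
    """Stop once the modelled change of the combined result is small enough."""
    return expected_next_distance(results, delta) <= threshold


if __name__ == "__main__":
    print(should_stop(["HELL0", "HELLO", "HELLO"]))           # not yet stable
    print(should_stop(["HELLO", "HELL0", "HELLO", "HELLO"]))  # stable enough
```

In a recognition loop, one would append each new frame result to `results` and call `should_stop` after every frame. Each decision in this naive version recombines all observed results once per candidate next result, so its cost grows quickly with the number of frames. That is the kind of decision-time cost the approximations described above are meant to reduce. The sketch does not reproduce those approximations.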