Paper Title
A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition
Paper Authors
Paper Abstract
Deep Bidirectional Long Short-Term Memory (DBLSTM) with a Connectionist Temporal Classification (CTC) output layer has been established as one of the state-of-the-art solutions for handwriting recognition. It is well known that a DBLSTM trained with a CTC objective function learns both local character image dependency for character modeling and long-range contextual dependency for implicit language modeling. In this paper, we study the effects of implicit and explicit language model information on DBLSTM-CTC based handwriting recognition by comparing recognition performance with and without an explicit language model in decoding. It is observed that even when one million lines of training sentences are used to train the DBLSTM, an explicit language model is still helpful. To deal with such a large-scale training problem, a GPU-based training tool has been developed for CTC training of DBLSTM using a mini-batch based epochwise Back Propagation Through Time (BPTT) algorithm.
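
As a concrete illustration of the setup described in the abstract, below is a minimal PyTorch sketch (not the authors' GPU training tool) of a deep bidirectional LSTM with a CTC output layer, one mini-batch training step via backpropagation through time, and best-path decoding, which corresponds to decoding without an explicit language model. All dimensions, hyperparameters, and names (DBLSTM_CTC, ctc_greedy_decode) are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class DBLSTM_CTC(nn.Module):
    """Stacked bidirectional LSTM with a CTC output layer (illustrative)."""
    def __init__(self, n_features, n_hidden, n_layers, n_classes):
        super().__init__()
        # Deep BLSTM over the frame sequence of a handwritten text line.
        self.blstm = nn.LSTM(n_features, n_hidden, num_layers=n_layers,
                             bidirectional=True)
        # Project to per-frame log-posteriors; index 0 is the CTC blank.
        self.proj = nn.Linear(2 * n_hidden, n_classes + 1)

    def forward(self, x):                      # x: (T, N, n_features)
        h, _ = self.blstm(x)                   # (T, N, 2 * n_hidden)
        return self.proj(h).log_softmax(-1)    # (T, N, n_classes + 1)

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path decoding without an explicit language model:
    frame-wise argmax, collapse repeated labels, drop blanks."""
    best = log_probs.argmax(-1).transpose(0, 1)   # (N, T)
    decoded = []
    for path in best.tolist():
        prev, labels = blank, []
        for c in path:
            if c != prev and c != blank:
                labels.append(c)
            prev = c
        decoded.append(labels)
    return decoded

# One mini-batch CTC training step (epochwise BPTT over whole text lines).
model = DBLSTM_CTC(n_features=64, n_hidden=128, n_layers=3, n_classes=80)
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

T, N, S = 200, 16, 25                          # frames, lines per batch, label length
x = torch.randn(T, N, 64)                      # feature frames of 16 text-line images
targets = torch.randint(1, 81, (N, S))         # label sequences (0 reserved for blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc_loss(model(x), targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()                                # gradients via BPTT through the BLSTM
optimizer.step()

print(ctc_greedy_decode(model(x))[0])          # decoded label ids for the first line

In the paper's comparison, decoding with an explicit language model would replace the greedy decoder above with a beam search that combines the per-frame CTC posteriors with language model scores; the greedy path shown here reflects only the implicit language model information captured by the trained DBLSTM itself.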