Title
Handwriting recognition and automatic scoring for descriptive answers in Japanese language tests
Authors
Abstract
This paper presents an experiment on automatically scoring handwritten descriptive answers from the trial tests for the new Japanese university entrance examination, which were administered to about 120,000 examinees in 2017 and 2018. The dataset contains about 400,000 answers comprising more than 20 million characters. Although all answers were scored by human examiners, the handwritten characters themselves are not labeled. We present our attempt to adapt deep neural network-based handwriting recognizers, trained on a labeled handwriting dataset, to this unlabeled answer set. Our proposed method combines different training strategies, ensembles multiple recognizers, and uses a language model built from a large general corpus to avoid overfitting to the specific data. In our experiments, the proposed method achieves a character accuracy of over 97% using about 2,000 verified labeled answers, which account for less than 0.5% of the dataset. The recognized answers are then fed into a pre-trained automatic scoring system based on the BERT model, without correcting misrecognized characters or providing rubric annotations. The automatic scoring system achieves Quadratic Weighted Kappa (QWK) values ranging from 0.84 to 0.98. Since a QWK above 0.8 indicates acceptable agreement between the automatic scoring system and the human examiners, these results are promising for further research on end-to-end automatic scoring of descriptive answers.
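The abstract reports agreement between the automatic scorer and human examiners as Quadratic Weighted Kappa (QWK). As a minimal sketch of how this metric is computed (the paper does not give its implementation; the function name and score range here are illustrative), QWK compares the observed matrix of score pairs against the matrix expected under independent raters, penalizing disagreements by the squared distance between scores:

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two equal-length lists of integer scores.

    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement,
    and negative values for worse-than-chance agreement.
    """
    n = max_rating - min_rating + 1
    total = len(rater_a)

    # Observed counts of (score_a, score_b) pairs.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal score histograms for each rater.
    hist_a = Counter(a - min_rating for a in rater_a)
    hist_b = Counter(b - min_rating for b in rater_b)

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator

# Identical score lists give a QWK of exactly 1.0.
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # → 1.0
```

The quadratic weight means that scoring a 0-point answer as 3 is penalized nine times more heavily than scoring it as 1, which matches how graders perceive near-miss versus gross disagreements; this is why the paper's 0.84-0.98 range can be read as strong human-machine agreement.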