Paper Title
When Classical Chinese Meets Machine Learning: Explaining the Relative Performances of Word and Sentence Segmentation Tasks
Paper Authors
Abstract
We consider three major text sources about the Tang Dynasty of China in experiments that aim to segment text written in classical Chinese. The corpora are a collection of Tang Tomb Biographies, the New Tang Book, and the Old Tang Book. We show that a deep learning approach can achieve satisfactory segmentation results. More interestingly, we found that some of the relative superiority observed among different experimental designs may be explainable: the relative relevance among the training corpora offers hints toward explaining the differences in segmentation results obtained when different combinations of corpora were used to train the classifiers.