Paper Title
Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia
Paper Authors
Abstract
Chinese dynastic histories form a large continuous linguistic space of approximately 2000 years, from the 3rd century BCE to the 18th century CE. The histories are written in Classical (Literary) Chinese and make up a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change. However, there is no freely available open-source corpus of these histories, which leaves Classical Chinese a low-resource language. This project introduces a new open-source corpus of the twenty-four dynastic histories, released under a Creative Commons license. An original list of Classical Chinese gender-specific terms was developed as a case study for analyzing the historical linguistic use of male and female terms. The study demonstrates considerable stability in the usage of these terms, with male terms dominant. Word meanings are explored through keyword analysis of focus corpora created for the gender-specific terms. This method yields meaningful semantic representations that can be used in future studies of diachronic semantics.