Paper title
CaMEL: Case Marker Extraction without Labels
Paper authors
Paper abstract
We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.