Paper title
Morphology Without Borders: Clause-Level Morphology
Paper authors
Paper abstract
Morphological tasks use large multilingual datasets that organize words into inflection tables, which then serve as training and evaluation data for various tasks. However, a closer inspection of these data reveals profound cross-linguistic inconsistencies that arise from the lack of a clear linguistic and operational definition of what a word is, and that severely impair the universality of the derived tasks. To overcome this deficiency, we propose to view morphology as a clause-level phenomenon rather than a word-level one. The clause-level view is anchored in a fixed yet inclusive set of features that encapsulates all functions realized in a saturated clause. We deliver MightyMorph, a novel dataset for clause-level morphology covering 4 typologically different languages: English, German, Turkish and Hebrew. We use this dataset to derive 3 clause-level morphological tasks: inflection, reinflection and analysis. Our experiments show that the clause-level tasks are substantially harder than the respective word-level tasks, while having comparable complexity across languages. Furthermore, redefining morphology at the clause level provides a neat interface with contextualized language models (LMs) and allows assessing the morphological knowledge encoded in these models and their usability for morphological tasks. Taken together, this work opens up new horizons in the study of computational morphology, leaving ample space for studying neural morphology cross-linguistically.
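To make the three clause-level tasks named in the abstract concrete, the following is a minimal sketch of their input/output shapes. Everything in it is an assumption for illustration: the feature labels (such as `NOM(1,SG)`), the toy English entries, and the function names are hypothetical and are not taken from MightyMorph, and the lookup table only stands in for the learned models that the experiments would actually train.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Features = FrozenSet[str]

# Hypothetical toy "inflection table" at the clause level:
# (lemma, feature bundle) -> fully realized clause.
# The feature labels are illustrative, not the dataset's actual schema.
TOY_TABLE: Dict[Tuple[str, Features], str] = {
    ("walk", frozenset({"IND", "PRS", "NOM(1,SG)"})): "I walk",
    ("walk", frozenset({"IND", "PST", "NOM(1,SG)"})): "I walked",
    ("walk", frozenset({"IND", "FUT", "NEG", "NOM(3,PL)"})): "they will not walk",
}


def inflect(lemma: str, features: Iterable[str]) -> str:
    """Inflection: lemma + target features -> clause."""
    return TOY_TABLE[(lemma, frozenset(features))]


def analyze(clause: str) -> Tuple[str, Features]:
    """Analysis: clause -> lemma + features (inverse of inflection)."""
    for (lemma, feats), form in TOY_TABLE.items():
        if form == clause:
            return lemma, feats
    raise KeyError(clause)


def reinflect(clause: str, target_features: Iterable[str]) -> str:
    """Reinflection: source clause + target features -> target clause."""
    lemma, _ = analyze(clause)
    return inflect(lemma, target_features)


if __name__ == "__main__":
    print(inflect("walk", {"IND", "PST", "NOM(1,SG)"}))
    print(analyze("they will not walk"))
    print(reinflect("I walk", {"IND", "FUT", "NEG", "NOM(3,PL)"}))
```

Note that in this sketch a single feature bundle maps to a multi-word clause ("they will not walk"), which is the point of moving from word-level to clause-level targets: a function that one language expresses with separate words and another with affixes is described by the same features.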