Paper title
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
Paper authors
Paper abstract
Sentence encoders map sentences to real-valued vectors for use in downstream applications. To peek into these representations (e.g., to increase the interpretability of their results), probing tasks have been designed that query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these languages often lack the large-scale annotated data or (high-quality) dependency parsers that probing task design in English relies on. To determine how to probe sentence embeddings in such cases, we investigate the sensitivity of probing task results to structural design choices, conducting the first such large-scale study. We show that design choices such as the size of the annotated probing dataset and the type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as identified for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.