Paper Title

Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles

Authors

Ben Eyal, Michael Elhadad

Abstract

We present a semantic role labeling (SRL) resource for Hebrew built semi-automatically through annotation projection from English. The corpus is derived from the multilingual OpenSubtitles dataset and consists of short, informal sentences for which reliable linguistic annotations have been computed. We provide a fully annotated version of the data, including morphological analysis, dependency syntax, and semantic role labeling in both FrameNet and PropBank styles. Sentences are aligned between English and Hebrew; both sides include full annotations, together with an explicit mapping from the English arguments to the Hebrew ones. We train a neural SRL model on this Hebrew resource, exploiting the pre-trained multilingual BERT transformer model, and provide the first available baseline model for Hebrew SRL as a reference point. The code we provide is generic and can be adapted to other languages to bootstrap SRL resources.
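
To make the idea of annotation projection concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that token-level word alignments between an English sentence and its Hebrew counterpart are already available as index pairs, and that English role labels are given per token. The function name `project_roles`, its parameters, and the first-label-wins tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def project_roles(
    src_roles: Dict[int, str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project per-token role labels from source tokens onto target tokens.

    src_roles maps a source token index to a role label (e.g. "ARG0").
    alignment is a list of (source_index, target_index) pairs.
    Target tokens with no labeled aligned source token receive "O".
    """
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        label = src_roles.get(src_idx)
        # Assumption: if several source tokens align to the same target
        # token, the first projected label is kept.
        if label is not None and tgt_labels[tgt_idx] == "O":
            tgt_labels[tgt_idx] = label
    return tgt_labels


# Illustrative three-token sentence pair with a one-to-one alignment.
english_roles = {0: "ARG0", 1: "V", 2: "ARG1"}
word_alignment = [(0, 0), (1, 1), (2, 2)]
hebrew_labels = project_roles(english_roles, word_alignment, tgt_len=3)
```

A real projection step would also need to handle unaligned or many-to-many alignments and filter low-confidence projections; this sketch leaves those decisions out.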
