Paper Title

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

Paper Authors

Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Tatiana Shavrina, Anton Emelyanov, Denis Shevelev, Alexandr Kukushkin, Valentin Malykh, Ekaterina Artemova

Paper Abstract

In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user-experience, and methodological improvements, including fixes for benchmark vulnerabilities left unresolved in the previous version: novel and improved tests for understanding the meaning of a word in context (RUSSE), along with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC). Together with the release of the updated datasets, we improve the benchmark toolkit based on the jiant framework for consistent training and evaluation of NLP models of various architectures, which now supports the most recent models for Russian. Finally, we provide the integration of Russian SuperGLUE with MOROCCO (MOdel ResOurCe COmparison), a framework for industrial evaluation of open-source models, in which models are evaluated according to a weighted average metric over all tasks, inference speed, and occupied RAM. Russian SuperGLUE is publicly available at https://russiansuperglue.com/.
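To make the aggregation idea concrete, here is a minimal illustrative sketch of how per-task scores, inference speed, and RAM usage might be folded into a single comparison value. This is not the actual MOROCCO implementation or formula; only the weighted average over task scores follows the abstract's description, and all names, reference values, and normalization choices below are hypothetical assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class ModelRunStats:
    """Hypothetical per-model measurements; not MOROCCO's real data model."""
    task_scores: dict[str, float]   # task name -> metric in [0, 1]
    samples_per_second: float       # measured inference throughput
    ram_gb: float                   # peak RAM occupied during inference

def aggregate_score(stats: ModelRunStats,
                    task_weights: dict[str, float],
                    ref_speed: float = 100.0,
                    ref_ram_gb: float = 16.0) -> float:
    """Illustrative combination of quality, speed, and memory.

    The weighted average over task scores mirrors the abstract's description;
    the speed/RAM normalization is an assumption made for this sketch only.
    """
    total_w = sum(task_weights.values())
    quality = sum(task_weights[t] * s for t, s in stats.task_scores.items()) / total_w
    speed_factor = min(stats.samples_per_second / ref_speed, 1.0)
    ram_factor = min(ref_ram_gb / stats.ram_gb, 1.0)
    return quality * speed_factor * ram_factor

# Example usage with made-up numbers
stats = ModelRunStats(
    task_scores={"RUSSE": 0.72, "DaNetQA": 0.65, "MuSeRC": 0.70},
    samples_per_second=80.0,
    ram_gb=8.0,
)
weights = {"RUSSE": 1.0, "DaNetQA": 1.0, "MuSeRC": 1.0}
print(f"aggregate: {aggregate_score(stats, weights):.3f}")
```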
