Paper Title
Assessment of Massively Multilingual Sentiment Classifiers
Paper Authors
Paper Abstract
Models are increasing in size and complexity in the hunt for SOTA. But what if that 2\% gain in performance makes no difference in a production use case? Perhaps the benefits of a smaller, faster model outweigh the slight performance gain. Likewise, in multilingual tasks, equally good performance across languages matters more than a SOTA result on a single one. We present the largest, unified, multilingual collection of sentiment analysis datasets. We use it to assess 11 models and 80 high-quality sentiment datasets (out of 342 raw datasets collected) in 27 languages, and we include results on internally annotated datasets. We evaluate multiple setups in depth, including fine-tuning transformer-based models, to measure performance. We compare results along numerous dimensions, addressing the imbalance in both language coverage and dataset sizes. Finally, we present best practices for working with such a massive collection of datasets and models from a multilingual perspective.
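As a concrete illustration of the kind of fine-tuning setup the abstract refers to, the following is a minimal sketch of fine-tuning a multilingual transformer for three-class sentiment classification using the Hugging Face transformers and datasets libraries. The backbone choice (xlm-roberta-base), the label scheme, and the toy training examples are assumptions made for illustration, not the authors' actual pipeline or data.

```python
# Minimal sketch: fine-tuning a multilingual transformer for sentiment
# classification. All concrete choices here (model, labels, toy data)
# are illustrative assumptions, not the paper's exact configuration.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical toy data; the paper uses 80 curated datasets in 27 languages.
train = Dataset.from_dict({
    "text": ["Great product!", "Produkt fatalny.", "C'est correct."],
    "label": [2, 0, 1],  # 0 = negative, 1 = neutral, 2 = positive
})

model_name = "xlm-roberta-base"  # one plausible multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3
)

def tokenize(batch):
    # Truncate/pad so all examples share a fixed sequence length.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=train,
)
trainer.train()
```

In a multilingual evaluation like the one described, one would repeat such a run per dataset (or over pooled data) and then aggregate metrics per language, so that languages with many or large datasets do not dominate the comparison.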