Paper Title
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
Paper Authors
Paper Abstract
We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and of the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and show that performance disparities across groups are pronounced in many cases, while none of these techniques guarantees fairness or consistently mitigates group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.
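The abstract measures "performance group disparities," i.e., how much a model's scores differ across values of a protected attribute such as gender or region. A minimal sketch of one such measurement is below; note that it uses plain per-group accuracy and a best-minus-worst gap as an illustrative choice (the paper itself reports per-group metrics such as macro-F1, so the exact metric here is our assumption):

```python
from collections import defaultdict

def group_performance_gap(y_true, y_pred, groups):
    """Compute per-group accuracy and the best-vs-worst group gap.

    y_true, y_pred: parallel lists of gold and predicted labels.
    groups: parallel list of protected-attribute values (e.g. "f"/"m").
    Returns (per_group_accuracy_dict, gap). A gap near 0 means the
    model performs similarly across groups; a large gap indicates a
    performance group disparity of the kind the benchmark surfaces.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    per_group = {g: correct[g] / total[g] for g in total}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Hypothetical toy example: the model is perfect on group "f"
# but gets only one of three examples right on group "m".
per_group, gap = group_performance_gap(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 1, 1, 1, 0],
    groups=["f", "f", "f", "m", "m", "m"],
)
```

Group-robust fine-tuning methods (e.g., group DRO-style objectives) aim to shrink exactly this kind of gap; the paper's finding is that in practice none of the evaluated techniques does so consistently.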