Paper Title
Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection
Paper Authors
Paper Abstract
Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard approach: computing metrics on held-out data. While behavioural tests capture human prior knowledge and insights, there has been little exploration on how to leverage them for model training and development. With this in mind, we explore behaviour-aware learning by examining several fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems. To address potential pitfalls of training on data originally intended for evaluation, we train and evaluate models on different configurations of HateCheck by holding out categories of test cases, which enables us to estimate performance on potentially overlooked system properties. The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups, suggesting that models can potentially generalise to overlooked functionalities. However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class and that the procedure led to overfitting to the HateCheck data distribution.
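To make the held-out-functionality setup concrete, here is a minimal sketch (not the authors' implementation) of training on all HateCheck functionalities except a held-out set and measuring accuracy on the excluded cases. The column names and functionality labels assume the publicly released HateCheck CSV, and a simple TF-IDF plus logistic-regression classifier stands in for the fine-tuned transformer models used in the paper.

```python
# Sketch of behaviour-aware learning with a held-out functionality category:
# train on every HateCheck functionality except the held-out ones, then
# report accuracy on the held-out test cases to estimate generalisation
# to "overlooked" system properties.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumed local copy of the HateCheck test cases with columns
# "functionality", "test_case", and "label_gold" (as in the public release).
df = pd.read_csv("hatecheck_cases.csv")

# Example held-out functionality names; the paper holds out whole
# categories/classes of functionalities in different configurations.
held_out = {"slur_reclaimed_nh", "slur_homonym_nh"}

train = df[~df["functionality"].isin(held_out)]
test = df[df["functionality"].isin(held_out)]

# Stand-in classifier for the fine-tuned transformer models.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train["test_case"], train["label_gold"])

accuracy = model.score(test["test_case"], test["label_gold"])
print(f"Accuracy on held-out functionalities: {accuracy:.3f}")
```

In the paper this estimate is complemented by evaluation on i.i.d. hate speech detection data, which is what reveals the overfitting to the HateCheck distribution noted in the abstract.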