Paper Title
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
Paper Authors
Paper Abstract
State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which harms their domain invariance. Prior work has attempted to penalize terms related to hate speech, drawn from manually curated lists, using feature-attribution methods, which quantify the importance a classifier assigns to input terms when making a prediction. We instead propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, together with feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation.
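
To make the idea in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a smoothed log-odds score as a stand-in for the domain classifier, and a squared-attribution penalty added to the task loss. The function names `source_specific_terms` and `penalized_loss`, the weight `lam`, the toy corpora, and the attribution values are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def source_specific_terms(source_docs, target_docs, k=2):
    """Return the k terms most indicative of the source domain.

    Assumption: a Naive-Bayes-style smoothed log-odds score stands in
    for a trained domain classifier.
    """
    src = Counter(t for doc in source_docs for t in doc)
    tgt = Counter(t for doc in target_docs for t in doc)
    vocab = set(src) | set(tgt)
    n_src, n_tgt, v = sum(src.values()), sum(tgt.values()), len(vocab)

    def log_odds(term):
        # Add-one smoothing keeps unseen terms from producing log(0).
        p_src = (src[term] + 1) / (n_src + v)
        p_tgt = (tgt[term] + 1) / (n_tgt + v)
        return math.log(p_src) - math.log(p_tgt)

    # Highest log-odds first; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda t: (-log_odds(t), t))
    return ranked[:k]


def penalized_loss(task_loss, attributions, penalized_terms, lam=0.1):
    """Add a penalty on attribution mass assigned to the given terms.

    Assumption: the penalty is the sum of squared attribution scores,
    scaled by a hypothetical weight `lam`.
    """
    penalty = sum(attributions.get(t, 0.0) ** 2 for t in penalized_terms)
    return task_loss + lam * penalty


# Toy corpora (hypothetical): tokenized posts from two domains.
source_docs = [["gamer", "trash", "noob"], ["noob", "idiot", "gamer"]]
target_docs = [["idiot", "trash", "people"], ["people", "idiot", "vote"]]

terms = source_specific_terms(source_docs, target_docs, k=2)

# Hypothetical attribution scores from a hate-speech classifier.
attributions = {"gamer": 0.4, "idiot": 0.9}
loss = penalized_loss(task_loss=0.5, attributions=attributions,
                      penalized_terms=terms, lam=0.1)
print(terms, loss)
```

In this sketch, only terms that the domain score marks as source-specific contribute to the penalty, so a term such as "idiot" that appears in both domains keeps its attribution unpenalized. In a real training loop the attributions would come from a differentiable feature-attribution method so that the penalty can be minimized by gradient descent.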