Paper Title

Linguistic Taboos and Euphemisms in Nepali

Authors

Nobal B. Niraula, Saurab Dulal, Diwa Koirala

Abstract

Languages across the world have words, phrases, and behaviors (the taboos) that are avoided in public communication because they are considered obscene or disturbing to the social, religious, and ethical values of society. However, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness is determined entirely by different factors such as socio-physical setting, speaker-listener relationship, and word choices. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 different categories of linguistic offenses including politics, religion, race, and sex. We discuss 12 common euphemisms such as synonym, metaphor, and circumlocution. In addition, we introduce a manually constructed data set of over 1000 offensive and taboo terms popular among contemporary speakers. This in-depth study of offensive language, together with the accompanying resource, will provide a foundation for several downstream tasks such as offensive language detection and language learning.
