Paper Title
A fine-grained comparison of pragmatic language understanding in humans and language models
Paper Authors
Paper Abstract
Pragmatics and non-literal language understanding are essential to human communication, and present a long-standing challenge for artificial language models. We perform a fine-grained comparison of language models and humans on seven pragmatic phenomena, using zero-shot prompting on an expert-curated set of English materials. We ask whether models (1) select pragmatic interpretations of speaker utterances, (2) make similar error patterns as humans, and (3) use similar linguistic cues as humans to solve the tasks. We find that the largest models achieve high accuracy and match human error patterns: within incorrect responses, models favor literal interpretations over heuristic-based distractors. We also find preliminary evidence that models and humans are sensitive to similar linguistic cues. Our results suggest that pragmatic behaviors can emerge in models without explicitly constructed representations of mental states. However, models tend to struggle with phenomena relying on social expectation violations.
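
To make the evaluation setup concrete, below is a minimal, hypothetical sketch of a zero-shot multiple-choice evaluation of the kind the abstract describes: the model sees a scenario and candidate interpretations of the speaker's utterance, and each option is scored by its conditional log-likelihood. The scenario, the answer options, and the use of `gpt2` are illustrative assumptions, not the paper's curated materials or the models it evaluates.

```python
# Hypothetical sketch of a zero-shot multiple-choice evaluation: score each
# candidate interpretation by its log-likelihood under a causal LM and pick
# the highest-scoring one. Scenario and options are invented for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = (
    "Dan and Kim are leaving a concert. Dan asks Kim how she liked it.\n"
    "Kim says: 'The singer certainly hit a lot of notes.'\n"
    "What does Kim mean?"
)
# Options start with a space so GPT-2's BPE splits cleanly at the
# prompt/option boundary.
options = [
    " Kim thought the singing was unimpressive.",    # pragmatic (irony)
    " Kim thought the singer produced many notes.",  # literal
]

def option_logprob(prompt: str, option: str) -> float:
    """Total log-probability of the option's tokens conditioned on the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Predictions are shifted by one: log_probs[i] scores the token at
    # position i + 1, so this sums scores over the option tokens only.
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

scores = [option_logprob(prompt, o) for o in options]
best = max(range(len(options)), key=lambda i: scores[i])
print("Model's choice:", options[best].strip())
```

Scoring options by conditional log-likelihood is one common way to run zero-shot multiple-choice evaluations of language models; the paper's exact prompting and scoring format may differ.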