Title

Evaluating Understanding on Conceptual Abstraction Benchmarks

Authors

Victor Vikram Odouard, Melanie Mitchell

Abstract

A long-held objective in AI is to build systems that understand concepts in a humanlike way. Setting aside the difficulty of building such a system, even trying to evaluate one is a challenge, due to present-day AI's relative opacity and its proclivity for finding shortcut solutions. This is exacerbated by humans' tendency to anthropomorphize, assuming that a system that can recognize one instance of a concept must also understand other instances, as a human would. In this paper, we argue that understanding a concept requires the ability to use it in varied contexts. Accordingly, we propose systematic evaluations centered around concepts, by probing a system's ability to use a given concept in many different instantiations. We present case studies of such evaluations on two domains -- RAVEN (inspired by Raven's Progressive Matrices) and the Abstraction and Reasoning Corpus (ARC) -- that have been used to develop and assess abstraction abilities in AI systems. Our concept-based approach to evaluation reveals information about AI systems that conventional test sets would have left hidden.
