Paper Title
SMART: A Situation Model for Algebra Story Problems via Attributed Grammar
Paper Authors
Abstract
Solving algebra story problems remains a challenging task in artificial intelligence, requiring both a detailed understanding of real-world situations and strong mathematical reasoning. Previous neural solvers for math word problems translate problem texts directly into equations. They lack an explicit interpretation of the situation and often fail on more sophisticated problems. To address these limitations, we introduce the concept of a \emph{situation model}, which originates from psychology studies and represents the mental states of humans during problem-solving, and propose \emph{SMART}, which adopts attributed grammar as the representation of situation models for algebra story problems. Specifically, we first train an information extraction module to extract nodes, attributes, and relations from problem texts, and then generate a parse graph based on a pre-defined attributed grammar. An iterative learning strategy is also proposed to further improve the performance of SMART. To rigorously study this task, we carefully curate a new dataset named \emph{ASP6.6k}. Experimental results on ASP6.6k show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability. To test the generalization capability of these models, we also design an out-of-distribution (OOD) evaluation, in which problems are more complex than those in the training set. Our model exceeds state-of-the-art models by 17\% in the OOD evaluation, demonstrating its superior generalization ability.
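
To make the idea of a parse graph with nodes, attributes, and relations more concrete, the following is a minimal Python sketch. It is not the SMART implementation or the paper's grammar: the `Node` and `Relation` classes, the `RULES` table, the `"motion"` node kind, the `"sum"` relation, and the example problem are all hypothetical, chosen only to show how attribute rules can propagate known quantities through a small graph until the unknown is filled in. The extraction step is assumed to have already produced the graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """An entity in the situation; an attribute set to None is unknown."""
    name: str
    kind: str
    attrs: Dict[str, Optional[float]] = field(default_factory=dict)


@dataclass
class Relation:
    """Hypothetical 'sum' relation: target.attr = sum of sources' attr."""
    kind: str
    target: str
    sources: Tuple[str, ...]
    attr: str


# Hypothetical attribute rules per node kind: (target, inputs, function).
RULES = {
    "motion": [
        ("distance", ("speed", "time"), lambda s, t: s * t),
        ("speed", ("distance", "time"), lambda d, t: d / t),
        ("time", ("distance", "speed"), lambda d, s: d / s),
    ],
}


def solve(nodes: List[Node], relations: List[Relation]) -> Dict[str, Node]:
    """Apply rules and relations repeatedly until no attribute changes."""
    graph = {n.name: n for n in nodes}
    changed = True
    while changed:
        changed = False
        # Rules local to a node: fill an unknown attribute from known ones.
        for node in graph.values():
            for target, inputs, fn in RULES.get(node.kind, []):
                known = all(node.attrs.get(i) is not None for i in inputs)
                if node.attrs.get(target) is None and known:
                    node.attrs[target] = fn(*(node.attrs[i] for i in inputs))
                    changed = True
        # Relations between nodes: here only the assumed 'sum' kind.
        for rel in relations:
            if rel.kind != "sum":
                continue
            values = [graph[s].attrs.get(rel.attr) for s in rel.sources]
            tgt = graph[rel.target]
            if tgt.attrs.get(rel.attr) is None and all(v is not None for v in values):
                tgt.attrs[rel.attr] = sum(values)
                changed = True
    return graph


# Example problem (invented for illustration): a car drives at 60 km/h for
# 2 hours, then at 40 km/h for 3 hours. How far does it travel in total?
nodes = [
    Node("leg1", "motion", {"speed": 60.0, "time": 2.0, "distance": None}),
    Node("leg2", "motion", {"speed": 40.0, "time": 3.0, "distance": None}),
    Node("trip", "motion", {"speed": None, "time": None, "distance": None}),
]
relations = [
    Relation("sum", "trip", ("leg1", "leg2"), "distance"),
    Relation("sum", "trip", ("leg1", "leg2"), "time"),
]
solved = solve(nodes, relations)
print(solved["trip"].attrs["distance"])  # 240.0
```

Because every intermediate quantity is stored on a named node, the solved graph can be inspected step by step, which is the kind of interpretability that an explicit situation representation is meant to offer over a direct text-to-equation mapping.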