Paper Title
Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring
Paper Authors
Paper Abstract
We propose a new application of embedding techniques to problem retrieval in adaptive tutoring. The objective is to retrieve problems whose underlying mathematical concepts are similar. There are two challenges. First, like sentences, problems that are helpful for tutoring are never exactly the same in terms of the underlying concepts. Instead, good problems mix concepts in innovative ways, while still displaying continuity in their relationships. Second, it is difficult for humans to assign similarity scores that are consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of an abstraction step and an embedding step. Prob2Vec achieves 96.88% accuracy on a problem similarity test, compared with 75% from directly applying state-of-the-art sentence embedding methods. Interestingly, Prob2Vec is able to distinguish very fine-grained differences among problems, an ability that humans need time and effort to acquire. In addition, the sub-problem of concept labeling with an imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose a novel negative pre-training algorithm that dramatically reduces the false negative and false positive rates of classification on an imbalanced training data set.
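The abstract describes Prob2Vec only at a high level: an abstraction step followed by an embedding step. As a rough illustration of how such a two-step pipeline could look, the sketch below assumes the abstraction step maps each problem's text to a set of concept labels via a hypothetical keyword lexicon, and the embedding step trains a Skip-gram model (gensim's Word2Vec) over those concept sets; a problem is then embedded as the average of its concept vectors and retrieved by cosine similarity. The lexicon, function names, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of an "abstract, then embed" pipeline for problem retrieval.
# NOTE: not the authors' implementation; names and parameters are assumptions.

import numpy as np
from gensim.models import Word2Vec  # gensim >= 4.0

# Hypothetical abstraction rules: keyword -> concept label.
CONCEPT_LEXICON = {
    "bayes": "bayes_rule",
    "independent": "independence",
    "variance": "variance",
    "expectation": "expectation",
    "binomial": "binomial_distribution",
}

def abstract_problem(problem_text):
    """Abstraction step: return the concept labels detected in the problem text."""
    text = problem_text.lower()
    return sorted({label for kw, label in CONCEPT_LEXICON.items() if kw in text})

# Toy corpus of problems (placeholders for real problem statements).
problems = [
    "Use Bayes rule to update the probability of the hypothesis ...",
    "Compute the expectation and variance of a binomial random variable ...",
    "Are the two events independent? Apply Bayes rule to check ...",
]
concept_sets = [abstract_problem(p) for p in problems]

# Embedding step: Skip-gram over concept "sentences" yields concept vectors.
model = Word2Vec(sentences=concept_sets, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=200, seed=0)

def embed_problem(concepts):
    """Embed a problem as the mean of its concept vectors."""
    vecs = [model.wv[c] for c in concepts if c in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Retrieval: rank stored problems by similarity to a query problem.
query = embed_problem(abstract_problem("A Bayes rule question about independent events"))
ranked = sorted(range(len(problems)),
                key=lambda i: cosine(query, embed_problem(concept_sets[i])),
                reverse=True)
print([problems[i][:45] for i in ranked])
```

In this reading, similarity is computed over concept vectors rather than raw text, which is one way a system could match problems that share concepts while differing entirely in wording.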
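The abstract also mentions a negative pre-training scheme for concept labeling with imbalanced data, without spelling out the procedure. The sketch below shows one plausible reading, assuming it means first pre-training a per-concept binary classifier on the abundant negative (concept-absent) examples and then continuing training on the full imbalanced set. The tiny NumPy logistic-regression trainer and the synthetic data are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative "pre-train on negatives, then train on all data" schedule for
# one concept's binary classifier under class imbalance.
# NOTE: an assumed reading of "negative pre-training", not the paper's recipe.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w=None, b=0.0, lr=0.1, epochs=200):
    """Plain batch gradient descent on the logistic loss; returns (w, b)."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy imbalanced data: 95% negatives (concept absent), 5% positives.
n_neg, n_pos, d = 950, 50, 20
X_neg = rng.normal(0.0, 1.0, size=(n_neg, d))
X_pos = rng.normal(0.8, 1.0, size=(n_pos, d))  # positives shifted away from negatives
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

# Phase 1 (negative pre-training): fit on the negative examples only,
# anchoring the decision function before the rare positives are introduced.
w, b = train_logreg(X_neg, np.zeros(n_neg), epochs=50)

# Phase 2: continue training from the pre-trained weights on the full set.
w, b = train_logreg(X, y, w=w, b=b, epochs=200)

pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
false_neg = np.mean(pred[y == 1] == 0)  # missed concept labels
false_pos = np.mean(pred[y == 0] == 1)  # spurious concept labels
print(f"false negative rate={false_neg:.3f}, false positive rate={false_pos:.3f}")
```

In a full multi-label setting, one such classifier would be trained per concept label, which is where the dimensionality explosion mentioned in the abstract arises.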