Paper Title
Multiple-Choice Question Generation: Towards an Automated Assessment Framework
Paper Authors
Paper Abstract
Automated question generation is an important approach to enabling personalisation of English comprehension assessment. Recently, transformer-based pretrained language models have demonstrated the ability to produce appropriate questions from a context paragraph. Typically, these systems are evaluated against a reference set of manually generated questions using n-gram based metrics, or by manual qualitative assessment. Here, we focus on a fully automated multiple-choice question generation (MCQG) system, where both the question and the possible answers must be generated from the context paragraph. Applying n-gram based approaches is challenging for this form of system, as the reference set is unlikely to capture the full range of possible questions and answer options. Conversely, manual assessment scales poorly and is expensive for MCQG system development. In this work, we propose a set of performance criteria that assess different aspects of interest of the generated multiple-choice questions. These qualities include: grammatical correctness, answerability, diversity and complexity. Initial systems for each of these metrics are described and individually evaluated on standard multiple-choice reading comprehension corpora.
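As a minimal illustration of the limitation of reference-based evaluation noted in the abstract, the sketch below scores a generated question against a small reference set with sentence-level BLEU using NLTK. The reference questions and the candidate question are invented for illustration and are not from the paper; the point is that a valid, answerable question phrased differently from the references receives a low n-gram score.

```python
# Illustrative sketch (not from the paper): sentence-level BLEU of a generated
# question against a small, hypothetical reference set, using NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical reference questions for some context paragraph.
references = [
    "What did the author study at university ?".split(),
    "Which subject did the author study ?".split(),
]

# A valid, answerable question worded differently from the references.
candidate = "What was the author 's field of study ?".split()

score = sentence_bleu(
    references,
    candidate,
    smoothing_function=SmoothingFunction().method1,  # avoid zero n-gram counts on short sentences
)
print(f"BLEU: {score:.3f}")  # low score despite the question being acceptable
```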