Paper Title
Item Quality Control in Educational Testing: Change Point Model, Compound Risk, and Sequential Detection
Paper Authors
Paper Abstract
In standardized educational testing, test items are reused in multiple test administrations. To ensure the validity of test scores, the psychometric properties of items should remain unchanged over time. In this paper, we consider the sequential monitoring of test items, in particular, the detection of abrupt changes to their psychometric properties, where a change can be caused by, for example, leakage of the item or a change in the corresponding curriculum. We propose a statistical framework for the detection of abrupt changes in individual items. This framework consists of (1) a multi-stream Bayesian change point model describing sequential changes in items, (2) a compound risk function quantifying the risk in sequential decisions, and (3) sequential decision rules that control the compound risk. Throughout the sequential decision process, the proposed decision rule balances the trade-off between two sources of error: the false detection of pre-change items and the non-detection of post-change items. An item-specific monitoring statistic is proposed based on an item response theory model that eliminates the confounding from the examinee population, which changes over time. Sequential decision rules and their theoretical properties are developed under two settings: an oracle setting, where the Bayesian change point model is completely known, and a more realistic setting, where some parameters of the model are unknown. Simulation studies are conducted under settings that mimic real operational tests.
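To make the multi-stream Bayesian change point idea concrete, the sketch below runs the classical Shiryaev posterior recursion independently on each item's stream of monitoring statistics and raises an alarm once the posterior probability of a change exceeds a fixed threshold. Everything here is an illustrative assumption, not the paper's actual procedure: the Gaussian pre-/post-change densities, the geometric change prior `rho`, the threshold, and the function names are all hypothetical stand-ins for the compound-risk-controlling rule developed in the paper.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def shiryaev_update(p, x, rho, f0, f1):
    """One Shiryaev update of the posterior probability that a change
    has occurred by the current time, given a new observation x.
    rho is the prior (geometric) probability of a change at each step;
    f0 and f1 are the pre- and post-change densities."""
    a = (p + (1 - p) * rho) * f1(x)   # mass consistent with "changed by now"
    b = (1 - p) * (1 - rho) * f0(x)   # mass consistent with "not yet changed"
    return a / (a + b)

def monitor_streams(streams, rho=0.01, threshold=0.95):
    """Monitor several item streams in parallel (illustrative setup:
    a pre-change N(0,1) statistic shifting to N(2,1) after the change).
    Returns {item: first time the change posterior exceeds threshold}."""
    f0 = lambda x: normal_pdf(x, 0.0)  # assumed pre-change density
    f1 = lambda x: normal_pdf(x, 2.0)  # assumed post-change density
    posteriors = {item: 0.0 for item in streams}
    alarms = {}
    horizon = len(next(iter(streams.values())))
    for t in range(horizon):
        for item, xs in streams.items():
            if item in alarms:        # item already flagged and retired
                continue
            posteriors[item] = shiryaev_update(posteriors[item], xs[t], rho, f0, f1)
            if posteriors[item] >= threshold:
                alarms[item] = t
    return alarms

# Two toy item streams: item_A changes at t = 10, item_B never changes.
alarms = monitor_streams({
    "item_A": [0.0] * 10 + [2.0] * 10,
    "item_B": [0.0] * 20,
})
# item_A is flagged a few steps after its change; item_B is not flagged.
```

In the paper's setting, the fixed threshold would be replaced by a decision rule calibrated so that the compound risk, combining false detections of pre-change items with non-detections of post-change items, stays controlled across all streams.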