Paper Title

Performance of Multi-group DIF Methods in Assessing Cross-Country Score Comparability of International Large-Scale Assessments

Paper Author

Chen, Dandan

Paper Abstract

Standardized large-scale testing can be a debatable topic, in which test fairness sits at its very core. This study found that two out of five recent multi-group DIF detection methods are capable of capturing both the uniform and nonuniform DIF that affects test fairness. Still, no prior research has demonstrated the relative performance of these two methods when they are compared with each other. These two methods are the improved Wald test and the generalized logistic regression procedure. This study assessed the commonalities and differences between two sets of empirical results from these two methods with the latest TIMSS math score data. The primary conclusion was that the improved Wald test is relatively more established than the generalized logistic regression procedure for multi-group DIF analysis. Empirical results from this study may inform the selection of a multi-group DIF method in the ILSA score analysis.
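The generalized logistic regression procedure named in the abstract extends the familiar two-group logistic DIF test to several groups at once: uniform DIF appears as group main effects and nonuniform DIF as group-by-ability interactions. The sketch below is a minimal illustration of that idea only, not the paper's implementation; the data are simulated and every variable name is hypothetical, whereas the study itself analyzes TIMSS math item responses.

```python
# Minimal sketch of a multi-group logistic-regression DIF check (illustrative only;
# NOT the paper's implementation). All data and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Simulated toy data standing in for item-level responses from three countries.
n = 3000
country = rng.choice(["A", "B", "C"], size=n)
theta = rng.normal(size=n)  # ability proxy (e.g., total or rest score)
# Build in uniform DIF for country B and nonuniform DIF for country C.
logit = 1.2 * theta - 0.3 * (country == "B") + 0.6 * theta * (country == "C")
item = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
df = pd.DataFrame({"item": item, "total": theta, "country": country})

# Nested logistic models: ability only; plus group (uniform DIF);
# plus group-by-ability interaction (nonuniform DIF).
m0 = smf.logit("item ~ total", data=df).fit(disp=False)
m1 = smf.logit("item ~ total + C(country)", data=df).fit(disp=False)
m2 = smf.logit("item ~ total * C(country)", data=df).fit(disp=False)

# Likelihood-ratio tests over all groups jointly flag the two DIF types.
lr_uni = 2 * (m1.llf - m0.llf)
lr_nonuni = 2 * (m2.llf - m1.llf)
print("uniform DIF:    LR =", round(lr_uni, 2),
      "p =", chi2.sf(lr_uni, m1.df_model - m0.df_model))
print("nonuniform DIF: LR =", round(lr_nonuni, 2),
      "p =", chi2.sf(lr_nonuni, m2.df_model - m1.df_model))
```

The improved Wald test compared in the study works instead on estimated IRT item parameters across groups, so it is not sketched here.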
