A sequential approach to detecting differential rater functioning in sparse rater-mediated assessment networks


Language Testing (IF 2.400), Pub Date: 2022-05-12, DOI: 10.1177/02655322221092388
Stefanie A. Wind

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting DRF may be limited in sparse rating designs, where it is not possible for every rater to score every student. In these designs, there is limited information with which to detect DRF. Sparse designs can also exacerbate the impact of artificial DRF, which occurs when raters are inaccurately flagged for DRF due to statistical artifacts. In this study, a sequential method is adapted from previous research on differential item functioning (DIF) that allows researchers to detect DRF more accurately and distinguish between true and artificial DRF. Analyses of data from a rater-mediated writing assessment and a simulation study demonstrate that the sequential approach results in different conclusions about which raters exhibit DRF. Moreover, the simulation study results suggest that the sequential procedure results in improved accuracy in DRF detection across a variety of rating design conditions. Practical implications for language testing research are discussed.


Updated: 2022-05-12
