Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests
Journal of Educational Measurement (IF 1.188) Pub Date: 2023-06-09, DOI: 10.1111/jedm.12372
Benjamin R. Shear
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item-format-by-gender differences. On average, male students answer multiple-choice items correctly relatively more often, and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and was larger on average in reading than in math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias, but it is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
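The abstract refers to "routine differential item functioning analyses" and their flagging thresholds. The paper's own analysis is not reproduced here, but a common routine procedure is the Mantel-Haenszel DIF statistic with the ETS delta classification. The sketch below is illustrative only: the counts are hypothetical, and the A/B/C rule shown uses only the |delta| thresholds, omitting the significance tests the full ETS rule also requires.

```python
import math

def mh_dif_delta(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    strata: list of (ref_correct, ref_incorrect, foc_correct, foc_incorrect)
    counts, one tuple per matching-score stratum (examinees matched on
    total score, reference vs. focal group, e.g. male vs. female).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha = num / den            # MH common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # ETS delta metric; 0 means no DIF
    return alpha, delta

def ets_category(delta):
    """Simplified ETS A/B/C flag from |delta| alone (significance tests omitted)."""
    if abs(delta) < 1.0:
        return "A"  # negligible DIF
    if abs(delta) < 1.5:
        return "B"  # moderate DIF
    return "C"      # large DIF; items flagged for review

# Hypothetical counts for one item across three score strata
strata = [(40, 60, 35, 65), (55, 45, 50, 50), (70, 30, 66, 34)]
alpha, delta = mh_dif_delta(strata)
print(round(delta, 2), ets_category(delta))  # small delta -> category "A"
```

An item-format effect of the size the abstract describes would typically land in category "A" for most individual items, which is why it escapes routine flagging yet can still accumulate across a test form.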
