Application of an Automated Essay Scoring engine to English writing assessment using Many-Facet Rasch Measurement
Language Testing ( IF 2.400 ) Pub Date : 2022-02-26 , DOI: 10.1177/02655322221076025
Kinnie Kin Yee Chan, Trevor Bond, Zi Yan

We investigated the relationship between the scores assigned by an Automated Essay Scoring (AES) system, the Intelligent Essay Assessor (IEA), and the grades allocated by trained, professional human raters to English essay writing, by introducing two procedures novel to written-language assessment: the logistic transformation of AES raw scores into hierarchically ordered grades, and the co-calibration of all essay scoring data in a single Rasch measurement framework. A total of 3453 essays were written by 589 US students (in Grades 4, 6, 8, 10, and 12) in response to 18 National Assessment of Educational Progress (NAEP) writing prompts at three grade levels (4, 8, & 12). We randomly assigned one of two versions of the assessment, A or B, to each student. Each version comprised a narrative (N), an informative (I), and a persuasive (P) prompt. Nineteen experienced assessors graded the essays holistically using NAEP scoring guidelines, following a rotating plan in which each essay was rated by four raters. Each essay was additionally scored using the IEA. We estimated the effects of rater, prompt, student, and rubric by using a Many-Facet Rasch Measurement (MFRM) model. Last, within a single Rasch measurement scale, we co-calibrated the students' grades from human raters and their grades from the IEA to compare them. The AES-generated grades maintained equivalence with the human ratings and were more consistent than those from human raters.
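The MFRM analysis described above can be sketched with the standard four-facet rating-scale formulation. In this conventional form (not reproduced from the paper itself), the log-odds of a student receiving grade category k rather than k−1 is modeled additively from the facets the authors name — student, prompt, and rater — plus a rating-scale (rubric) threshold:

```latex
% Many-Facet Rasch Measurement (rating-scale form), a standard sketch:
% the probability P_{nijk} that student n, on prompt i, from rater j,
% receives grade k satisfies
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = B_n - D_i - C_j - F_k
% where
%   B_n : ability of student n
%   D_i : difficulty of prompt i
%   C_j : severity of rater j (human rater or the IEA engine)
%   F_k : threshold of grade category k relative to k-1 (the rubric facet)
```

Because human raters and the IEA enter the model as levels of the same rater facet, their grades are placed on one common logit scale, which is what permits the direct comparison of equivalence and consistency reported in the abstract.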




Updated: 2022-02-26