Psychometric approaches to analyzing C-tests
Language Testing (IF 2.400), Pub Date: 2022-02-28, DOI: 10.1177/02655322211062138
David Alpizar ₁ , Tongyun Li ₂ , John M. Norris ₂ , Lixiong Gu ₂

The C-test is a type of gap-filling test designed to efficiently measure second language proficiency. The typical C-test consists of several short paragraphs in which the second half of every second word is deleted. The words with deleted parts are treated as items nested within the corresponding paragraph. Given this testlet structure, it is often taken for granted that the C-test design may violate the assumption of local item independence. However, C-test research to date has not fully investigated this assumption, nor has it fully evaluated alternative psychometric models (i.e., unidimensional and multidimensional) for calibrating and scoring the C-test. This study addressed each of these issues using a large data set of responses to an English-language C-test. First, we examined the local item independence assumption via multidimensional item response theory (IRT) models, Yen's Q3, and the Jackknife Slope Index. Second, we evaluated several IRT models to determine optimal approaches to scoring the C-test. The results support an interpretation of unidimensionality for the C-test items within a paragraph, with only minor evidence of local item dependence. Furthermore, the two-parameter logistic (2PL) IRT model was found to be the most appropriate model for calibrating and scoring the C-test. Implications for designing, scoring, and analyzing C-tests are discussed.
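To make the local-dependence check above more concrete, here is a minimal Python sketch of Yen's Q3 computed under a 2PL model. It assumes that the item parameters (discrimination a and difficulty b) and the examinee abilities θ have already been estimated elsewhere. The 2PL probability of a correct response is P_j(θ) = 1 / (1 + exp(-a_j(θ - b_j))). Q3 for an item pair is the correlation, across examinees, of the residuals d_ij = u_ij - P_j(θ_i). The function names and the toy data are hypothetical and chosen for illustration. This is not the authors' code or analysis pipeline.

```python
import math

# Hypothetical helper names; a sketch, not the authors' implementation.

def p_2pl(theta, a, b):
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def yen_q3(responses, thetas, a, b):
    """Item-by-item matrix of Yen's Q3 under a 2PL model.

    responses[i][j]: 0/1 score of examinee i on item (gap) j
    thetas[i]:       ability estimate for examinee i (assumed already estimated)
    a[j], b[j]:      discrimination and difficulty of item j (assumed calibrated)
    """
    n_items = len(a)
    # Residual for each item: observed score minus model-implied probability.
    resid = [
        [responses[i][j] - p_2pl(thetas[i], a[j], b[j]) for i in range(len(thetas))]
        for j in range(n_items)
    ]
    q3 = [[1.0] * n_items for _ in range(n_items)]
    for j in range(n_items):
        for k in range(j + 1, n_items):
            # Q3 for a pair = correlation of the two items' residuals across examinees.
            q3[j][k] = q3[k][j] = pearson(resid[j], resid[k])
    return q3


# Toy data (invented for illustration): 6 examinees, 3 gaps from one paragraph.
thetas = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
a = [1.2, 0.8, 1.0]
b = [-0.5, 0.0, 0.5]
responses = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 1]]

for row in yen_q3(responses, thetas, a, b):
    print(" ".join(f"{v:6.2f}" for v in row))
```

Under local independence, the off-diagonal Q3 values should sit near zero. If clearly positive values clustered among gaps from the same paragraph, that would be the testlet-driven dependence the study set out to check for. In real analyses, θ and the item parameters would come from IRT calibration software rather than being supplied by hand.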
