-
Modeling the Intraindividual Relation of Ability and Speed within a Test Journal of Educational Measurement (IF 1.188) Pub Date : 2024-04-20 Augustin Mutak, Robert Krause, Esther Ulitzsch, Sören Much, Jochen Ranger, Steffi Pohl
Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, each relying either on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating this relationship. We propose the intraindividual speed‐ability‐relation
-
Differential and Functional Response Time Item Analysis: An Application to Understanding Paper versus Digital Reading Processes Journal of Educational Measurement (IF 1.188) Pub Date : 2024-04-09 Sun‐Joo Cho, Amanda Goodwin, Matthew Naveiras, Jorge Salas
Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore, previous research has shown a complex relationship between
-
Modeling Hierarchical Attribute Structures in Diagnostic Classification Models with Multiple Attempts Journal of Educational Measurement (IF 1.188) Pub Date : 2024-03-30 Tae Yeon Kwon, A. Corinne Huggins-Manley, Jonathan Templin, Mingying Zheng
In classroom assessments, examinees can often answer test items multiple times, resulting in sequential multiple-attempt data. Sequential diagnostic classification models (DCMs) have been developed for such data. As student learning processes may be aligned with a hierarchy of measured traits, this study aimed to develop a sequential hierarchical DCM (sequential HDCM), which combines a sequential DCM
-
A Bayesian Moderated Nonlinear Factor Analysis Approach for DIF Detection under Violation of the Equal Variance Assumption Journal of Educational Measurement (IF 1.188) Pub Date : 2024-03-16 Sooyong Lee, Suhwa Han, Seung W. Choi
Research has shown that multiple‐indicator multiple‐cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and how it can be addressed through moderated nonlinear factor
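As a rough illustration of the moderated nonlinear factor analysis idea (the parameterization below is a generic MNLFA sketch, not necessarily the authors' exact specification), item intercepts, loadings, and the latent variance can all be written as functions of a group covariate $x$, so that group differences in latent variance are modeled rather than mistaken for DIF:

$$
\nu_i(x) = \nu_{0i} + \nu_{1i}x, \qquad
\lambda_i(x) = \lambda_{0i} + \lambda_{1i}x, \qquad
\psi(x) = \psi_0 \exp(\gamma x),
$$

where a nonzero $\nu_{1i}$ signals uniform DIF, a nonzero $\lambda_{1i}$ signals nonuniform DIF, and $\gamma$ captures the violation of the equal-variance assumption.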
-
Optimal Calibration of Items for Multidimensional Achievement Tests Journal of Educational Measurement (IF 1.188) Pub Date : 2024-03-14 Mahmood Ul Hassan, Frank Miller
Multidimensional achievement tests have recently been gaining importance in educational and psychological measurement. For example, multidimensional diagnostic tests can help students determine which particular domain of knowledge they need to improve for better performance. To estimate the characteristics of candidate items (calibration) for future multidimensional achievement tests, we use optimal
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2024-03-02
Editor CHUN WANG, University of Washington
-
Argument-Based Approach to Validity: Developing a Living Document and Incorporating Preregistration Journal of Educational Measurement (IF 1.188) Pub Date : 2024-02-14 Daria Gerasimova
I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document allows future users of the instrument to access the entire
-
DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models Journal of Educational Measurement (IF 1.188) Pub Date : 2024-02-14 Carmen Köhler, Johannes Hartig, Lale Khorramdel, Artur Pokropek
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The current study compares two approaches for DIF detection: a
-
A Dual-Purpose Model for Binary Data: Estimating Ability and Misconceptions Journal of Educational Measurement (IF 1.188) Pub Date : 2024-01-04 Wenchao Ma, Miguel A. Sorrel, Xiaoming Zhai, Yuan Ge
Most existing diagnostic models are developed to detect whether students have mastered a set of skills of interest, but few have focused on identifying what scientific misconceptions students possess. This article developed a general dual-purpose model for simultaneously estimating students' overall ability and the presence and absence of misconceptions. The expectation-maximization algorithm was developed
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2023-12-05
Editor CHUN WANG, University of Washington
-
A Highly Adaptive Testing Design for PISA Journal of Educational Measurement (IF 1.188) Pub Date : 2023-12-03 Andreas Frey, Christoph König, Aron Fink
The highly adaptive testing (HAT) design is introduced as an alternative test design for the Programme for International Student Assessment (PISA). The principle of HAT is to be as adaptive as possible when selecting items while accounting for PISA's nonstatistical constraints and addressing issues concerning PISA such as item position effects. HAT combines established methods from the field of computerized
-
Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments Journal of Educational Measurement (IF 1.188) Pub Date : 2023-11-16 Sandip Sinharay, Matthew S. Johnson
Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds, including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented on a large scale. Consequently, there is a lack of guidance on how one can compute comparable scores
-
Incorporating Test-Taking Engagement into Multistage Adaptive Testing Design for Large-Scale Assessments Journal of Educational Measurement (IF 1.188) Pub Date : 2023-11-10 Okan Bulut, Guher Gorgun, Hacer Karamese
The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can do. However, research shows that large-scale assessments
-
Information Functions of Rank-2PL Models for Forced-Choice Questionnaires Journal of Educational Measurement (IF 1.188) Pub Date : 2023-10-29 Jianbin Fu, Xuan Tan, Patrick C. Kyllonen
This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's information and directional information are described, and
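For readers unfamiliar with the model, the pairwise preference probability in the MUPP tradition can be sketched as follows (the notation here is assumed, not taken from the article): a statement $s$ measuring dimension $d(s)$ follows a 2PL, and the probability of preferring statement $s$ over statement $t$ is

$$
P(s \succ t \mid \boldsymbol{\theta}) =
\frac{P_s\,(1 - P_t)}{P_s\,(1 - P_t) + (1 - P_s)\,P_t},
\qquad
P_s = \frac{1}{1 + \exp\{-a_s(\theta_{d(s)} - b_s)\}},
$$

from which item and test information functions follow by differentiating the log-likelihood with respect to $\boldsymbol{\theta}$.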
-
Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches Journal of Educational Measurement (IF 1.188) Pub Date : 2023-10-15 Güler Yavuz Temel
The purpose of this study was to investigate multidimensional DIF with simple and nonsimple structures in the context of the multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald tests using MML-EM and MHRM estimation approaches with different test factors and test structures in simulation studies and real-data applications. When the test
-
MSAEM Estimation for Confirmatory Multidimensional Four-Parameter Normal Ogive Models Journal of Educational Measurement (IF 1.188) Pub Date : 2023-10-09 Jia Liu, Xiangbin Meng, Gongjun Xu, Wei Gao, Ningzhong Shi
In this paper, we develop a mixed stochastic approximation expectation-maximization (MSAEM) algorithm coupled with a Gibbs sampler to compute the marginalized maximum a posteriori estimate (MMAPE) of a confirmatory multidimensional four-parameter normal ogive (M4PNO) model. The proposed MSAEM algorithm not only has the computational advantages of the stochastic approximation expectation-maximization
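For reference, a minimal sketch of the confirmatory multidimensional four-parameter normal ogive item response function, assuming a conventional parameterization:

$$
P(U_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (d_i - c_i)\,\Phi\!\big(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i\big),
\qquad 0 \le c_i < d_i \le 1,
$$

where $c_i$ is the lower (guessing) asymptote, $d_i$ the upper (slipping) asymptote, $\mathbf{a}_i$ a loading vector with confirmatory zero constraints, and $\Phi$ the standard normal distribution function.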
-
Sociocognitive Processes and Item Response Models: A Didactic Example Journal of Educational Measurement (IF 1.188) Pub Date : 2023-09-15 Tao Gong, Lan Shuai, Robert J. Mislevy
The usual interpretation of the person and task variables in between-persons measurement models such as item response theory (IRT) is as attributes of persons and tasks, respectively. They can be viewed instead as ensemble descriptors of patterns of interactions among persons and situations that arise from sociocognitive complex adaptive systems (CASs). This view offers insights for interpreting and
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2023-09-04
Editor CHUN WANG, University of Washington
-
Using Response Time in Multidimensional Computerized Adaptive Testing Journal of Educational Measurement (IF 1.188) Pub Date : 2023-07-07 Yinhong He, Yuanyuan Qi
In multidimensional computerized adaptive testing (MCAT), item selection strategies are generally constructed based on responses, and they do not consider the response times required by items. This study constructed two new criteria (referred to as DT-inc and DT) for MCAT item selection by utilizing information from response times. The new designs maximize the amount of information per unit time. Furthermore
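The information-per-unit-time idea can be sketched, under assumed notation rather than the article's exact DT-inc/DT definitions, as choosing the next item $j$ from the remaining pool $R$ to maximize a scalarized information matrix divided by the item's expected response time at the current speed estimate:

$$
j^{*} = \arg\max_{j \in R}\;
\frac{\det\!\big[\mathbf{I}_j(\hat{\boldsymbol{\theta}})\big]}
{E\big[T_j \mid \hat{\tau}\big]},
$$

so that items delivering much information while demanding little time are favored.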
-
Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests Journal of Educational Measurement (IF 1.188) Pub Date : 2023-06-09 Benjamin R. Shear
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2023-06-06
Editor CHUN WANG, University of Washington
-
Detecting Differential Item Functioning in CAT Using IRT Residual DIF Approach Journal of Educational Measurement (IF 1.188) Pub Date : 2023-04-28 Hwanggyu Lim, Edison M. Choe
The residual differential item functioning (RDIF) detection framework was developed recently under a linear testing context. To explore the potential application of this framework to computerized adaptive testing (CAT), the present study investigated the utility of the RDIFR statistic both as an index for detecting uniform DIF of pretest items in CAT and as a direct measure of the effect size of uniform
-
Controlling the Speededness of Assembled Test Forms: A Generalization to the Three-Parameter Lognormal Response Time Model Journal of Educational Measurement (IF 1.188) Pub Date : 2023-04-27 Benjamin Becker, Sebastian Weirich, Frank Goldhammer, Dries Debeer
When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe limitation, in that the two-parameter lognormal model
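For reference, van der Linden's two-parameter lognormal response time model, which the article generalizes, specifies for person $j$ and item $i$

$$
\ln T_{ij} \sim N\!\big(\beta_i - \tau_j,\; \alpha_i^{-2}\big),
$$

with $\tau_j$ the person's speed, $\beta_i$ the item's time intensity, and $\alpha_i$ a time-discrimination parameter. A three-parameter extension (sketched here; see the article for the exact formulation) additionally weights speed by an item-specific slope, e.g., $\ln T_{ij} \sim N(\beta_i - \varphi_i \tau_j,\; \alpha_i^{-2})$.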
-
A Note on Latent Traits Estimates under IRT Models with Missingness Journal of Educational Measurement (IF 1.188) Pub Date : 2023-04-26 Jinxin Guo, Xin Xu, Tao Xin
Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, leads to biased parameter estimation and inaccurate inferences about examinees, and further erodes the validity of the test. This paper reviews some commonly used IRT-based models allowing missingness, followed by three popular
-
Online Monitoring of Test-Taking Behavior Based on Item Responses and Response Times Journal of Educational Measurement (IF 1.188) Pub Date : 2023-04-17 Suhwa Han, Hyeon-Ah Kang
The study presents multivariate sequential monitoring procedures for examining test-taking behaviors online. The procedures monitor examinees' responses and response times and signal aberrancy as soon as a significant change is detected in the test-taking behavior. The study in particular proposes three schemes to track different indicators of a test-taking mode—the observable manifest variables
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2023-03-16
Editor CHUN WANG, University of Washington
-
Pretest Item Calibration in Computerized Multistage Adaptive Testing Journal of Educational Measurement (IF 1.188) Pub Date : 2023-03-10 Rabia Karatoprak Ersen, Won-Chan Lee
The purpose of this study was to compare calibration and linking methods for placing pretest item parameter estimates on the item pool scale in a 1-3 computerized multistage adaptive testing design in terms of item parameter recovery. Two models were used: embedded-section, in which pretest items were administered within a separate module, and embedded-items, in which pretest items were distributed
-
Classical Item Analysis from a Signal Detection Perspective Journal of Educational Measurement (IF 1.188) Pub Date : 2023-02-27 Lawrence T. DeCarlo
A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a “true split,” depending on whether or not examinees know an item, and so it provides a basis for using total scores to split item
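As a rough sketch of the signal detection view (generic equal-variance SDT indices, not necessarily the exact statistics derived in the article): if $H_i$ is the proportion correct on item $i$ among examinees in the upper total-score split and $F_i$ the proportion correct in the lower split, then

$$
d_i' = \Phi^{-1}(H_i) - \Phi^{-1}(F_i),
\qquad
c_i = -\tfrac{1}{2}\big[\Phi^{-1}(H_i) + \Phi^{-1}(F_i)\big],
$$

where $d_i'$ plays the role of item discrimination and the criterion $c_i$ relates to item difficulty.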
-
Corrigendum: A Residual-Based Differential Item Functioning Detection Framework in Item Response Theory Journal of Educational Measurement (IF 1.188) Pub Date : 2023-02-26 Hwanggyu Lim, Edison M. Choe, Kyung T. Han
In the original article, it was written that “Then the MLE scoring and DIF analysis with RDIF statistics were performed using the est_score and rdif functions, respectively, in the R (R Core Team, 2019) package irtplay (p.90).” However, the irtplay package has been removed from the CRAN repository due to intellectual property (IP) violation issues. Instead, a new R package called irtQ (Lim & Wells
-
Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems Journal of Educational Measurement (IF 1.188) Pub Date : 2023-02-19 W. Jake Thompson, Brooke Nash, Amy K. Clark, Jeffrey C. Hoover
As diagnostic classification models become more widely used in large-scale operational assessments, we must give consideration to the methods for estimating and reporting reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment systems. In this article, we describe and evaluate a
-
Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation Journal of Educational Measurement (IF 1.188) Pub Date : 2023-02-19 Jodi M. Casabianca, John R. Donoghue, Hyo Jeong Shin, Szu-Fu Chao, Ikkyu Choi
Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. To fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios, there tends to be a large proportion of missing data
-
An Exploration of an Improved Aggregate Student Growth Measure Using Data from Two States Journal of Educational Measurement (IF 1.188) Pub Date : 2023-01-31 Katherine E. Castellano, Daniel F. McCaffrey, J. R. Lockwood
The simple average of student growth scores is often used in accountability systems, but it can be problematic for decision making. When computed using a small/moderate number of students, it can be sensitive to the sample, resulting in inaccurate representations of growth of the students, low year-to-year stability, and inequities for low-incidence groups. An alternative designed to address these
-
Classification Accuracy and Consistency of Compensatory Composite Test Scores Journal of Educational Measurement (IF 1.188) Pub Date : 2023-01-28 J. Carl Setzer, Ying Cheng, Cheng Liu
Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two psychometric characteristics of test scores that are often
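A minimal sketch of how classification accuracy and consistency can be estimated for a composite score with cut points, assuming a Rudner-style normal approximation to each examinee's score distribution (an illustrative choice, not necessarily the article's estimator):

import numpy as np
from scipy.stats import norm

def classification_indices(score, se, cuts):
    """Rudner-style accuracy/consistency from score estimates and SEs."""
    bounds = np.concatenate(([-np.inf], cuts, [np.inf]))
    # probability that each examinee's true score falls in each category
    z_hi = (bounds[None, 1:] - score[:, None]) / se[:, None]
    z_lo = (bounds[None, :-1] - score[:, None]) / se[:, None]
    probs = norm.cdf(z_hi) - norm.cdf(z_lo)
    assigned = np.digitize(score, cuts)              # observed category
    accuracy = probs[np.arange(len(score)), assigned].mean()
    consistency = (probs ** 2).sum(axis=1).mean()    # P(same category on a retest)
    return accuracy, consistency

acc, con = classification_indices(np.array([-0.4, 0.1, 1.2]),
                                  np.array([0.30, 0.25, 0.28]),
                                  cuts=np.array([0.0, 1.0]))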
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2023-01-06
Editor SANDIP SINHARAY, Educational Testing Service
-
Specifying the Three Ws in Educational Measurement: Who Uses Which Scores for What Purpose? Journal of Educational Measurement (IF 1.188) Pub Date : 2022-12-25 Andrew Ho
I argue that understanding and improving educational measurement requires specificity about actors, scores, and purpose: Who uses which scores for what purpose? I show how this specificity complements Briggs’ frameworks for educational measurement that he presented in his 2022 address as president of the National Council on Measurement in Education.
-
Online Calibration in Multidimensional Computerized Adaptive Testing with Polytomously Scored Items Journal of Educational Measurement (IF 1.188) Pub Date : 2022-12-15 Lu Yuan, Yingshi Huang, Shuhang Li, Ping Chen
Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more common, only a few published reports focus on online calibration
-
Measuring the Uncertainty of Imputed Scores Journal of Educational Measurement (IF 1.188) Pub Date : 2022-12-14 Sandip Sinharay
Technical difficulties and other unforeseen events occasionally lead to incomplete data on educational tests, which necessitates the reporting of imputed scores to some examinees. While there exist several approaches for reporting imputed scores, there is a lack of any guidance on the reporting of the uncertainty of imputed scores. In this paper, several approaches are suggested for quantifying the
-
An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior Journal of Educational Measurement (IF 1.188) Pub Date : 2022-12-09 Yinhong He
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the exponentially weighted moving average (EWMA) obtains more detailed
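A minimal sketch of an EWMA chart applied to a sequence of per-item person-fit residuals (illustrative only; the residual definition, weighting constant, and control limit below are assumptions rather than the article's exact procedure):

import numpy as np

def ewma_flags(residuals, lam=0.2, L=2.7, mu0=0.0, sigma0=1.0):
    """Return EWMA statistics and a per-position flag for a change in responding."""
    w, stats, flags = mu0, [], []
    for t, r in enumerate(residuals, start=1):
        w = lam * r + (1 - lam) * w                              # EWMA recursion
        var_t = sigma0 ** 2 * lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        flags.append(abs(w - mu0) > L * np.sqrt(var_t))          # control limit
        stats.append(w)
    return np.array(stats), np.array(flags)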
-
Multiple-Group Joint Modeling of Item Responses, Response Times, and Action Counts with the Conway-Maxwell-Poisson Distribution Journal of Educational Measurement (IF 1.188) Pub Date : 2022-12-07 Xin Qiao, Hong Jiao, Qiwei He
Multiple-group modeling is one method for addressing measurement noninvariance. Traditional studies on multiple-group modeling have mainly focused on item responses. In computer-based assessments, joint modeling of response times and action counts with item responses helps estimate the latent speed and action levels in addition to latent ability. These two new data sources can also be
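For reference, the Conway-Maxwell-Poisson distribution used for the action counts has probability mass function

$$
P(X = x \mid \lambda, \nu) = \frac{\lambda^{x}}{(x!)^{\nu}\, Z(\lambda, \nu)},
\qquad
Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
$$

where $\nu = 1$ recovers the Poisson, $\nu < 1$ allows overdispersion, and $\nu > 1$ allows underdispersion in the counts.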
-
A Unified Comparison of IRT-Based Effect Sizes for DIF Investigations Journal of Educational Measurement (IF 1.188) Pub Date : 2022-11-07 R. Philip Chalmers
Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and NUIDS), the standardized indices of impact, and the differential
-
A Statistical Test for the Detection of Item Compromise Combining Responses and Response Times Journal of Educational Measurement (IF 1.188) Pub Date : 2022-10-28 Wim J. van der Linden, Dmitry I. Belov
A test of item compromise is presented which combines the test takers' responses and response times (RTs) into a statistic defined as the number of correct responses on the item for test takers with RTs flagged as suspicious. The test has null and alternative distributions belonging to the well-known family of compound binomial distributions, is simple to calculate, and has results that are easy to
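The construction can be sketched as follows (notation assumed): for item $i$, let $\mathcal{F}_i$ be the set of test takers whose response times on the item were flagged as suspicious; the statistic is

$$
S_i = \sum_{j \in \mathcal{F}_i} U_{ij},
$$

the number of correct responses among the flagged test takers. Under the null hypothesis of no compromise, $S_i$ follows a compound (Poisson-)binomial distribution whose success probabilities are, plausibly, the model-implied probabilities $P_i(\hat{\theta}_j)$ for the flagged test takers.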
-
Fully Gibbs Sampling Algorithms for Bayesian Variable Selection in Latent Regression Models Journal of Educational Measurement (IF 1.188) Pub Date : 2022-10-25 Kazuhiro Yamaguchi, Jihong Zhang
This study proposed Gibbs sampling algorithms for variable selection in a latent regression model under a unidimensional two-parameter logistic item response theory model. Three types of shrinkage priors were employed to obtain shrinkage estimates: double-exponential (i.e., Laplace), horseshoe, and horseshoe+ priors. These shrinkage priors were compared to a uniform prior case in both simulation and
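For reference, the three shrinkage priors on a latent regression coefficient $\beta_k$ take the following standard forms (hyperparameter settings are the study's to specify):

$$
\text{Laplace: } p(\beta_k \mid \lambda) = \tfrac{\lambda}{2} e^{-\lambda |\beta_k|};
\qquad
\text{horseshoe: } \beta_k \mid \lambda_k, \tau \sim N(0, \lambda_k^{2}\tau^{2}),\;
\lambda_k \sim C^{+}(0, 1),\; \tau \sim C^{+}(0, 1);
$$

the horseshoe+ adds one more layer, $\lambda_k \mid \eta_k \sim C^{+}(0, \eta_k)$ with $\eta_k \sim C^{+}(0, 1)$, yielding even heavier shrinkage of noise coefficients while leaving large effects nearly untouched.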
-
A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles Journal of Educational Measurement (IF 1.188) Pub Date : 2022-10-10 Chia-Wen Chen, Björn Andersson, Jinxin Zhu
The certainty of response index (CRI) measures respondents' confidence level when answering an item. Previous studies have used descriptive statistics and arbitrary thresholds on the CRIs, in conjunction with the answers to the items, to identify student knowledge profiles. However, this approach overlooks the measurement error of the observed item responses and indices; we address this by proposing
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2022-10-03
Editor SANDIP SINHARAY, Educational Testing Service
-
Using Item Scores and Distractors in Person-Fit Assessment Journal of Educational Measurement (IF 1.188) Pub Date : 2022-09-16 Kylie Gorney, James A. Wollack
In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the $l_z$ and $l_z^*$ person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are
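For reference, the dichotomous $l_z$ statistic that the new indices extend is the standardized log-likelihood of a response pattern,

$$
l_0 = \sum_{i=1}^{n} \big[u_i \ln P_i(\theta) + (1-u_i)\ln\{1 - P_i(\theta)\}\big],
\qquad
l_z = \frac{l_0 - E(l_0)}{\sqrt{\operatorname{Var}(l_0)}},
$$

with $E(l_0)=\sum_i [P_i \ln P_i + (1-P_i)\ln(1-P_i)]$ and $\operatorname{Var}(l_0)=\sum_i P_i(1-P_i)[\ln\{P_i/(1-P_i)\}]^2$; $l_z^*$ is Snijders's correction for the case where $\theta$ is replaced by the estimate $\hat{\theta}$.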
-
A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures Journal of Educational Measurement (IF 1.188) Pub Date : 2022-09-02 Adam Combs
A common method of checking person fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches based on resampling methods using the popular $L_z^*$ statistic have been proposed. A new Bayesian model checking method based on pivotal discrepancy measures (PDMs) has also been proposed. A PDM T is a discrepancy measure
-
Several Variations of Simple-Structure MIRT Equating Journal of Educational Measurement (IF 1.188) Pub Date : 2022-07-28 Stella Y. Kim, Won-Chan Lee
The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate the feasibility of the proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional
-
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment Journal of Educational Measurement (IF 1.188) Pub Date : 2022-07-08 David W. Dorsey, Hillary R. Michaels
Advances in technology have dramatically expanded our ability to create rich, complex, and effective assessments across a range of uses. Artificial intelligence (AI)-enabled assessments represent one such area of advancement—one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly
-
A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge Journal of Educational Measurement (IF 1.188) Pub Date : 2022-07-07 Murat Kasli, Cengiz Zopluoglu, Sarah L. Toton
Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was
-
Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes Journal of Educational Measurement (IF 1.188) Pub Date : 2022-07-05 Rae Yeong Kim, Yun Joo Yoo
In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem
-
Issue Information Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-22
Editor SANDIP SINHARAY, Educational Testing Service
-
Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-20 Seohee Park, Kyung Yong Kim, Won-Chan Lee
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on the classification consistency and accuracy of multiple measures. Accordingly, this study introduces an approach to estimate classification consistency and
-
Optimizing Implementation of Artificial-Intelligence-Based Automated Scoring: An Evidence Centered Design Approach for Designing Assessments for AI-based Scoring Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-12 Kadriye Ercikan, Daniel F. McCaffrey
Artificial-intelligence-based automated scoring is often an afterthought, considered only after assessments have been developed, which limits the possibilities for implementing automated scoring solutions. In this article, we provide a review of artificial intelligence (AI)-based methodologies for scoring in educational assessments. We then propose an evidence-centered design framework for developing
-
Latent Space Model for Process Data Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-12 Yi Chen, Jingru Zhang, Yi Yang, Young-Sun Lee
The development of human-computer interactive items in educational assessments provides opportunities to extract useful process information for problem-solving. However, the complex, intensive, and noisy nature of process data makes it challenging to model with the traditional psychometric methods. Social network methods have been applied to visualize and analyze process data. Nonetheless, research
-
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment: A Discussion and Look Forward Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-09 David W. Dorsey, Hillary R. Michaels
In this concluding article of the special issue, we provide an overall discussion and point to future emerging trends in AI that might shape our approach to validity and building validity arguments.
-
Validity Arguments for AI-Based Automated Scores: Essay Scoring as an Illustration Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-08 Steve Ferrara, Saed Qunbar
In this article, we argue that automated scoring engines should be transparent and construct relevant—that is, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without allowing in some features that may not be easily explained and understood and may not be obviously and directly relevant to the target assessment construct. We address
-
Toward Argument-Based Fairness with an Application to AI-Enhanced Educational Assessments Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-01 A. Corinne Huggins-Manley, Brandon M. Booth, Sidney K. D'Mello
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible approach to fairness arguments that occurs outside of and
-
Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring Journal of Educational Measurement (IF 1.188) Pub Date : 2022-06-01 Matthew S. Johnson, Xiang Liu, Daniel F. McCaffrey
With the increasing use of automated scores in operational testing settings comes the need to understand the ways in which they can yield biased and unfair results. In this paper, we provide a brief survey of some of the ways in which the predictive methods used in automated scoring can lead to biased, and thus unfair, automated scores. After providing definitions of fairness from machine learning and
-
Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates Journal of Educational Measurement (IF 1.188) Pub Date : 2022-05-30 Tim Moses
One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different examinees, or tests that are administered in different
-
Anchoring Validity Evidence for Automated Essay Scoring Journal of Educational Measurement (IF 1.188) Pub Date : 2022-05-15 Mark D. Shermis
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays. Sometimes these attributes are based on the fundamentals of