-
Computer-Based Listening Test with Full Video, Visual-Limited Video, and Audio: A Comparative Analysis Based on Difficulty, Discrimination Power, and Response Time Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-14 Takahiro Terao
This study aimed to compare item characteristics and response time between stimulus conditions in computer-delivered listening tests. Listening materials had three variants: regular videos, frame-b...
-
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-09 Tony Albano, Brian F. French, Thao Thu Vo
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditio...
-
Modeling Dimensions Converging at the Upper Anchor in Learning Progressions: An Example of Micro-Evolution Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-07 Mingfeng Xue, Mark Wilson
Multidimensionality is common in psychological and educational measurements. This study focuses on dimensions that converge at the upper anchor (i.e. the highest acquisition status defined in a lea...
-
Are Online and Paper Tests Comparable? Evidence from Statewide K-12 Tests Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-05 Ben Backes, James Cowan
We investigate two research questions using a recent statewide transition from paper to computer-based testing: first, the extent to which test mode effects found in prior studies can be eliminated...
-
Comparing Examinee-Based and Response-Based Motivation Filtering Methods in Remote Low-Stakes Testing Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-02 Sarah Alahmadi, Christine E. DeMars
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote...
-
Don’t Test After Lunch: The Relationship Between Disengagement and the Time of Day That Low-Stakes Testing Occurs Applied Measurement in Education (IF 1.528) Pub Date : 2024-02-02 Steven L. Wise, Megan R. Kuhfeld, Marlit Annalena Lindner
When student achievement is assessed, we seek to elicit a student’s maximum performance – a goal requiring the assumption that the student is fully engaged. Otherwise, to the extent that disengagem...
-
Analyzing Complete Generalizability Theory Designs Using Structural Equation Models Applied Measurement in Education (IF 1.528) Pub Date : 2023-12-13 Walter P. Vispoel, Hyeri Hong, Hyeryung Lee, Terrence D. Jorgensen
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software (lavaan in R), compare results to those obtained from numerous ANOVA-based pac...
-
Recruitment and Retention of Racially and Ethnically Minoritized Graduate Students in Educational Measurement Programs Applied Measurement in Education (IF 1.528) Pub Date : 2023-12-13 Jennifer Randall, Joseph Rios
Building on the extant literature on recruitment and retention within the field of STEM and undergraduate education, we sought to explore the recruitment and retention experiences of racially and e...
-
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing Applied Measurement in Education (IF 1.528) Pub Date : 2023-12-13 TsungHan Ho
An operational multistage adaptive test (MST) requires the development of a large item bank and the effort to continuously replenish the item bank due to concerns about test security and validity o...
-
Change in Engagement During Test Events: An Argument for Weighted Scoring? Applied Measurement in Education (IF 1.528) Pub Date : 2023-12-13 Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi
Recent research has provided evidence that performance change during a student’s test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some p...
-
Validity: An Integrated Approach to Test Score Meaning and Use, by Gregory J. Cizek, New York, Routledge, 2020, 190 pp., 55.00 (Paperback) Applied Measurement in Education (IF 1.528) Pub Date : 2023-11-19 Tony Albano
Published in Applied Measurement in Education (Vol. 36, No. 4, 2023)
-
Detecting Item Parameter Drift in Small Sample Rasch Equating Applied Measurement in Education (IF 1.528) Pub Date : 2023-11-08 Daniel Jurich, Chunyan Liu
Screening items for parameter drift helps protect against serious validity threats and ensure score comparability when equating forms. Although many high-stakes credentialing examinations operate w...
-
The Promise of Assessments That Advance Social Justice: An Indigenous Example Applied Measurement in Education (IF 1.528) Pub Date : 2023-06-08 Pōhai Kūkea Shultz, Kerry Englert
ABSTRACT In the United States, systemic racism against people of color was brought to the forefront of discourse throughout 2020, and highlighted the on-going inequities faced by intentionally marginalized groups in policing, health and education. No community of color is immune from these inequities, and the activism in 2020 and the consequences of the pandemic have made systemic inequities impossible
-
The Standards Will Never Be Enough: A Racial Justice Extension Applied Measurement in Education (IF 1.528) Pub Date : 2023-05-31 Mya Poe, Maria Elena Oliveri, Norbert Elliot
ABSTRACT Since 1952, the Standards for Educational and Psychological Testing has provided criteria for developing and evaluating educational and psychological tests and testing practice. Yet, we argue that the foundations, operations, and applications in the Standards are no longer sufficient to meet the current U.S. testing demands for fairness for all test takers. We propose racial justice extensions
-
Shifting Educational Measurement from an Agent of Systemic Racism to an Anti-Racist Endeavor Applied Measurement in Education (IF 1.528) Pub Date : 2023-05-27 Michael Russell
ABSTRACT In recent years, issues of race, racism and social justice have garnered increased attention across the nation. Although some aspects of social justice, particularly cultural sensitivity and test bias, have received similar attention within the field of educational measurement, sharp focus of racism has eluded the field. This manuscript focuses narrowly on racism. Drawing on an expansive
-
Enacting a Process for Developing Culturally Relevant Classroom Assessments Applied Measurement in Education (IF 1.528) Pub Date : 2023-05-25 Eowyn P. O’Dwyer, Jesse R. Sparks, Leslie Nabors Oláh
ABSTRACT A critical aspect of the development of culturally relevant classroom assessments is the design of tasks that affirm students’ racial and ethnic identities and community cultural practices. This paper describes the process we followed to build a shared understanding of what culturally relevant assessments are, to pursue ways of bringing more diverse voices and perspectives into the development
-
Validity and Racial Justice in Educational Assessment Applied Measurement in Education (IF 1.528) Pub Date : 2023-05-20 Josh Lederman
ABSTRACT Given its centrality to assessment, until the concept of validity includes concern for racial justice, such matters will be seen as residing outside the “real” work of validation, rendering them powerless to count against the apparent scientific merit of the test. As the definition of validity has evolved, however, it holds great potential to centralize matters like racial (in)justice, positioning
-
Applying a Culturally Responsive Pedagogical Framework to Design and Evaluate Classroom Performance-Based Assessments in Hawai‘i Applied Measurement in Education (IF 1.528) Pub Date : 2023-05-20 Carla M. Evans
ABSTRACT Previous writings focus on why centering assessment design around students’ cultural, social, and/or linguistic diversity is important and how performance-based assessment can support such aims. This article extends previous work by describing how a culturally responsive classroom assessment framework was created from a culturally responsive education (CRE) pedagogical framework. The goal
-
Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-22 Linda Visser, Friederike Cartschau, Ariane von Goldammer, Janin Brandenburg, Marieke Timmerman, Marcus Hasselhorn, Claudia Mähler
ABSTRACT The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for measurement invariance (MI) as a function of
-
Keeping Up the PACE: Evaluating Grade 8 Student Achievement Outcomes for New Hampshire’s Innovative Assessment System Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-22 Alexandra Lane Perez, Carla Evans
ABSTRACT New Hampshire’s Performance Assessment of Competency Education (PACE) innovative assessment system uses student scores from classroom performance assessments as well as other classroom tests for school accountability purposes. One concern is that not having annual state testing may incentivize schools and teachers away from teaching the breadth of the state content standards. This study examined
-
College Admissions and Testing in a Time of Transformational Change Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-18 Ross Markle
Published in Applied Measurement in Education (Vol. 36, No. 2, 2023)
-
Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-15 R. Philip Chalmers, Guoguo Zheng
ABSTRACT This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To investigate the Type I error and power behavior of
-
Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-11 Sarah Alahmadi, Andrew T. Jones, Carol L. Barry, Beatriz Ibáñez
ABSTRACT Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large sample sizes, varying the proportion of test
-
Tracking Ordinal Development of Skills with a Longitudinal DINA Model with Polytomous Attributes Applied Measurement in Education (IF 1.528) Pub Date : 2023-04-10 Peida Zhan, Yaohui Liu, Zhaohui Yu, Yanfang Pan
ABSTRACT Many educational and psychological studies have shown that the development of students is generally step-by-step (i.e. ordinal development) to a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students’ ordinal development in learning. Using the concept of polytomous attributes in the proposed model, the learning process
-
Dissecting Knowledge, Guessing, and Blunder in Multiple Choice Assessments Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-21 Rashid M. Abu-Ghazalah, David N. Dubins, Gregory M.K. Poon
ABSTRACT Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly account for guessing, knowledge, and blunder using
-
A Census-Level, Multi-Grade Analysis of the Association Between Testing Time, Breaks, and Achievement Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-16 David Rutkowski, Leslie Rutkowski, Dubravka Svetina Valdivia, Yusuf Canbolat, Stephanie Underhill
ABSTRACT Several states in the US have removed time limits on their state assessments. In Indiana, where this study takes place, the state assessment is both untimed during the testing window and allows unlimited breaks during the testing session. Using grade 3 and 8 math and English state assessment data, in this paper we focus on time used for testing and examine whether students who take more time
-
Using Bayesian Networks for Cognitive Assessment of Student Understanding of Buoyancy: A Granular Hierarchy Model Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-11 Ling Ling Wang, Sun Xiao Jian, Yan Lou Liu, Tao Xin
ABSTRACT Cognitive diagnostic assessment based on Bayesian networks (BN) is developed in this paper to evaluate student understanding of the physical concept of buoyancy. We propose a three-order granular-hierarchy BN model which accounts for both fine-grained attributes and high-level proficiencies. Conditional independence in the BN structure is tested and utilized to validate the proposed model
-
Accuracy and Sensitivity of Coefficient Alpha and Its Alternatives with Unidimensional and Contaminated Scales Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-08 Leifeng Xiao, Kit-Tai Hau
ABSTRACT We compared coefficient alpha with five alternatives (omega total, omega RT, omega h, GLB, and coefficient H) in two simulation studies. Results showed for unidimensional scales, (a) all indices except omega h performed similarly well for most conditions; (b) alpha is still good; (c) GLB and coefficient H overestimated reliability with small samples and short scales, and (d) sensitivity to
-
Are Large Admissions Test Coaching Effects Widespread? A Longitudinal Analysis of Admissions Test Scores Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-07 Jeffrey A. Dahlke, Paul R. Sackett, Nathan R. Kuncel
ABSTRACT We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a high-stakes test. We posit that investments
-
Maintaining Score Scales Over Time: A Comparison of Five Scoring Methods Applied Measurement in Education (IF 1.528) Pub Date : 2023-02-01 Stella Yun Kim, Won-Chan Lee
ABSTRACT This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of linking with multiple test forms. Simulation
-
An Examination of Individual Ability Estimation and Classification Accuracy Under Rapid Guessing Misidentifications Applied Measurement in Education (IF 1.528) Pub Date : 2022-12-19 Joseph Rios
ABSTRACT To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and individual scores are reported. To address this
-
Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data Applied Measurement in Education (IF 1.528) Pub Date : 2022-12-13 Holmes Finch
ABSTRACT Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous items when the conditional likelihood of responses
-
Personality Aspects and the Underprediction of Women’s Academic Performance Applied Measurement in Education (IF 1.528) Pub Date : 2022-12-13 You Zhou, Paul R. Sackett, Thomas Brothen
ABSTRACT We sought to replicate prior findings that admissions tests’ underprediction of female college performance was driven in part by the omission of Big 5 personality factors from the predictive model, using 5,400 college students. We investigated gender differences in an elaborated model subdividing the Big 5 into ten aspects. We found differences at the aspect level that were not found at the
-
Performance Decline as an Indicator of Generalized Test-Taking Disengagement Applied Measurement in Education (IF 1.528) Pub Date : 2022-12-06 Steven L. Wise, G. Gage Kingsbury
ABSTRACT In achievement testing we assume that students will demonstrate their maximum performance as they encounter test items. Sometimes, however, student performance can decline during a test event, which implies that the test score does not represent maximum performance. This study describes a method for identifying significant performance decline and investigated its utility as an indicator of
-
Using Bayesian Networks to Characterize Student Performance across Multiple Assessments of Individual Standards Applied Measurement in Education (IF 1.528) Pub Date : 2022-08-18 Jiajun Xu, Nathan Dadey
ABSTRACT This paper explores how student performance across the full set of multiple modular assessments of individual standards, which we refer to as mini-assessments, from a large scale, operational program of interim assessment can be summarized using Bayesian networks. We follow a completely data-driven approach in which no constraints are imposed to best reflect the empirical relationships between
-
Not-Reached Items: An Issue of Time and of Test-Taking Disengagement? The Case of PISA 2015 Reading Data Applied Measurement in Education (IF 1.528) Pub Date : 2022-08-11 Elodie Pools
ABSTRACT Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the test. This article, by means of mixture modeling
-
When Should Individual Ability Estimates Be Reported if Rapid Guessing Is Present? Applied Measurement in Education (IF 1.528) Pub Date : 2022-07-26 Joseph A. Rios
ABSTRACT Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the Standards for Educational and Psychological Testing, this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic criteria (e.g., exclude all examinees with RG rates of
-
Response Demands of Reading Comprehension Test Items: A Review of Item Difficulty Modeling Studies Applied Measurement in Education (IF 1.528) Pub Date : 2022-07-22 Steve Ferrara, Jeffrey T. Steedle, Roger S. Frantz
ABSTRACT Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13 empirical item difficulty modeling studies of
-
Guiding Educators’ Evaluation of the Measurement Quality of Social and Emotional Learning (SEL) Assessments Applied Measurement in Education (IF 1.528) Pub Date : 2022-06-05 Jessica L. Jonson
ABSTRACT This article describes a grant project that generated a technical guide for PK-12 educators who are utilizing social and emotional learning (SEL) assessments for educational improvement purposes. The guide was developed over a two-year period with funding from the Spencer Foundation. The result was the collective contribution of a widely representative group of scholars and practitioners whose
-
Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation Applied Measurement in Education (IF 1.528) Pub Date : 2022-05-31 Yoon Ah Song, Won-Chan Lee
ABSTRACT This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of proficiency estimation of two IRT models (GPCM
-
Does the Response Options Placement Provide Clues to the Correct Answers in Multiple-choice Tests? A Systematic Review Applied Measurement in Education (IF 1.528) Pub Date : 2022-05-30 Séverin Lions, Carlos Monsalve, Pablo Dartnell, María Paz Blanco, Gabriel Ortega, Julie Lemarié
ABSTRACT Multiple-choice tests are widely used in education, often for high-stakes assessment purposes. Consequently, these tests should be constructed following the highest standards. Many efforts have been undertaken to advance item-writing guidelines intended to improve tests. One important issue is the unwanted effects of the options’ position on test outcomes. Any possible effects should be controlled
-
Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing Applied Measurement in Education (IF 1.528) Pub Date : 2022-05-10 Mohammed A. A. Abulela, Joseph A Rios
ABSTRACT When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the Mantel-Haenszel (MH), standardization index (STD)
-
Performance of Infit and Outfit Confidence Intervals Calculated via Parametric Bootstrapping Applied Measurement in Education (IF 1.528) Pub Date : 2022-04-25 John Alexander Silva Diaz, Carmen Köhler, Johannes Hartig
ABSTRACT Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. Infit and outfit fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective
-
Personalized Online Learning, Test Fairness, and Educational Measurement: Considering Differential Content Exposure Prior to a High Stakes End of Course Exam Applied Measurement in Education (IF 1.528) Pub Date : 2022-04-06 Daniel Katz, Anne Corinne Huggins-Manley, Walter Leite
ABSTRACT According to the Standards for Educational and Psychological Testing (2014), one aspect of test fairness concerns examinees having comparable opportunities to learn prior to taking tests. Meanwhile, many researchers are developing platforms enhanced by artificial intelligence (AI) that can personalize curriculum to individual student needs. This leads to a larger overarching question: When
-
Efficient Assessment of Students’ Proportional Reasoning Applied Measurement in Education (IF 1.528) Pub Date : 2022-02-09 Michele Carney, Katie Paulding, Joe Champion
ABSTRACT Teachers need ways to efficiently assess students’ cognitive understanding. One promising approach involves easily adapted and administered item types that yield quantitative scores that can be interpreted in terms of whether or not students likely possess key understandings. This study illustrates an approach to analyzing response process validity evidence from item types for assessing two
-
Analyzing Student Response Processes to Evaluate Success on a Technology-Based Problem-Solving Task Applied Measurement in Education (IF 1.528) Pub Date : 2022-02-08 Yuting Han, Mark Wilson
ABSTRACT A technology-based problem-solving test can automatically capture all the actions of students when they complete tasks and save them as process data. Response sequences are the external manifestations of the latent intellectual activities of the students, and they contain rich information about students’ abilities and different problem-solving strategies. This study adopted the mixture Rasch
-
Determining Reliability of Daily Measures: An Illustration with Data on Teacher Stress Applied Measurement in Education (IF 1.528) Pub Date : 2022-02-06 Thijmen van Alphen, Suzanne Jak, Joost Jansen in de Wal, Jaap Schuitema, Thea Peetsma
ABSTRACT Intensive longitudinal data is increasingly used to study state-like processes such as changes in daily stress. Measures aimed at collecting such data require the same level of scrutiny regarding scale reliability as traditional questionnaires. The most prevalent methods used to assess reliability of intensive longitudinal measures are based on the generalizability theory or a multilevel factor
-
Teacher Assessment Literacy: Implications for Diagnostic Assessment Systems Applied Measurement in Education (IF 1.528) Pub Date : 2022-02-01 Amy K. Clark, Brooke Nash, Meagan Karvonen
ABSTRACT Assessments scored with diagnostic models are increasingly popular because they provide fine-grained information about student achievement. Because of differences in how diagnostic assessments are scored and how results are used, the information teachers must know to interpret and use results may differ from concepts traditionally included in assessment literacy trainings for assessments that
-
The Consideration of Admissions Testing at Colleges and Universities: A Perspective Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-15 Kurt F. Geisinger
(2021). The Consideration of Admissions Testing at Colleges and Universities: A Perspective. Applied Measurement in Education: Vol. 34, No. 4, pp. 237-239.
-
Comparing School Reports and Empirical Estimates of Relative Reliance on Tests Vs Grades in College Admissions Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-11 Paul R. Sackett, Melissa S. Sharpe, Nathan Kuncel
ABSTRACT The literature is replete with references to a disproportionate reliance on admission test scores (e.g., the ACT or SAT) in the college admissions process. School-reported reliance on test scores and grades has been used to study this question, generally indicating relatively equal reliance on the two, with a slightly higher endorsement of grades. As an alternative, we develop an empirical
-
A Method for Displaying Incremental Validity with Expectancy Charts Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-12 Samuel David Lee, Philip T Walmsley, Paul R. Sackett, Nathan Kuncel
ABSTRACT Providing assessment validity information to decision makers in a clear and useful format is an ongoing challenge for the educational and psychological measurement community. We identify issues with a previous approach to a graphical presentation, noting that it is mislabeled as presenting incremental validity, when in fact it displays the effects of using predictors in a multiple hurdle fashion
-
Detecting Differential Item Functioning Using Cognitive Diagnosis Models: Applications of the Wald Test and Likelihood Ratio Test in a University Entrance Examination Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-13 Roghayeh Mehrazmay, Behzad Ghonsooly, Jimmy de la Torre
ABSTRACT The present study aims to examine gender differential item functioning (DIF) in the reading comprehension section of a high stakes test using cognitive diagnosis models. Based on the multiple-group generalized deterministic, noisy “and” gate (MG G-DINA) model, the Wald test and likelihood ratio test are used to detect DIF. The flagged items are further inspected to find the attributes they
-
Between- versus Within-Examinee Variability in Test-Taking Effort and Test Emotions during a Low-Stakes Test Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-10 Beth A. Perkins, Dena A Pastor, Sara J Finney
ABSTRACT When tests are low stakes for examinees, meaning there are little to no personal consequences associated with test results, some examinees put little effort into their performance. To understand the causes and consequences of diminished effort, researchers correlate test-taking effort with other variables, such as test-taking emotions and test performance. Most studies correlate examinees’
-
Characterizing the Latent Classes in a Mixture IRT Model Using DIF Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-11 Tugba Karadavut
ABSTRACT Mixture IRT models address the heterogeneity in a population by extracting latent classes and allowing item parameters to vary between latent classes. Once the latent classes are extracted, they need to be further examined to be characterized. Some approaches have been adopted in the literature for this purpose. These approaches examine either the examinee or the item characteristics conceptually
-
Reconceptualizing Rapid Responses as a Speededness Indicator in High-Stakes Assessments Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-06 Richard Feinberg, Daniel Jurich, Steven L Wise
ABSTRACT Previous research on rapid responding tends to implicitly consider examinees as either engaging in solution behavior or purely guessing. However, particularly in a high-stakes testing context, examinees perceiving that they are running out of time may consider the remaining items for less time than necessary to provide a fully informed response, but longer than a truly rapid guess. This partial
-
Detection of Outliers in Anchor Items Using Modified Rasch Fit Statistics Applied Measurement in Education (IF 1.528) Pub Date : 2021-10-27 Chunyan Liu, Daniel Jurich, Carol Morrison, Irina Grabovsky
ABSTRACT The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. However, in practice, anchor item performance can become distorted due to various reasons. This study compares the performance of modified INFIT and OUTFIT Rasch statistics with the Logit Difference approach with 0.3 and 0.5
-
Coefficient β As Extension of KR-21 Reliability for Summed and Scaled Scores for Polytomously-scored Tests Applied Measurement in Education (IF 1.528) Pub Date : 2021-09-04 Rashid S. Almehrizi
ABSTRACT KR-20 reliability and its extension (coefficient α) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article extends KR-21 to coefficient β to estimate reliability
-
Development and Use of Anchoring Vignettes: Psychometric Investigations and Recommendations for a Nonparametric Approach Applied Measurement in Education (IF 1.528) Pub Date : 2021-07-30 HyeSun Lee, Weldon Smith, Angel Martinez, Heather Ferris, Joe Bova
ABSTRACT The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into three domains: varying levels of AV format
-
Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test Applied Measurement in Education (IF 1.528) Pub Date : 2021-07-27 José H. Lozano, Javier Revuelta
ABSTRACT The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework compared to the traditional frequentist approach
-
The Effect of Peer Assessment on Non-Cognitive Outcomes: A Meta-Analysis Applied Measurement in Education (IF 1.528) Pub Date : 2021-07-08 Hongli Li, Jacquelyn A. Bialo, Yao Xiong, Charles Vincent Hunter, Xiuyan Guo
ABSTRACT Peer assessment is increasingly being used as a pedagogical tool in classrooms. Participating in peer assessment may enhance student learning in both cognitive and non-cognitive aspects. In this study, we focused on non-cognitive aspects by performing a meta-analysis to synthesize the effect of peer assessment on students’ non-cognitive learning outcomes. After a systematic search, we included