Examining Gender Differences in TIMSS 2019 Using a Multiple‐Group Hierarchical Speed‐Accuracy‐Revisits Model Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-04-24 Dihao Leng, Ummugul Bezirhan, Lale Khorramdel, Bethany Fishbein, Matthias von Davier
This study capitalizes on response and process data from the computer‐based TIMSS 2019 Problem Solving and Inquiry tasks to investigate gender differences in test‐taking behaviors and their association with mathematics achievement at the eighth grade. Specifically, a recently proposed hierarchical speed‐accuracy‐revisits (SAR) model was adapted to multiple country‐by‐gender groups to examine the extent
-
Guesses and Slips as Proficiency‐Related Phenomena and Impacts on Parameter Invariance Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-04-08 Xiangyi Liao, Daniel M Bolt
Traditional approaches to the modeling of multiple‐choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency‐related phenomena. We show how evidence for this perspective is seen in the systematic
-
Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-04-04 Jiangang Hao, Alina A. von Davier, Victoria Yaneva, Susan Lottridge, Matthias von Davier, Deborah J. Harris
The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting‐edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely, these innovations raise significant concerns regarding
-
Revisiting the Usage of Alpha in Scale Evaluation: Effects of Scale Length and Sample Size Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-03-20 Leifeng Xiao, Kit‐Tai Hau, Melissa Dan Wang
Short scales are time‐efficient for participants and cost‐effective in research. However, researchers often mistakenly expect short scales to have the same reliability as long ones without considering the effect of scale length. We argue that applying a universal benchmark for alpha is problematic as the impact of low‐quality items is greater on shorter scales. In this study, we proposed simple guidelines
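The dependence of alpha on scale length that this abstract describes can be illustrated with the classical Spearman-Brown prophecy formula. The sketch below is illustrative only (it is not the authors' proposed guideline) and assumes parallel items:

```python
# Illustrative sketch (not from the article): how expected reliability falls as a
# scale is shortened, via the Spearman-Brown prophecy formula, assuming
# parallel items (equal loadings, uncorrelated errors).

def spearman_brown(alpha_full: float, k_full: int, k_short: int) -> float:
    """Project reliability when a k_full-item scale is cut to k_short items."""
    ratio = k_short / k_full
    return (ratio * alpha_full) / (1 + (ratio - 1) * alpha_full)

# A 20-item scale with alpha = .90, shortened to 5 items:
alpha_short = spearman_brown(0.90, 20, 5)
print(round(alpha_short, 3))  # → 0.692, well below common .80 benchmarks
```

This is the arithmetic behind the authors' point: the same item quality yields a much lower alpha on a short scale, so a universal benchmark penalizes short scales.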
-
What Mathematics Content Do Teachers Teach? Optimizing Measurement of Opportunities to Learn in the Classroom Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-03-07 Jiahui Zhang, William H. Schmidt
Measuring opportunities to learn (OTL) is crucial for evaluating education quality and equity, but obtaining accurate and comprehensive OTL data at a large scale remains challenging. We attempt to address this issue by investigating measurement concerns in data collection and sampling. With the primary goal of estimating group‐level OTLs for large populations of classrooms and the secondary goal of
-
Reframing Research and Assessment Practices: Advancing an Antiracist and Anti‐Ableist Research Agenda Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-03-04 Angela Johnson, Elizabeth Barker, Marcos Viveros Cespedes
Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data‐driven decisions will be misinformed. To maximize the impact of the research‐practice‐policy collaborative, every stage of the assessment and research
-
Digital Module 35: Through‐Year Assessment Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-28 Nathan Dadey, Brian Gong, Yun‐Kyung Kim, Edynn Sato
Module Abstract: Through-year assessments are administered in multiple parts and at different times over the course of a school year, and they also produce summative scores that can be used with state accountability systems (Lorié et al., 2021; Dadey & Gong, 2023). These assessments are alternatively known as instructionally embedded, through-course, or periodic assessments. There are
-
On the Cover: High School Coursetaking Sequence Clusters and Postsecondary Enrollment Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-28 Yuan‐Ling Liaw
-
Editorial Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-28 Zhongmin Cui
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-28
CONTENTS
3 Editorial (Zhongmin Cui)
Data Visualization
4 On the Cover: High School Coursetaking Sequence Clusters and Postsecondary Enrollment (Yuan-Ling Liaw)
Special Section: Leveraging Measurement for Better Decisions
5 Using OpenAI GPT to Generate Reading Comprehension Items (Ayfer Sayin and Mark Gierl)
19 MxML (Exploring the Relationship Between Measurement and Machine Learning): Current State of the
-
ITEMS Corner Update: Two Years of Changes to ITEMS Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-28 Brian C. Leventhal
This issue marks the beginning of the final year of my tenure as editor of the Instructional Topics in Educational Measurement Series (ITEMS). Although I will save a comprehensive reflection for the last issue of the year, I will use this issue to provide an update on the two changes to ITEMS made over the past two years, in addition to introducing the newest entry in the ITEMS digital library
-
A Workflow for Minimizing Errors in Template-Based Automated Item-Generation Development Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-12 Yanyan Fu
The template-based automated item-generation (TAIG) approach that involves template creation, item generation, item selection, field-testing, and evaluation has more steps than the traditional item development method. Consequently, there is more room for error in this process, and any template errors can cascade to the generated items. Therefore, it is essential to eliminate the source of
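The cascade the abstract warns about — one template error propagating into every generated item — can be seen in a minimal sketch of template-based generation. This is a hypothetical toy (template, slot names, and the validation guard are all my own, not the authors' workflow):

```python
# Hypothetical minimal sketch of template-based automated item generation:
# one template with constrained slots yields many items, so validating slot
# values before generation is one place errors can be caught before they
# cascade into every generated item.
from itertools import product

TEMPLATE = "A train travels {speed} km/h for {hours} hours. How far does it go?"

def generate_items(speeds, hours_list):
    items = []
    for speed, hours in product(speeds, hours_list):
        # Guard: a bad slot value here would otherwise propagate to all items.
        assert speed > 0 and hours > 0, "invalid slot value"
        stem = TEMPLATE.format(speed=speed, hours=hours)
        key = speed * hours  # correct answer computed, not hand-typed
        items.append({"stem": stem, "key": key})
    return items

items = generate_items([60, 80], [2, 3])
print(len(items))  # → 4 items from a single template
```

Computing the key from the slot values, rather than typing it per item, is one small example of designing the error out at the template level.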
-
The University of California Was Wrong to Abolish the SAT: Admissions When Affirmative Action Was Banned Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-09 Donald Wittman
I study student characteristics and academic performance at the University of California, where consideration of an applicant's ethnicity has been banned since 1996 and SAT scores were used in admitting students to the university until fall 2021. I show the following: (1) SAT scores were more important than high school grades in predicting first-year university GPA; (2) the use of SAT scores alone
-
An Automated Item Pool Assembly Framework for Maximizing Item Utilization for CAT Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-02-09 Hwanggyu Lim, Kyung (Chris) T. Han
Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of psychometric characteristics and content specifications
-
MxML (Exploring the Relationship between Measurement and Machine Learning): Current State of the Field Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-01-29 Yi Zheng, Steven Nydick, Sijia Huang, Susu Zhang
The recent surge of machine learning (ML) has impacted many disciplines, including educational and psychological measurement (hereafter shortened as measurement). The measurement literature has seen rapid growth in applications of ML to solve measurement problems. However, as we emphasize in this article, it is imperative to critically examine the potential risks associated with involving ML in measurement
-
Measuring Variability in Proctor Decision Making on High-Stakes Assessments: Improving Test Security in the Digital Age Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-01-24 William Belzak, J. R. Lockwood, Yigal Attali
Remote proctoring, or monitoring test takers through internet-based, video-recording software, has become critical for maintaining test security on high-stakes assessments. The main role of remote proctors is to make judgments about test takers' behaviors and decide whether these behaviors constitute rule violations. Variability in proctor decision making, or the degree to which humans/proctors make
-
Knowledge Integration in Science Learning: Tracking Students' Knowledge Development and Skill Acquisition with Cognitive Diagnosis Models Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-01-24 Xin Xu, Shixiu Ren, Danhui Zhang, Tao Xin
In scientific literacy, knowledge integration (KI) is a scaffolding-based theory that supports students' scientific inquiry learning. To encourage students to be self-directed, many courses have been developed based on the KI framework. However, few efforts have been made to evaluate the outcomes of students' learning under KI instruction. Moreover, finer-grained information has been pursued to better understand
-
Using OpenAI GPT to Generate Reading Comprehension Items Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-01-24 Ayfer Sayin, Mark Gierl
The purpose of this study is to introduce and evaluate a method for generating reading comprehension items using template-based automatic item generation. To begin, we describe a new model for generating reading comprehension items called the text analysis cognitive model assessing inferential skills across different reading passages. Next, the text analysis cognitive model is used to generate reading
-
Achievement and Growth on English Language Proficiency and Content Assessments for English Learners in Elementary Grades Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2024-01-10 Heather M Buzick, Mikyung Kim Wolf, Laura Ballard
English language proficiency (ELP) assessment scores are used by states to make high-stakes decisions related to linguistic support in instruction and assessment for English learner (EL) students and for EL student reclassification. Changes to both academic content standards and ELP academic standards within the last decade have resulted in increased academic rigor and language demands. In this study
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-12-05
CONTENTS
3 Editorial (Zhongmin Cui)
Data Visualization
4 On the Cover: Tell-Tale Triangles of Subscore Value (Yuan-Ling Liaw)
5 The 2024 EM:IP Cover Graphic/Data Visualization Competition
Articles
6 Item Selection Algorithm Based on Collaborative Filtering for Item Exposure Control (Yiqin Pan, Oren Livne, James A. Wollack, and Sandip Sinharay)
19 Measurement Efficiency for Technology-Enhanced and Multiple-Choice
-
ITEMS Corner Update: The Final Three Steps in the Development Process Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-12-05 Brian C. Leventhal
Throughout 2023, I have detailed each step of the module development process for the Instructional Topics in Educational Measurement Series (ITEMS). In the first issue, I outlined the 10 steps necessary to complete a module. In the second issue, I detailed Steps 1–3, which cover outlining the content, developing the content in premade PowerPoint templates, and having the slides reviewed by the editor
-
Digital Module 34: Introduction to Multilevel Measurement Modeling Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-12-05 Mairead Shaw, Jessica K. Flake
Clustered data structures are common in many areas of educational and psychological research (e.g., students clustered in schools, patients clustered by clinician). In the course of conducting research, questions are often administered to obtain scores reflecting latent constructs. Multilevel measurement models (MLMMs) allow for modeling measurement (the relationship of test items to constructs) and
-
Comparing Large-Scale Assessments in Two Proctoring Modalities with Interactive Log Data Analysis Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-11-02 Jinnie Shin, Qi Guo, Maxim Morin
With the increased restrictions on physical distancing due to the COVID-19 pandemic, remote proctoring has emerged as an alternative to traditional onsite proctoring to ensure the continuity of essential assessments, such as computer-based medical licensing exams. Recent literature has highlighted the significant impact of different proctoring modalities on examinees’ test experience, including factors
-
Foundational Competencies in Educational Measurement Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-10-17 Terry A. Ackerman, Deborah L. Bandalos, Derek C. Briggs, Howard T. Everson, Andrew D. Ho, Susan M. Lottridge, Matthew J. Madison, Sandip Sinharay, Michael C. Rodriguez, Michael Russell, Alina A. von Davier, Stefanie A. Wind
This article presents the consensus of a National Council on Measurement in Education Presidential Task Force on Foundational Competencies in Educational Measurement. Foundational competencies are those that support future development of additional professional and disciplinary competencies. The authors develop a framework for foundational competencies in educational measurement, illustrate how educational
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-09-06
CONTENTS
3 Editorial (Zhongmin Cui)
Data Visualization
4 Reached or Not Reached: A Tale of Two Data Sources (Yuan-Ling Liaw)
Articles
5 Applying a Mixture Rasch Model-Based Approach to Standard Setting (Michael R. Peabody, Timothy J. Muckle, and Yu Meng)
13 Do Subject Matter Experts’ Judgments of Multiple-Choice Format Suitability Predict Item Quality? (Rebecca F. Berenbon and Bridget C. McHugh)
22 Defining
-
ITEMS Corner Update: Recording Audio and Adding an Editorial Polish to an ITEMS Module Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-09-06 Brian C. Leventhal
In the first issue of Educational Measurement: Issues and Practice (EM:IP) in 2023, I outlined the 10 steps to the Instructional Topics in Educational Measurement Series (ITEMS) module development process. I then detailed the first three steps in the second issue, and in this issue, I discuss Steps 4–7, focusing on the audio recording process, editorial polish, interactive activities, and learning
-
Digital Module 33: Fairness in Classroom Assessment: Dimensions and Tensions Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-09-06 Amirhossein Rasooli
Perceptions of fairness are fundamental to building cooperation and trust, reducing conflict, and gaining legitimacy in teacher-student relationships in classroom assessment. However, perceptions of unfairness in assessment can undermine students’ mental well-being, increase antisocial behaviors, increase psychological disengagement from learning, and threaten the belief in a fair society, fundamental
-
Item Selection Algorithm Based on Collaborative Filtering for Item Exposure Control Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-08-29 Yiqin Pan, Oren Livne, James A. Wollack, Sandip Sinharay
In computerized adaptive testing, overexposure of items in the bank is a serious problem and might result in item compromise. We develop an item selection algorithm that utilizes the entire bank well and reduces the overexposure of items. The algorithm is based on collaborative filtering and selects an item in two stages. In the first stage, a set of candidate items whose expected performance matches
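The two-stage selection idea in this abstract — form a candidate set whose expected performance matches the examinee, then choose among candidates to spread exposure — can be sketched as a simple heuristic. This is not the authors' collaborative-filtering algorithm; the Rasch-style matching rule and the least-exposed tie-break are stand-ins of my own:

```python
import random

# Hedged sketch (not the authors' collaborative-filtering method): a two-stage
# item-selection heuristic in the spirit described — first a candidate set of
# items matched to the examinee, then a pick that spreads exposure.

def select_item(theta, items, exposure, n_candidates=5):
    """items: {item_id: difficulty b}; exposure: {item_id: administration count}.
    For a Rasch item, information peaks at b == theta, so stage 1 ranks items
    by |b - theta|."""
    ranked = sorted(items, key=lambda i: abs(items[i] - theta))
    candidates = ranked[:n_candidates]
    # Stage 2: among candidates, administer the least-exposed item
    # (ties broken at random) to use the whole bank more evenly.
    min_exp = min(exposure[i] for i in candidates)
    least = [i for i in candidates if exposure[i] == min_exp]
    return random.choice(least)

pool = {"i1": -1.0, "i2": 0.0, "i3": 0.05, "i4": 1.5}
exposure = {"i1": 3, "i2": 9, "i3": 1, "i4": 0}
print(select_item(0.0, pool, exposure, n_candidates=2))  # → i3
```

Stage 2 trades a little measurement precision (any candidate is near-optimal) for a large reduction in overexposure of the single best item.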
-
Measurement Efficiency for Technology-Enhanced and Multiple-Choice Items in a K–12 Mathematics Accountability Assessment Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-08-25 Ozge Ersan, Yufeng Berry
The increasing use of computerization in the testing industry and the need for items potentially measuring higher-order skills have led educational measurement communities to develop technology-enhanced (TE) items and conduct validity studies on the use of TE items. Parallel to this goal, the purpose of this study was to collect validity evidence comparing item information functions, expected information
-
Weighing the Value of Complex Growth Estimation Methods to Evaluate Individual Student Response to Instruction Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-08-24 Ethan R. Van Norman
Sophisticated analytic strategies have been proposed as viable methods to improve the quantification of student improvement and to assist educators in making treatment decisions. The performance of three categories of latent growth modeling techniques (linear, quadratic, and dual change) to capture growth in oral reading fluency in response to a 12-week structured supplemental reading intervention
-
Does It Matter How the Rigor of High School Coursework Is Measured? Gaps in Coursework Among Students and Across Grades Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-08-23 Burhan Ogut, Darrick Yee, Ruhan Circi, Nevin Dizdari
Research shows that the intensity of high school course-taking is related to postsecondary outcomes. However, there are various approaches to measuring the intensity of students’ course-taking. This study presents new measures of coursework intensity that rely on differing levels of quantity and quality of coursework. We used these new indices to provide a current description of variations in high
-
Exploration of Latent Structure in Test Revision and Review Log Data Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-08-14 Susu Zhang, Anqi Li, Shiyu Wang
In computer-based tests allowing revision and reviews, examinees' sequence of visits and answer changes to questions can be recorded. The variable-length revision log data introduce new complexities to the collected data but, at the same time, provide additional information on examinees' test-taking behavior, which can inform test development and instructions. In the current study, we used recently
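The variable-length revision logs the abstract describes can be reduced to simple behavioral features. The sketch below is illustrative only — the event format and the two features (revisit count, answer changes) are my own assumptions, not the authors' analysis:

```python
# Illustrative only: summarizing variable-length revision logs. Each log is a
# sequence of (item, answer) visit events; features such as revisit counts and
# answer changes can feed later analyses of test-taking behavior.
from collections import defaultdict

def revision_features(log):
    visits = defaultdict(int)
    answers = {}
    changes = 0
    for item, answer in log:
        visits[item] += 1
        if item in answers and answers[item] != answer:
            changes += 1  # examinee changed a previously given answer
        answers[item] = answer
    revisits = sum(v - 1 for v in visits.values())  # visits beyond the first
    return {"revisits": revisits, "answer_changes": changes}

log = [("Q1", "A"), ("Q2", "C"), ("Q1", "B"), ("Q3", "D"), ("Q1", "B")]
print(revision_features(log))  # → {'revisits': 2, 'answer_changes': 1}
```

Note that a revisit without an answer change (the final Q1 visit) is counted separately from a change, which is one way the variable-length data carry more information than final responses alone.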
-
Applying a Mixture Rasch Model-Based Approach to Standard Setting Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-07-17 Michael R. Peabody, Timothy J. Muckle, Yu Meng
The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional standard-setting methods. We found that heterogeneity of the sample
-
Do Subject Matter Experts’ Judgments of Multiple-Choice Format Suitability Predict Item Quality? Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-07-11 Rebecca F. Berenbon, Bridget C. McHugh
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ suitability for a given content standard and the associated
-
Defining Test-Score Interpretation, Use, and Claims: Delphi Study for the Validity Argument Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-27 Timothy D. Folger, Jonathan Bostic, Erin E. Krupa
Validity is a fundamental consideration of test development and test evaluation. The purpose of this study is to define and reify three key aspects of validity and validation, namely test-score interpretation, test-score use, and the claims supporting interpretation and use. This study employed a Delphi methodology to explore how experts in validity and validation conceptualize test-score interpretation
-
Hierarchical Agglomerative Clustering to Detect Test Collusion on Computer-Based Tests Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-19 Soo Jeong Ingrisone, James N. Ingrisone
There has been a growing interest in approaches based on machine learning (ML) for detecting test collusion as an alternative to the traditional methods. Clustering analysis under an unsupervised learning technique appears especially promising to detect group collusion. In this study, the effectiveness of hierarchical agglomerative clustering (HAC) for detecting aberrant test takers on Computer-Based
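A flat cut of a single-linkage dendrogram is equivalent to merging every pair above a similarity threshold, which can be written compactly with union-find. The sketch below is a simplified stand-in for the article's HAC analysis — the agreement measure, cutoff, and data are hypothetical:

```python
# Simplified stand-in for the article's method: group examinees whose answer
# strings agree suspiciously often. Union-find over all pairs above a fixed
# agreement cutoff reproduces the flat cut of single-linkage HAC.

def agreement(a, b):
    """Proportion of identical answers between two response strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_examinees(responses, cutoff=0.9):
    ids = list(responses)
    parent = {i: i for i in ids}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            a, b = ids[i], ids[j]
            if agreement(responses[a], responses[b]) >= cutoff:
                parent[find(a)] = find(b)  # merge: linked by high agreement
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]  # flag only groups

responses = {
    "e1": "ABCDABCDAB", "e2": "ABCDABCDAB",  # identical answers: flagged pair
    "e3": "BADCBADCBA", "e4": "ABCDDBCAAC",
}
print(cluster_examinees(responses))  # → [['e1', 'e2']]
```

In practice the similarity would be corrected for chance agreement and ability, and a full linkage tree (e.g., SciPy's `scipy.cluster.hierarchy`) would let the cutoff be chosen from the dendrogram rather than fixed in advance.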
-
A Probabilistic Filtering Approach to Non-Effortful Responding Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-16 Esther Ulitzsch, Benjamin W. Domingue, Radhika Kapoor, Klint Kanopka, Joseph A. Rios
Common response-time-based approaches for non-effortful response behavior (NRB) in educational achievement tests filter responses that are associated with response times below some threshold. These approaches are, however, limited in that they require a binary decision on whether a response is classified as stemming from NRB; thus ignoring potential classification uncertainty in resulting parameter
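The contrast the abstract draws — a binary threshold decision versus a filter that carries classification uncertainty — can be made concrete. This is not the authors' model; the logistic weight in log response time is a hypothetical stand-in for a proper probabilistic filter:

```python
import math

# Illustrative contrast (not the authors' model): a hard response-time
# threshold makes a binary keep/drop call, while a probabilistic filter
# assigns each response a weight — here a logistic curve in log response
# time — so uncertainty near the threshold is carried into later analyses.

def hard_filter(rt, threshold=3.0):
    return 0.0 if rt < threshold else 1.0  # drop fast responses outright

def soft_weight(rt, threshold=3.0, steepness=4.0):
    # Probability-like weight that the response reflects effortful behavior;
    # equals 0.5 exactly at the threshold.
    return 1.0 / (1.0 + math.exp(-steepness * (math.log(rt) - math.log(threshold))))

for rt in [1.0, 2.9, 3.1, 10.0]:
    print(rt, hard_filter(rt), round(soft_weight(rt), 2))
```

Under the hard filter, responses at 2.9 s and 3.1 s receive opposite treatment; under the soft weight both sit near 0.5, which is the classification uncertainty the binary rule discards.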
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-09
CONTENTS
3 Editorial (Zhongmin Cui)
Data Visualization
4 Visualizing Distributions Across Grades (Yuan-Ling Liaw)
Articles
5 Personalizing Large-Scale Assessment in Practice (Heather M. Buzick, Jodi M. Casabianca, and Melissa L. Gholson)
12 Validation as Evaluating Desired and Undesired Effects: Insights From Cross-Classified Mixed Effects Model (Xuejun Ryan Ji and Amery D. Wu)
21 Diving Into Students’ Transcripts:
-
ITEMS Corner Update: The Initial Steps in the ITEMS Development Process Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-09 Brian C. Leventhal
In the previous issue of Educational Measurement: Issues and Practice (EM:IP) I outlined the ten steps to authoring and producing a digital module for the Instructional Topics in Educational Measurement Series (ITEMS). In the current piece, I detail the first three steps: Step 1—Content Outline; Step 2—Content Development; and Step 3—Draft Review. After in-depth discussion of these three steps, I introduce
-
Digital Module 32: Understanding and Mitigating the Impact of Low Effort on Common Uses of Test and Survey Scores Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-06-09 James Soland
Most individuals who take, interpret, design, or score tests are aware that examinees do not always provide full effort when responding to items. However, many such individuals are not aware of how pervasive the issue is, what its consequences are, and how to address it. In this digital ITEMS module, Dr. James Soland will help fill these gaps in the knowledge base. Specifically, the module enumerates
-
The Role of Response Style Adjustments in Cross-Country Comparisons—A Case Study Using Data from the PISA 2015 Questionnaire Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-05-01 Esther Ulitzsch, Oliver Lüdtke, Alexander Robitzsch
Country differences in response styles (RS) may jeopardize cross-country comparability of Likert-type scales. When adjusting for rather than investigating RS is the primary goal, it seems advantageous to impose minimal assumptions on RS structures and leverage information from multiple scales for RS measurement. Using PISA 2015 background questionnaire data, we investigate such an adjustment procedure
-
Diving Into Students’ Transcripts: High School Course-Taking Sequences and Postsecondary Enrollment Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-04-23 Burhan Ogut, Ruhan Circi
The purpose of this study was to explore high school course-taking sequences and their relationship to college enrollment. Specifically, we implemented sequence analysis to discover common course-taking trajectories in math, science, and English language arts using high school transcript data from a recent nationally representative survey. Through sequence clustering, we reduced the complexity of the
-
Validation as Evaluating Desired and Undesired Effects: Insights From Cross-Classified Mixed Effects Model Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-04-05 Xuejun Ryan Ji, Amery D. Wu
The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence. Validity is viewed as the coherence among the elements of a
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-03-26
CONTENTS
5 Editorial (Zhongmin Cui)
Data Visualization
6 On the Cover: Key Specifications for a Large-Scale Medical Exam (Yuan-Ling Liaw)
Announcement
7 Call for Papers: Leveraging Measurement for Better Decisions
Special Section: Issues and Practice in Applying Machine Learning in Educational Measurement
8 Introduction to the Special Section “Issues and Practice in Applying Machine Learning in Educational
-
ITEMS Corner Update: The New ITEMS Module Development Process Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-03-26 Brian C. Leventhal
This issue marks 1 year into my tenure as editor of Instructional Topics in Educational Measurement Series (ITEMS). I will summarize and reflect on the achievements from the past year, outline the new ITEMS module production process, and introduce the new module published in this issue of Educational Measurement: Issues and Practice (EM:IP). Over the past year, there have been three new modules published:
-
Digital Module 31: Testing Accommodations for Students with Disabilities Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-03-26 Benjamin J. Lovett
Students with disabilities often take tests under different conditions than their peers do. Testing accommodations, which involve changes to test administration that maintain test content, include extending time limits, presenting written text through auditory means, and taking a test in a private room with fewer distractions. For some students with disabilities, accommodations such as these are necessary
-
Personalizing Large-Scale Assessment in Practice Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-03-26 Heather M. Buzick, Jodi M. Casabianca, Melissa L. Gholson
The article describes practical suggestions for measurement researchers and psychometricians to respond to calls for social responsibility in assessment. The underlying assumption is that personalizing large-scale assessment improves the chances that assessment and the use of test scores will contribute to equity in education. This article describes a spectrum of standardization and personalization
-
Bilevel Topic Model-Based Multitask Learning for Constructed-Responses Multidimensional Automated Scoring and Interpretation Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-03-15 Jiawei Xiong, Feiming Li
Multidimensional scoring evaluates each constructed-response answer from more than one rating dimension and/or trait such as lexicon, organization, and supporting ideas instead of only one holistic score, to help students distinguish between various dimensions of writing quality. In this work, we present a bilevel learning model for combining two objectives, the multidimensional automated scoring,
-
A Machine Learning Approach for the Simultaneous Detection of Preknowledge in Examinees and Items When Both Are Unknown Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-02-24 Yiqin Pan, James A. Wollack
Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend the work of PW to an approach detecting compromised items and examinees with item preknowledge simultaneously and draw on ideas in ensemble learning to relax several limitations in the work of PW. The suggested approach also provides a confidence score, which is based on an autoencoder to represent our confidence
-
Cheating Detection of Test Collusion: A Study on Machine Learning Techniques and Feature Representation Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-02-19 Shun-Chuan Chang, Keng Lun Chang
Machine learning has evolved and expanded as an interdisciplinary research method for educational sciences. However, cheating detection of test collusion among multiple examinees or sets of examinees with unusual answer patterns using machine learning techniques has remained relatively unexplored. This study investigates collusion on multiple-choice tests by introducing feature representation methodologies
-
To Score or Not to Score: Factors Influencing Performance and Feasibility of Automatic Content Scoring of Text Responses Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-02-14 Torsten Zesch, Andrea Horbach, Fabian Zehner
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced by other factors such as target population or input
-
Machine Learning Literacy for Measurement Professionals: A Practical Tutorial Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-02-06 Rui Nie, Qi Guo, Maxim Morin
The COVID-19 pandemic has accelerated the digitalization of assessment, creating new challenges for measurement professionals, including big data management, test security, and analyzing new validity evidence. In response to these challenges, Machine Learning (ML) emerges as an increasingly important skill in the toolbox of measurement professionals in this new era. However, most ML tutorials are technical
-
Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-02-03 Benjamin R. Shear
In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about impacts of the pandemic on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to making valid inferences about student learning to study
-
Machine Learning–Based Profiling in Test Cheating Detection Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-01-31 Huijuan Meng, Ye Ma
In recent years, machine learning (ML) techniques have received more attention in detecting aberrant test-taking behaviors due to their advantages over traditional data forensics methods. However, defining “True Test Cheaters” is challenging: unlike other fraud-detection tasks such as flagging forged bank checks or credit card fraud, testing organizations often lack physical evidence
-
Psychometric Evaluation of the Preschool Early Numeracy Skills Test–Brief Version Within the Item Response Theory Framework Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2023-01-11 Nikolaos Tsigilis, Katerina Krousorati, Athanasios Gregoriadis, Vasilis Grammatikopoulos
The Preschool Early Numeracy Skills Test–Brief Version (PENS-B) is a measure of early numeracy skills, developed and mainly used in the United States. The purpose of this study was to examine the factorial validity and measurement invariance across gender of PENS-B in the Greek educational context. PENS-B was administered to 906 preschool children (473 boys, 433 girls), randomly selected from 84 kindergarten
-
Using Active Learning Methods to Strategically Select Essays for Automated Scoring Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2022-12-30 Tahereh Firoozi, Hamid Mohammadi, Mark J. Gierl
Research on Automated Essay Scoring has become increasingly important because it serves as a method for evaluating students’ written responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments, resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three
-
Issue Information Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2022-12-05
CONTENTS
3 Editorial (Zhongmin Cui)
Data Visualization
4 On the Cover: Distractor Cascade Analysis (Yuan-Ling Liaw)
5 The 2023 EM:IP Cover Graphic/Data Visualization Competition (Yuan-Ling Liaw)
In Memoriam
6 Ronald K. Hambleton (1943–2022): Setting the Standard for Measurement Excellence (Stephen G. Sireci)
Articles
10 An Evaluation of Automatic Item Generation: A Case Study of Weak Theory Approach Yanyan
-
On the Cover: Distractor Cascade Analysis Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2022-12-05 Yuan-Ling Liaw
-
ITEMS Corner Update: High Traffic to the ITEMS Portal on the NCME Website Educational Measurement: Issues and Practice (IF 1.402) Pub Date : 2022-12-05 Brian C. Leventhal
As announced in the previous issue of Educational Measurement: Issues and Practice, the ITEMS portal is now hosted on the NCME website. This shift has many benefits. The modules are now easier to access for the NCME membership. Members can navigate to the portal via the link under the resources tab found on the ribbon at the top of each page on the website. Rather than having to go to an external site