Towards a more nuanced conceptualisation of differential examiner stringency in OSCEs.
Advances in Health Sciences Education (IF 4), Pub Date: 2023-10-16, DOI: 10.1007/s10459-023-10289-w
Matt Homer

Systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are rarely used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability, difficulty and stringency respectively. In addition, examiners are allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), and via how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2% respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false positive (candidates passing in error) and false negative (failing in error) rates in stations of around 5% each, but at the exam level these reduce to 0.4% and 3.3% respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure that OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and of the combined effect of scoring and grading on candidate outcomes.
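The variance shares quoted in the abstract can be tallied in a few lines. The sketch below is only a bookkeeping check on the reported percentages, not the author's model or analysis code; the dictionary keys and the helper `combined_share` are hypothetical names introduced for illustration.

```python
# Percent of station-level score variance, as reported in the abstract.
# Key names are illustrative labels, not terms taken from the paper's code.
variance_share = {
    "examiner_stringency": 34,      # traditional score stringency
    "examiner_discrimination": 7,   # how examiners discriminate across grades
    "candidate": 3,
    "station": 2,
    "residual": 54,
}


def combined_share(components, names):
    """Sum the percentage shares of the named variance components."""
    return sum(components[name] for name in names)


total = sum(variance_share.values())
examiner_total = combined_share(
    variance_share, ["examiner_stringency", "examiner_discrimination"]
)

print(f"all components: {total}%")            # prints "all components: 100%"
print(f"examiner-related: {examiner_total}%")  # prints "examiner-related: 41%"
```

Taken together, the two examiner-related components account for 41% of station-level score variance, compared with 5% for candidate and station combined.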

Updated: 2023-10-16