Evaluating IBM’s Watson natural language processing artificial intelligence as a short-answer categorization tool for physics education research
Physical Review Physics Education Research (IF 3.1), Pub Date: 2024-03-22, DOI: 10.1103/physrevphyseducres.20.010116
Jennifer Campbell, Katie Ansell, Tim Stelzer

Recent advances in publicly available natural language processors (NLP) may make analyzing student short-answer responses in physics education research (PER) more efficient. We train a state-of-the-art NLP, IBM's Watson, and test its agreement with human coders using data from two studies in which students explained their reasoning on physics-related questions in short text responses. The first study analyzes 479 student responses to a lab data analysis question and categorizes them by main idea. The second study analyzes 732 student answers to identify the presence or absence of each of two conceptual themes. When Watson is trained on approximately one-third to one-half of the samples, we find that samples labeled with high confidence scores have accuracy comparable to the agreement between human coders; for samples with lower confidence scores, however, human coders are more accurate than the NLP. In addition to studying Watson's overall accuracy, we use this analysis to better understand the factors that affect how Watson categorizes responses. Using the data from the categorization study, we find that Watson's algorithm does not appear to be affected by the disproportionate representation of categories in the training set, and we examine mislabeled statements to identify vocabulary and phrasing that may increase the rate of false positives. Based on this work, we find that, with careful consideration of the research study design and an awareness of the NLP's limitations, Watson may be a useful tool for large-scale PER studies or for classroom analysis.

Updated: 2024-03-22