Bob or Bot: Exploring ChatGPT's Answers to University Computer Science Assessment
ACM Transactions on Computing Education (IF 2.4) Pub Date: 2024-01-14, DOI: 10.1145/3633287
Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch

Cheating has been a long-standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools has provided a new and distinct method for cheating. Students can run many assessment questions through the tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise on four end-of-module assessments across a distance university computer science (CS) curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT-generated postgraduate scripts received a passing grade (>50%). We also present the results of interviewing the markers and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector, both of which identified every ChatGPT-generated script but differed in the number of false positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that in most cases, across a range of question formats, topics, and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.




Updated: 2024-01-15