Evaluation of an LLM in Identifying Logical Fallacies: A Call for Rigor When Adopting LLMs in HCI Research
arXiv - CS - Human-Computer Interaction. Pub Date: 2024-04-08, DOI: arxiv-2404.05213
Gionnieve Lim, Simon T. Perrault

There is increasing interest in the adoption of LLMs in HCI research. However, because of their powerful capabilities, LLMs are often regarded as a panacea, while the question of whether they are suitable for their intended tasks is overlooked. We contend that LLMs should be adopted in a critical manner following rigorous evaluation. Accordingly, we present the evaluation of an LLM in identifying logical fallacies that will form part of a digital misinformation intervention. Comparing its outputs against a labeled dataset, we found that GPT-4 achieves an accuracy of 0.79, and, for our intended use case that excludes invalid or unidentified instances, an accuracy of 0.90. This gives us the confidence to proceed with the application of the LLM while remaining mindful of the areas where it still falls short. The paper describes our evaluation approach, results, and reflections on the use of the LLM for our intended task.
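
The two figures reported above correspond to two accuracy computations: one over all instances, and one that excludes instances the model flagged as invalid or unidentified. The following is a minimal Python sketch of that computation under assumed conventions; the label names, flag values, and toy data are hypothetical and are not taken from the paper's dataset or prompts.

    # Hypothetical sketch: overall accuracy vs. use-case accuracy that
    # excludes predictions flagged as invalid or unidentified.
    INVALID_LABELS = {"invalid", "unidentified"}  # assumed flag values

    def overall_accuracy(predictions, gold):
        # Accuracy over all instances; flagged predictions count as wrong.
        correct = sum(p == g for p, g in zip(predictions, gold))
        return correct / len(gold)

    def use_case_accuracy(predictions, gold):
        # Accuracy after dropping instances the model flagged as unusable.
        kept = [(p, g) for p, g in zip(predictions, gold)
                if p not in INVALID_LABELS]
        correct = sum(p == g for p, g in kept)
        return correct / len(kept)

    # Toy example: 10 labeled instances, two predictions flagged as unusable.
    gold = ["ad hominem"] * 5 + ["strawman"] * 5
    pred = ["ad hominem", "ad hominem", "strawman", "ad hominem", "invalid",
            "strawman", "strawman", "unidentified", "strawman", "strawman"]

    print(f"overall:  {overall_accuracy(pred, gold):.2f}")   # 0.70
    print(f"use case: {use_case_accuracy(pred, gold):.2f}")  # 0.88 (7/8)

The design choice mirrors the paper's framing: in the intended intervention, instances the model cannot classify would simply not trigger a fallacy explanation, so excluding them gives a fairer estimate of accuracy on the outputs users would actually see.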

Updated: 2024-04-09