Accuracy of Information given by ChatGPT for patients with Inflammatory Bowel Disease in relation to ECCO Guidelines
Journal of Crohn's and Colitis (IF 8), Pub Date: 2024-03-23, DOI: 10.1093/ecco-jcc/jjae040
Martina Sciberras 1 , Yvette Farrugia 1 , Hannah Gordon 2, 3 , Federica Furfaro 4 , Mariangela Allocca 4 , Joana Torres 5, 6, 7 , Naila Arebi 8, 9 , Gionata Fiorino 4, 10 , Marietta Iacucci 11 , Bram Verstockt 12, 13 , Fernando Magro 14 , Kostas Katsanos 15 , Josef Busuttil 16 , Katya De Giovanni 16 , Valerie Anne Fenech 1 , Stefania Chetcuti Zammit 1 , Pierre Ellul 1

Introduction: As acceptance of AI platforms increases, more patients will consider these tools as sources of information. The ChatGPT architecture uses a neural network to process natural language, generating responses based on the context of the input text. The accuracy and completeness of ChatGPT 3.5 in the context of inflammatory bowel disease (IBD) remain unclear.

Methods: In this prospective study, 38 questions worded by IBD patients were entered into ChatGPT 3.5. The following topics were covered: 1) Crohn's disease (CD), ulcerative colitis (UC) and malignancy; 2) maternal medicine; 3) infection and vaccination; 4) complementary medicine. Fourteen expert gastroenterologists assessed the responses given by ChatGPT against the relevant ECCO guidelines for accuracy (5-point scale; 1 = completely incorrect to 5 = completely correct) and completeness (3-point Likert scale; 1 = incomplete to 3 = complete).

Results: In terms of accuracy, most replies (84.2%) had a median score of ≥4 (IQR: 2), and the mean score was 3.87 (SD: ±0.6). For completeness, 34.2% of the replies had a median score of 3 and 55.3% had a median score between 2 and <3. Overall, the mean completeness rating was 2.24 (SD: ±0.4; median: 2; IQR: 1). Although groups 3 and 4 had higher means for both accuracy and completeness, there was no significant variation in scores between the 4 question groups (Kruskal-Wallis test, p > 0.05). However, statistical analysis of the individual questions revealed significant differences for both accuracy (p < 0.001) and completeness (p < 0.001). The questions rated highest for both accuracy and completeness related to smoking, while the lowest ratings related to screening for malignancy and to vaccinations, especially in the context of immunosuppression and family planning.

Conclusion: This is the first study to demonstrate the capability of an AI-based system to provide accurate and comprehensive answers to real-world patient queries in IBD. AI systems may serve as a useful adjunct for patients, in addition to standard of care in clinic and validated patient information resources. However, responses in specialist areas may deviate from evidence-based guidance, and the replies need to give firmer advice.

Updated: 2024-03-23