Framework for Radiation Oncology Department-wide Evaluation and Implementation of Commercial AI Auto-contouring
Practical Radiation Oncology (IF 3.3), Pub Date: 2023-11-05, DOI: 10.1016/j.prro.2023.10.011
Dominic Maes 1 , Evan D H Gates 2 , Juergen Meyer 1 , John Kang 2 , Bao-Ngoc Thi Nguyen 3 , Myra Lavilla 3 , Dustin Melancon 2 , Emily S Weg 1 , Yolanda D Tseng 4 , Andrew Lim 5 , Stephen R Bowen 6

Introduction

Artificial intelligence (AI)-based auto-contouring in radiation oncology has potential benefits such as standardization and time savings. However, commercial AI solutions require careful evaluation before clinical integration. We developed a multidimensional evaluation method to test pre-trained AI auto-contouring solutions across a network of clinics.

Methods

Curated data included 121 patient planning computed tomography (CT) scans from four clinics, with a total of 859 clinically approved contours that had been used for treatment. Regions of interest (ROIs) were generated with three commercial AI-based automated contouring software solutions (AI1, AI2, AI3) spanning the following disease sites: brain, head-and-neck, thorax, abdomen, and pelvis. Quantitative agreement between AI-generated and clinical contours was measured by Dice similarity coefficient (DSC) and Hausdorff distance (HD). Qualitative assessment was performed by multiple experts, who scored blinded AI contours on a Likert scale. A workflow and usability survey was also conducted.
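As an illustration of the two quantitative metrics named above, the following is a minimal sketch, not the authors' implementation. It assumes each contour is represented as a set of voxel coordinates, uses the maximum (not a percentile) Hausdorff distance, and reports distance in voxel units; the abstract does not specify any of these choices. The function names `dice` and `hausdorff` and the toy masks are hypothetical.

```python
from itertools import product
from math import dist


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric (maximum) Hausdorff distance between two non-empty point sets.

    Brute force, O(|A| * |B|); fine for a toy example, too slow for real volumes.
    """
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty set")

    def directed(p, q):
        # Largest distance from any point in p to its nearest neighbour in q.
        return max(min(dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


# Hypothetical toy data: a 4x4 "clinical" square and an "AI" square
# shifted by one voxel along the first axis.
clinical = set(product(range(0, 4), range(0, 4)))
ai = set(product(range(1, 5), range(0, 4)))

dsc = dice(clinical, ai)      # 12 shared voxels out of 16 + 16 -> 0.75
hd = hausdorff(clinical, ai)  # worst-case boundary mismatch is one voxel -> 1.0
print(f"DSC = {dsc:.2f}, HD = {hd:.1f} voxels")
```

The two metrics are complementary: DSC summarizes volumetric overlap, while HD is driven by the single worst boundary discrepancy, so a contour can score well on one and poorly on the other.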

Results

AI1/AI2/AI3 contours had high quantitative agreement (DSC > 0.9) in 27.8/32.8/34.1% of cases, performing well in pelvis (median DSC = 0.86/0.88/0.91) and thorax (median DSC = 0.91/0.89/0.91). The three solutions had low quantitative agreement (DSC < 0.5) in 7.4/8.8/6.1% of cases, performing worse in brain (median DSC = 0.65/0.78/0.75) and head-and-neck (H&N; median DSC = 0.76/0.80/0.81). Qualitatively, AI1/AI2 contours were acceptable with at most minor edits (rated 1-2) in 70.7/74.6% of ROIs (2,906 ratings), with higher rates for abdomen (AI1: 79.2%) and thorax (AI2: 90.2%) and lower rates for H&N (29.0/35.6%). An end-user survey showed strong user preference for full automation and mixed preferences for accuracy versus total number of structures generated.
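The summary statistics reported above (share of cases with DSC > 0.9, share with DSC < 0.5, and median DSC) can be computed from a list of per-case DSC values roughly as follows. This is a sketch under assumptions: the helper `summarize` and the toy values are hypothetical and are not data from the study, and the abstract does not say whether a "case" is a patient or an individual contour.

```python
from statistics import median


def summarize(dsc_values, high=0.9, low=0.5):
    """Percent of cases above `high`, percent below `low`, and the median DSC."""
    n = len(dsc_values)
    if n == 0:
        raise ValueError("need at least one DSC value")
    return {
        "pct_high": 100 * sum(v > high for v in dsc_values) / n,
        "pct_low": 100 * sum(v < low for v in dsc_values) / n,
        "median": median(dsc_values),
    }


# Hypothetical per-case DSC values for one solution at one disease site.
toy = [0.95, 0.92, 0.85, 0.70, 0.40]
summary = summarize(toy)
print(summary)
```

Running the same helper per solution and per disease site would yield a table in the AI1/AI2/AI3 layout used in the abstract.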

Conclusion

Our evaluation method provided a comprehensive analysis of both quantitative and qualitative measures of commercially available pre-trained AI auto-contouring algorithms. The evaluation framework served as a roadmap for clinical integration that aligned with user workflow preference.




Updated: 2023-11-07