Surgical phase and instrument recognition: how to identify appropriate dataset splits
International Journal of Computer Assisted Radiology and Surgery (IF 3), Pub Date: 2024-01-29, DOI: 10.1007/s11548-024-03063-9
Georgii Kostiuchik, Lalith Sharan, Benedikt Mayer, Ivo Wolf, Bernhard Preim, Sandy Engelhardt

Purpose

Machine learning approaches can only be evaluated reliably if the training, validation, and test splits are representative and no split is missing classes. Surgical workflow and instrument recognition are two tasks where this is difficult, because phases differ widely in length and may occur erratically, which produces heavy data imbalance. Furthermore, sub-properties such as instrument (co-)occurrence are usually not explicitly considered when defining the split.

Methods

We present a publicly available data visualization tool that enables interactive exploration of dataset partitions for surgical phase and instrument recognition. The application visualizes the occurrence of phases, phase transitions, instruments, and instrument combinations across sets. In particular, it helps users assess dataset splits and identify sub-optimal ones.
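The kind of check such a visualization supports can be sketched in a few lines. The following is a minimal illustration, not the tool's implementation: it assumes each video is a list of per-frame phase labels, and the function names, split names, and phase labels are hypothetical toy values.

```python
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str]


def phase_transitions(frame_phases: List[str]) -> Set[Transition]:
    """Collect (from, to) pairs where the label changes between consecutive frames."""
    return {(a, b) for a, b in zip(frame_phases, frame_phases[1:]) if a != b}


def missing_transitions(
    splits: Dict[str, List[List[str]]]
) -> Dict[str, Set[Transition]]:
    """For each split, return transitions seen elsewhere in the dataset but not in it."""
    per_split = {
        name: set().union(*(phase_transitions(video) for video in videos))
        for name, videos in splits.items()
    }
    all_transitions = set().union(*per_split.values())
    return {name: all_transitions - seen for name, seen in per_split.items()}


# Toy data (assumed): each inner list is one video's per-frame phase labels.
toy_splits = {
    "train": [
        ["prep", "prep", "dissect", "clip", "clip"],
        ["prep", "dissect", "prep", "dissect"],
    ],
    "val": [["prep", "dissect", "clip"]],
    "test": [["prep", "clip"]],
}

gaps = missing_transitions(toy_splits)
for split_name, missing in gaps.items():
    print(split_name, sorted(missing))
```

In this toy example the transition from "prep" directly to "clip" occurs only in the test set, so a model would never see it during training.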

Results

Using the proposed application, we analyzed the datasets Cholec80, CATARACTS, CaDIS, M2CAI-workflow, and M2CAI-tool. We uncovered phase transitions, individual instruments, and combinations of surgical instruments that were not represented in one of the sets. To address these issues, we used the tool to identify possible improvements to the splits. In a user study with ten participants, the participants successfully solved a selection of data exploration tasks.
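A missing instrument or instrument combination can be detected programmatically in a similar way. The sketch below is an illustration under stated assumptions rather than the authors' method: each frame is represented as the set of instruments visible in it, and the function name, split names, and instrument labels are hypothetical toy values.

```python
from typing import Dict, FrozenSet, List, Set

Frame = Set[str]  # instruments visible in a single frame (assumed representation)


def coverage_gaps(splits: Dict[str, List[Frame]]) -> Dict[str, Dict[str, set]]:
    """Report instruments and instrument combinations absent from each split."""
    combos: Dict[str, Set[FrozenSet[str]]] = {
        name: {frozenset(frame) for frame in frames}
        for name, frames in splits.items()
    }
    all_combos = set().union(*combos.values())
    all_tools = set().union(*all_combos)

    report = {}
    for name, seen in combos.items():
        tools_seen = set().union(*seen)
        report[name] = {
            "missing_instruments": all_tools - tools_seen,
            "missing_combinations": all_combos - seen,
        }
    return report


# Toy data (assumed): per-frame instrument sets for two splits.
toy_frames = {
    "train": [
        {"grasper"},
        {"grasper", "hook"},
        {"clipper"},
        {"grasper", "clipper"},
    ],
    "test": [{"grasper"}, {"grasper", "hook"}],
}

report = coverage_gaps(toy_frames)
for split_name, gaps_for_split in report.items():
    print(split_name, gaps_for_split)
```

Here the toy test split never contains the "clipper" label, alone or in combination, so a test score would say nothing about how well that instrument is recognized.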

Conclusion

When class distributions are highly imbalanced, special care should be taken in selecting an appropriate dataset split, because the split can greatly influence the assessment of machine learning approaches. Our interactive tool helps determine better splits and thereby improve current practice in the field. The live application is available at https://cardio-ai.github.io/endovis-ml/.

Updated: 2024-01-29