Abstract
Evaluations with accurate ground-truth labels (AGTLs) have been widely employed to assess predictive models for artificial intelligence applications. However, in some specific fields, such as medical histopathology whole slide image analysis, AGTLs are often difficult to define precisely or do not exist at all. To alleviate this situation, we propose the logical assessment formula (LAF) and reveal its principles for evaluations with inaccurate ground-truth labels (IAGTLs) via logical reasoning under uncertainty. From the revealed principles of LAF, we summarize its practicability: (1) on a more difficult task, LAF can be applied for evaluations with IAGTLs and can reasonably act like the usual strategies for evaluations with AGTLs; (2) on an easier task, LAF can be applied for evaluations with IAGTLs from the logical perspective, but cannot confidently act like the usual strategies for evaluations with AGTLs.
Acknowledgement
The author, Yongquan Yang, sincerely thanks his family, the Yang Family of Chengdu, China, for their financial support and encouragement during this research.
Author information
Contributions
Yongquan Yang is fully responsible for this paper.
Ethics declarations
Conflict of interest
The author declares no competing interests.
About this article
Cite this article
Yang, Y. Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels. Knowl Inf Syst 66, 2561–2573 (2024). https://doi.org/10.1007/s10115-023-02047-6
DOI: https://doi.org/10.1007/s10115-023-02047-6