Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels

Yang, Yongquan

doi:10.1007/s10115-023-02047-6

Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels

Regular Paper
Published: 06 January 2024

Volume 66, pages 2561–2573, (2024)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Yongquan Yang¹

87 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

Evaluations with accurate ground-truth labels (AGTLs) have been widely employed to assess predictive models for artificial intelligence applications. However, in some specific fields, such as medical histopathology whole slide image analysis, it is quite usual the situation that AGTLs are difficult to be precisely defined or even do not exist. To alleviate this situation, we propose logical assessment formula (LAF) and reveal its principles for evaluations with inaccurate ground-truth labels (IAGTLs) via logical reasoning under uncertainty. From the revealed principles of LAF, we summarize the practicability of LAF: (1) LAF can be applied for evaluations with IAGTLs on a more difficult task, able to act like usual strategies for evaluations with AGTLs reasonably; (2) LAF can be applied for evaluations with IAGTLs from the logical perspective on an easier task, unable to act like usual strategies for evaluations with AGTLs confidently.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Handling noisy labels via one-step abductive multi-target learning and its application to helicobacter pylori segmentation

Article 12 January 2024

Domain Aware Medical Image Classifier Interpretation by Counterfactual Impact Analysis

One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound

Article 06 January 2021

References

Chang HH, Zhuang AH, Valentino DJ, Chu WC (2009) Performance measure characterization for evaluating neuroimage segmentation algorithms. Neuroimage. https://doi.org/10.1016/j.neuroimage.2009.03.068
Article Google Scholar
Taha AA, Hanbury A (2015) Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging 15:29. https://doi.org/10.1186/s12880-015-0068-x
Article Google Scholar
Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manag Process 5:01–11. https://doi.org/10.5121/ijdkp.2015.5201
Article Google Scholar
Jung HJ, Lease M (2012) Evaluating classifiers without expert labels. https://doi.org/10.48550/arxiv.1212.0960
Deng W, Zheng L (2021) Are labels always necessary for classifier accuracy evaluation? In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). pp 15069–15078
Warfield S, Dengler J, Zaers J et al (1995) Automatic identification of gray matter structures from MRI to improve the segmentation of white matter lesions. J Image Guid Surg. https://doi.org/10.1002/(SICI)1522-712X(1995)1:6%3c326::AID-IGS4%3e3.0.CO;2-C
Article Google Scholar
Kikinis R, Shenton ME, Gerig G et al (1992) Routine quantitative analysis of brain and cerebrospinal fluid spaces with MR imaging. J Magn Reson Imaging. https://doi.org/10.1002/jmri.1880020603
Article Google Scholar
Alonzo TA, Pepe MS (1999) Using a combination of reference tests to assess the accuracy of a new diagnostic test. Stat Med. https://doi.org/10.1002/(SICI)1097-0258(19991130)18:22%3c2987::AID-SIM205%3e3.0.CO;2-B
Article Google Scholar
Beiden SV, Campbell G, Meier KL, Wagner RF (2000) The problem of ROC analysis without truth: the EM algorithm and the information matrix. In: Krupinski EA (ed) Medical Imaging 2000: Image Perception and Performance. pp 126–134
Korevaar DA, Toubiana J, Chalumeau M et al (2021) Evaluating tests for diagnosing COVID-19 in the absence of a reliable reference standard: pitfalls and potential solutions. J Clin Epidemiol. https://doi.org/10.1016/j.jclinepi.2021.07.021
Article Google Scholar
Warfield SK, Zou KH, Wells WM (2004) Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging. https://doi.org/10.1109/TMI.2004.828354
Article Google Scholar
Martin-Fernandez M, Bouix S, Ungar L, et al (2005) Two methods for validating brain tissue classifiers. In: Lecture notes in computer science (Including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp 515–522
Bouix S, Martin-Fernandez M, Ungar L et al (2007) On evaluating brain tissue classifiers without a ground truth. Neuroimage. https://doi.org/10.1016/j.neuroimage.2007.04.031
Article Google Scholar
Joyce RJ, Raff E, Nicholas C (2021) A framework for cluster and classifier evaluation in the absence of reference labels. In: Proceedings of the 14th ACM workshop on artificial intelligence and security. ACM, New York, NY, USA, pp 73–84
Yang Y, Yang Y, Yuan Y et al (2020) Detecting helicobacter pylori in whole slide images via weakly supervised multi-task learning. Multimed Tools Appl 79:26787–26815. https://doi.org/10.1007/s11042-020-09185-x
Article Google Scholar
Yang Y, Yang Y, Chen J, et al (2020) Handling noisy labels via one-step abductive multi-target learning and its application to helicobacter pylori segmentation
Zhou ZH (2019) Abductive learning: towards bridging machine learning and logical reasoning. Sci China Inf Sci. https://doi.org/10.1007/s11432-018-9801-4
Article MathSciNet Google Scholar
Pearl J (1990) Reasoning under uncertainty. Annu Rev Comput Sci 4:37–72. https://doi.org/10.1146/annurev.cs.04.060190.000345
Article MathSciNet Google Scholar
Krause P, Ambler S, Elvang-Goransson M, Fox J (1995) A logic of argumentation for reasoning under uncertainty. Comput Intell 11:113–131. https://doi.org/10.1111/j.1467-8640.1995.tb00025.x
Article MathSciNet Google Scholar
Parsons S (2001) Qualitative methods for reasoning under uncertainty. The MIT Press
Book Google Scholar
Dubois D, Prade H, Schockaert S (2017) Generalized possibilistic logic: Foundations and applications to qualitative reasoning about uncertainty. Artif Intell 252:139–174. https://doi.org/10.1016/j.artint.2017.08.001
Article MathSciNet Google Scholar
Ristic B, Gilliam C, Byrne M (2021) Performance assessment of a system for reasoning under uncertainty. Inf Fusion 71:11–16. https://doi.org/10.1016/j.inffus.2021.01.006
Article Google Scholar
Müller H, Holzinger A (2021) Kandinsky patterns. Artif Intell 300:103546. https://doi.org/10.1016/j.artint.2021.103546
Article MathSciNet Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Article Google Scholar
Warfield SK, Zou KH, Wells WM (2002) Validation of image segmentation and expert quality with an expectation-maximization algorithm. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics)
Beynon M, Curry B, Morgan P (2000) The dempster-shafer theory of evidence: an alternative approach to multicriteria decision modelling. Omega 28:37–50. https://doi.org/10.1016/S0305-0483(99)00033-X
Article Google Scholar

Download references

Acknowledgement

The author, Yongquan Yang, thanks very much his Yang Family, Chengdu, China for providing him with the financial supports and mental encouragements during this research.

Author information

Authors and Affiliations

Institution of Clinical Pathology, West China Hospital, Sichuan University, 2 Huafu Road, Chengdu, 610213, China
Yongquan Yang

Authors

Yongquan Yang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Yongquan Yang is fully responsible for this paper.

Corresponding author

Correspondence to Yongquan Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 61 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Yang, Y. Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels. Knowl Inf Syst 66, 2561–2573 (2024). https://doi.org/10.1007/s10115-023-02047-6

Download citation

Received: 28 April 2023
Revised: 23 August 2023
Accepted: 07 December 2023
Published: 06 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10115-023-02047-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels

Abstract

Access this article

Similar content being viewed by others

Handling noisy labels via one-step abductive multi-target learning and its application to helicobacter pylori segmentation

Domain Aware Medical Image Classifier Interpretation by Counterfactual Impact Analysis

One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound

References

Acknowledgement

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 61 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Logical assessment formula and its principles for evaluations with inaccurate ground-truth labels

Abstract

Access this article

Similar content being viewed by others

Handling noisy labels via one-step abductive multi-target learning and its application to helicobacter pylori segmentation

Domain Aware Medical Image Classifier Interpretation by Counterfactual Impact Analysis

One step further into the blackbox: a pilot study of how to build more confidence around an AI-based decision system of breast nodule assessment in 2D ultrasound

References

Acknowledgement

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 61 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation