Introduction

Cardiac resynchronization therapy (CRT) is a core treatment for chronic heart failure (HF) with reduced left ventricular ejection fraction (LVEF) and a wide QRS complex [1, 2]. According to the 2021 European Society of Cardiology (ESC) guidelines on HF, CRT with a pacemaker (CRT-P) or defibrillator (CRT-D) should be considered for symptomatic patients with HF in sinus rhythm (SR) with LVEF ≤ 35% despite optimal medical therapy (OMT) and a QRS duration (QRSd) ≥ 130 ms due to left bundle branch block (LBBB), or QRSd ≥ 150 ms if non-LBBB QRS morphology is present [2]. Clinical outcomes including death or HF hospitalization, as well as improvements in the most important echocardiographic parameters, e.g., a 6-month reduction in left ventricular end-systolic volume (LVESV), have been proposed as outcomes relevant to CRT response [3, 4]. After CRT implantation, a reduction in morbidity and mortality is observed, together with improved cardiac function and quality of life [1, 2].

Despite relatively clear guidelines, the literature reports that about 30% of patients who meet the eligibility criteria do not respond to CRT [5,6,7,8,9,10,11], and identification of the phenotype of an “ideal” CRT responder remains a challenge [2, 12]. Thus, research on accurate prediction of CRT response continues. In addition to electrocardiographic and echocardiographic assessments, advanced computed tomography and magnetic resonance imaging studies are used to predict the desired CRT outcomes [13]. Moreover, researchers combine several types of data to find new factors predicting positive CRT outcomes [14].

A further advance in clinical outcome prediction may result from the use of state-of-the-art statistical modeling, including artificial intelligence (AI) models [15, 16]. AI has been found to be as good as, and sometimes even better than, health-care professionals at classifying diseases from medical imaging [17]. Moreover, AI can blend, analyze, and interpret a very sophisticated and broad range of data types that are too complex for human-based analyses [18,19,20,21,22,23].

Therefore, training complex AI algorithms seems to be a valuable approach to the task of accurately predicting CRT response. In comparison with traditional statistics (descriptive statistics, statistical hypothesis testing with p values), which describe trends and statistical significance based on the results for a group of patients, supervised AI models (S-AI models) can predict CRT response for a single patient, providing a good basis for a personalized assessment of eligibility for CRT [24]. On the other hand, unsupervised AI models (U-AI models) can identify clusters of patients with homogeneous clinical characteristics, including similar CRT outcomes, which can reveal the phenotype of a CRT responder [25]. Recently, several studies have been published that utilize U-AI models for phenotyping of CRT responders and S-AI models for predicting the outcome of CRT [4, 15, 16, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]. Since this research area is developing rapidly and AI has proven effective in various clinical settings [17], it is of value to summarize the body of evidence on CRT response prediction with the use of AI.

Aim

The aim of this review is to summarize the literature on the accuracy and clinical applicability of AI models as a valuable alternative to the current guidelines for predicting CRT response and phenotyping patients eligible for CRT implantation.

For U-AI models, the ability to accurately cluster CRT (non-)responders is evaluated. For S-AI models, the accuracy of selecting patients who will benefit from CRT is assessed. Prospective primary outcome events are used as reference standards.

Methods

Evidence acquisition

Literature search

This systematic review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [49]. Scopus, PubMed, Cochrane Library, and Embase databases were searched for relevant articles (Fig. 1).

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flowchart. AI—artificial intelligence; PICO—patients, intervention, comparison, outcome

Two separate database searches by WN and KN were conducted to identify studies eligible for the systematic review. The last search was performed on 31st May 2023. The basic search query was as follows: “(cardiac resynchronisation therapy OR dyssynchrony) AND (ecg OR electrocardiography OR electrocardiogram OR echocardiography OR cardiac ultrasound OR echocardiogram OR computed tomography OR ct OR magnetic resonance imaging OR mri OR Single-photon emission computed tomography OR spect) AND (machine learning OR ai OR artificial intelligence OR machine intelligence OR k-means OR random forest OR gradient boost OR support vector machines OR decision tree OR lstm OR long short term network OR encoder OR decoder OR tensorflow OR pytorch OR keras OR classification algorithm OR supervised learning OR unsupervised learning OR clustering OR deep learning OR deep neural network OR cnn OR convolutional neural network OR computer vision OR rnn OR recurrent neural network)”.

The following keywords were chosen as they describe the studied intervention (CRT, group 1), the diagnostic approaches used to qualify the patients for the intervention (electrocardiography and echocardiography, group 2), and the use of AI in the study (group 3). The keywords in the groups were connected with the operator “OR” and the groups of keywords were separated with the operator “AND” to find articles eligible for the review.
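As an illustration, the following sketch shows how such a query can be assembled programmatically from the three keyword groups (the keyword lists are abbreviated here; the full, database-specific strategies are given in Supplementary Material S1):

```python
# Minimal sketch of assembling the Boolean query from the three keyword groups.
# Keyword lists are abbreviated; see Supplementary Material S1 for the full query.

intervention = ["cardiac resynchronisation therapy", "dyssynchrony"]
diagnostics = ["ecg", "electrocardiography", "echocardiography",
               "computed tomography", "magnetic resonance imaging", "spect"]
ai_terms = ["machine learning", "artificial intelligence", "deep learning",
            "k-means", "random forest", "convolutional neural network"]

def or_group(keywords):
    """Join the keywords within one group with OR and wrap in parentheses."""
    return "(" + " OR ".join(keywords) + ")"

# Groups are joined with AND, as described above
query = " AND ".join(or_group(g) for g in [intervention, diagnostics, ai_terms])
print(query)
```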

Full search strategy for each database is available in Supplementary Material S1.

The preliminary search returned a total of 675 articles. After duplicates were removed and titles and abstracts were screened, 93 studies were selected for full-text analysis. Articles were excluded for the following reasons: the article did not meet the PICO criteria (46), the study was not described in a full-text original article (19), or the cohort included < 100 patients (6). A total of 22 studies were thus included in the review.

Selection of AI models

In each study, a maximum of one binary clinical outcome and one binary echocardiographic endpoint of CRT response were identified. Then, for each binary outcome, a maximum of one supervised and one unsupervised algorithm were identified. Only the best “AI model-primary outcome” pairs were incorporated in the review.

An S-AI model was represented by any machine learning algorithm used for supervised classification of patients who will and will not benefit from CRT implantation. A U-AI model was defined as any clustering algorithm used for unsupervised phenotyping of CRT responders. Moreover, in accordance with the definition of “artificial intelligence,” the algorithm must have been able to analyze and draw inferences from patterns in data without explicit instructions on how to do so [50]. If no AI model was described in the article, the study was not included in this review.

For S-AI models, the best AI model for the prediction of a given primary outcome was the AI model with the best overall accuracy score or the AI model chosen and recommended by the authors of the study. For U-AI models, the best AI model was the AI model with the best discrimination power to distinguish between the identified clusters of patients or the AI model chosen and recommended by the authors of the study.

A deep learning algorithm was defined as a prediction model that consists of multiple stacked layers and uses data to capture hierarchical levels of abstraction [51].

Data extraction

Only the best AI model-primary outcome analyses were taken into consideration.

The following information was extracted for each analyzed AI model: 1st author, date of publication, objective, study design, number of patients, mean age of patients, percentage of males, percentage of patients with LBBB, percentage of patients with ischemic cardiomyopathy (ICM), inclusion and exclusion criteria, definition of the primary outcome, rate of primary outcome events, type of the best performing algorithm, pre- and post-implantation data used to train the AI model (demographic, clinical, laboratory, medications, echocardiography, electrocardiography, computed tomography, magnetic resonance imaging, ergospirometry, text, genetic, post-lead position, post-intracardiac electrocardiography, post-electrocardiography), number of input features, other algorithms tested, and the method used to handle missing values.

Additionally, for S-AI models, measures of the performance of the best algorithm (accuracy, specificity, sensitivity, etc.) and the most predictive variables were extracted. Furthermore, the online availability of each AI model was checked. For U-AI models, the additional information included the number of clusters identified by the AI model, the results of the phenotyping, and the factors associated with CRT response.

If specific information was not obtainable from the original manuscript, it was recorded as Not Reported (NR). The overall efficacy of prediction was reported in some studies as the area under the receiver operating characteristic curve (AUC), in others as accuracy, and in some as both metrics. Whichever metrics were available were extracted, and each was used in an independent statistical analysis.

PICO question

A PICO-styled research question was formulated to identify studies eligible for the analysis. For studies describing AI models based on supervised algorithms, the PICO was:

  1. Patients: patients who received CRT-D or CRT-P

  2. Intervention (index test): prediction of response to CRT with the use of an AI model

  3. Comparison: response to CRT defined by the primary outcome (e.g., death, HF hospitalization, LVEF improvement)

  4. Outcome: accuracy/AUC of the CRT response prediction capabilities of the AI model

For studies describing AI models based on unsupervised algorithms (clustering and phenotyping of CRT patients), the following PICO was formed:

  1. Patients: patients who received CRT-D or CRT-P

  2. Intervention: the group of patients with the highest/lowest proportion of CRT responders

  3. Comparison: the proportion of CRT responders in the other group(s) revealed by the phenotyping

  4. Outcome: comparison of the proportion of CRT responders between the identified clusters of patients

Evidence synthesis

All studies meeting the eligibility criteria are summarized in Tables 1 and 2. Additional data are available in the Supplementary Tables S2 and S3. To analyze and compare findings across the studies, counts, percentages, median values, and range were used. Due to large differences in patient inclusion/exclusion criteria as well as various primary outcome definitions across the reviewed studies, a quantitative synthesis of the results (meta-analysis) was not possible.

Table 1 Unsupervised artificial intelligence models
Table 2 Supervised artificial intelligence models

Quality assessment

The risk of bias (ROB) assessment of the included studies was performed by the first author (WN), according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [52].

ROB was assessed in the following 4 domains: participants, predictors, outcome, and analysis. A total of 20 signaling questions were used to facilitate a structured assessment of the ROB, which was defined as occurring when shortcomings in study design, conduct, or analysis lead to systematically distorted estimates of a given model’s predictive performance. The first three domains were also used for the applicability assessment. Finally, an overall assessment of the ROB and applicability of the prediction model was completed. A “+” indicates low ROB/low concern regarding applicability; “−” indicates high ROB/high concern regarding applicability; and “?” indicates unclear ROB/unclear concern regarding applicability [52]. For each eligible study, a total of 9 analyses (+/−/? assessments) were conducted. In this review, if any domain of the ROB or applicability assessment was rated as high/unclear, the “overall ROB”/“overall applicability” domains were rated accordingly. Since S-AI models aim at the prediction of response to CRT, only these AI models were analyzed with the PROBAST. A detailed table is available in the Supplementary Material S1.

Results

General characteristics

This review included a total of 22 studies, which reported 29 separate AI model-primary outcome analyses with a total of 14,258 patients (Fig. 1). There were 20 S-AI models and 9 U-AI models (N = 11,743 and N = 2917 patients, respectively). These were further divided into echocardiography outcome-based and clinical outcome-based U-AI models as well as echocardiography outcome-based and clinical outcome-based S-AI models (Tables 1, 2, and 3, Supplementary Tables S2, S3, and S4). The majority of the included studies have been published within the last three years.

Table 3 General characteristics of the analyzed artificial intelligence models

Most of the AI models were based on data collected in retrospective cohort studies (N = 16, 55%), followed by prospective cohort studies (N = 8, 28%), randomized controlled trials (RCTs, N = 5, 17%), and case–control studies (N = 1, 3%). Each AI model was trained and validated on a median number of 328 patients (range 117–1668) with a median age of 67 years (range 60–72). The median rate of primary outcome events was 47% (range 15–78%). The median percentage of male patients across the analyzed AI models was 68% (range 50–87%).

The most frequent inclusion criteria were QRSd ≥ 120 ms, NYHA class, and reduced LVEF ≤ 35%. The most frequently reported primary endpoints included death, a decrease in LVESV, an LVEF improvement, and the occurrence of an HF event. Thirty-three percent of the AI models were based on a composite outcome.

The majority of AI models were trained with a non-deep learning algorithm. The median number of input data types the AI models were trained on was 5 (range 1–7), and the median number of input features was 20 (range 2–487). The most frequently used input features were pre-implantation features such as echocardiographic parameters, followed by clinical data, demographic characteristics, and electrocardiographic data. Most of the AI models did not use any post-implantation input data.

Unsupervised AI models

A total of 3 echocardiographic outcome-based unsupervised AI models (EU-AI models) and 6 clinical outcome-based unsupervised AI models (CU-AI models) were included in the review (771/2917 patients, respectively; Tables 1 and 3, Supplementary Tables S2 and S4). The median number of patients was 250 (range 193–328) for EU-AI models and 289 (range 193–1106) for CU-AI models. The median rate of primary outcome events was 68% (range 65–74%) and 22% (range 15–78%), respectively. The median number of clusters identified was 4 (range 2–5) for EU-AI models and 3 (range 2–5) for CU-AI models. The most commonly used input data for U-AI models were echocardiographic, clinical, laboratory, and demographic characteristics. Almost all U-AI models came from retrospective and prospective cohort studies. The median number of input features used to train an EU-AI model was 55 (range 28–70) versus 39 (range 3–70) for a CU-AI model. The median number of input data types, 5, was the same for EU- and CU-AI models.

Most commonly, the patients were qualified for CRT implantation based on a wide QRSd of ≥ 120 ms, NYHA class, and a reduced LVEF of ≤ 35%. The most frequently reported primary outcome measures were death, an HF event, and a decrease in LVESV. Forty-four percent of the AI models had a composite primary endpoint. The clustering of CRT patients was mostly based on k-means or k-medoids algorithms. Regardless of the primary outcome and the number of clusters identified by the U-AI models, statistically significant differences in the rate of primary outcome events across the identified clusters were found in each study.
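To illustrate the clustering workflow shared by most of the reviewed U-AI models, the following minimal sketch applies k-means to a hypothetical, standardized feature matrix; it is a generic example and does not reproduce any specific study's pipeline:

```python
# Minimal sketch of unsupervised CRT phenotyping with k-means
# (hypothetical data; not the code of any reviewed study).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical pre-implantation features (e.g., LVEF, QRSd, LVESV, age, NT-proBNP)
X = rng.normal(size=(300, 5))

# Standardize so that no single variable dominates the Euclidean distance
X_std = StandardScaler().fit_transform(X)

# Partition patients into 4 phenogroups (the median cluster count of EU-AI models)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_std)

# The rate of primary outcome events would then be compared across clusters,
# e.g., with a chi-squared test or per-cluster Kaplan-Meier analysis
for c in range(4):
    print(f"cluster {c}: n = {(labels == c).sum()}")
```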

Various clinical, echocardiographic, and electrocardiographic variables were associated with more favorable outcomes (Fig. 2). However, in one study, LVEF and QRSd were not associated with better outcomes.

Fig. 2

Input variables predictive of CRT response. AI—artificial intelligence; CRT—cardiac resynchronization therapy; ESC—European Society of Cardiology; ECG—electrocardiography; HF—heart failure; LBBB—left bundle branch block; LV—left ventricle; non-ICM—non-ischemic cardiomyopathy; NT-proBNP—N-terminal pro–brain-type natriuretic peptide; PCI—percutaneous coronary intervention; peak VO2—maximal oxygen consumption; QRSd—QRS duration; RV—right ventricle

Supervised AI models

A total of 11 echocardiographic outcome-based supervised AI models (ES-AI models) and 9 clinical outcome-based supervised AI models (CS-AI models) were included in our review (6335/6413 patients, respectively; Tables 2 and 3, Supplementary Tables S3 and S4). The median number of patients was 419 (range 130–1664) for ES-AI models and 741 (range 117–1510) for CS-AI models. The median rate of primary outcome events was 51% (range 42–69%) and 32% (range 15–73%), respectively. In all of the described studies, two groups of patients (responders and non-responders) were identified.

The median accuracy of the ES-AI models and CS-AI models was 78% (range 63–85%) and 75% (range 65–85%), respectively, and their median AUCs were 0.76 (range 0.69–0.81) and 0.75 (range 0.69–0.86), respectively. Schmitz et al. created the best ES-AI model (85% accuracy, PART algorithm). In the group of CS-AI models, Bivona et al. proposed the best algorithm (AUC of 0.86, logistic regression). As many as 80% of all S-AI models achieved an accuracy of ≥ 70% or an AUC of ≥ 0.70, and 40% achieved an accuracy of ≥ 80% or an AUC of ≥ 0.80.

Echocardiographic, clinical, demographic, and electrocardiographic characteristics were the most commonly used input data for S-AI models. The majority of the S-AI models were based on data collected in retrospective cohort studies, followed by prospective cohort studies and RCTs. The median number of input features used to train an ES-AI model was 16 (range 2–487) versus 15 (range 3–45) for a CS-AI model. The median number of input data types, 5, was the same for ES- and CS-AI models.

Inclusion criteria varied across the analyzed studies, but most of them qualified patients based on a wide QRSd of ≥ 120 ms, NYHA class, and a reduced LVEF of ≤ 35%. A decrease in LVESV, an absolute LVEF increase, and death were the most frequently reported primary outcome measures, while 31% of the models had a composite outcome. The majority of algorithms used to create CRT prediction models were non-deep learning algorithms (85%), e.g., support vector machines or random forest algorithms.
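The following minimal sketch illustrates the typical supervised workflow reported in the reviewed studies (a random forest classifier trained on tabular pre-implantation features); the data and feature names are hypothetical, and the sketch does not reproduce any specific reviewed model:

```python
# Generic sketch of a supervised CRT-response classifier
# (hypothetical data; not a reproduction of any reviewed model).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
feature_names = [f"feature_{i}" for i in range(15)]   # e.g., LVEF, QRSd, age, ...
X = rng.normal(size=(400, 15))                        # pre-implantation features
y = rng.integers(0, 2, size=400)                      # binary CRT response label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)

print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Feature importances indicate the most predictive variables (cf. Fig. 2)
top = np.argsort(model.feature_importances_)[::-1][:5]
print("top features:", [feature_names[i] for i in top])
```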

Similar to the U-AI models, various clinical, echocardiographic, electrocardiographic, and laboratory characteristics were identified by the S-AI models as the most predictive variables for the classification of CRT responders and non-responders (Fig. 2).

Almost all algorithms were validated internally, most commonly with a tenfold or fivefold cross-validation protocol. Six studies (30%) were additionally evaluated on an internal hold-out test dataset.
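A minimal sketch of such a tenfold internal cross-validation, again on hypothetical data, could look as follows:

```python
# Sketch of the tenfold internal cross-validation used by most studies
# (hypothetical data; an additional hold-out test set, as used by 30% of
# the studies, would be split off before running the cross-validation).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X, y = rng.normal(size=(300, 10)), rng.integers(0, 2, size=300)

auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      cv=10, scoring="roc_auc")
print(f"10-fold CV AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```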

A total of 11 S-AI models are available online. The most accurate publicly available ES-AI model and CS-AI model, both trained on cohorts of > 100 patients, were reported by Schmitz et al. (85% accuracy) and Bivona et al. (AUC of 0.86), respectively.

A total of 11 S-AI models (61%) had a low overall ROB, whereas 13 S-AI models (72%) raised low overall concern regarding applicability (Fig. 3). Unclear ROB in the “ROB participants” and “applicability participants” domains was assigned due to an unclear description of inclusion/exclusion criteria. Three S-AI models were rated as “unclear ROB” in the “ROB analysis” domain due to the absence of an explanation of how missing data were handled.

Fig. 3

Risk of bias assessment with the use of the PROBAST. AI—artificial intelligence; ROB—risk of bias

Discussion

General characteristics

To the authors’ knowledge, this is the first comprehensive systematic review on the use of AI algorithms in CRT response prediction and in phenotyping of patients qualified for CRT. Summarizing a total of 22 studies involving 14,258 individuals and describing 20 S-AI models as well as 9 U-AI models, this systematic review can provide a solid foundation for future research on the use of AI in CRT (Fig. 4).

Fig. 4

Summary of key characteristics of analyzed artificial intelligence models. AI—artificial intelligence; AUC—area under the receiver operating characteristic curve; CRT—cardiac resynchronization therapy; QRSd—QRS duration; NYHA—New York Heart Association; HF—heart failure; LVEF—left ventricle ejection fraction; LVESV—left ventricle end-systolic volume

Most of the AI models described in this review included patients based on QRSd ≥ 120 ms, NYHA class, reduced LVEF ≤ 35%, and receipt of OMT prior to CRT implantation, which is broadly in line with the most recently published guidelines [1, 2]. The percentages of AI models based on retrospective and prospective studies were 55% and 45%, respectively. The populations included in this review were moderately homogeneous, with a median age of 67 years (range 60–72), a median proportion of male patients of 68% (range 50–87%), and a wide variety of cohort sizes on which the AI models were trained (median 328 patients, range 117–1668). In addition, the use of a wide variety of echocardiographic and clinical outcomes based on different follow-up intervals resulted in a heterogeneous rate of primary outcome events (median 47%, range 15–78%).

Unsupervised AI models

Three EU-AI models and six CU-AI models met the eligibility criteria for this review. The patient inclusion criteria were homogeneous (QRSd ≥ 120 ms, NYHA class, reduced LVEF ≤ 35%, OMT), as most of the studies were based on the official guidelines [53]. Interestingly, even though the reviewed U-AI models identified varying numbers of clusters (median 4, range 2–5), were based on various numbers of input features (median 50, range 3–70), were trained on different numbers of patients (median 250, range 193–1106), and faced divergent rates of primary outcome events, especially among CU-AI models (median 22%, range 15–78%), each of them was able to separate clusters of patients with a significantly better prognosis than the remaining groups. Therefore, it seems that the currently available U-AI models can accurately extract phenotypes of patients with a worse response to CRT, independently of the input data.

Supervised AI models

Eleven ES-AI models and nine CS-AI models were included in the review. Only about 80% of the analyzed studies included a direct description of the key guideline-recommended CRT inclusion criteria [1, 2, 53] (wide QRSd, NYHA class, reduced LVEF), increasing the potential ROB and raising applicability concerns. Similar to the U-AI models, even though the S-AI models were based on various numbers of input data types (median 5, range 1–7), different numbers of input features (median 16, range 2–487) and cohort sizes (median 580, range 130–1664), and a widely discrepant rate of primary outcome events (median 48%, range 15–73%), as many as 80% of them achieved an accuracy of ≥ 70% or an AUC of ≥ 0.70, and 40% achieved an accuracy of ≥ 80% or an AUC of ≥ 0.80 in the prediction of response to CRT. Schmitz et al. [39] created the best ES-AI model (85% accuracy), while Bivona et al. [26] proposed the best CS-AI model (AUC of 0.86). Both algorithms are publicly available.

In two studies, a large number of input features (> 100) was used. In one, such a set was obtained by combining demographic, clinical, laboratory, medication, echocardiographic, and electrocardiographic data (487 features in total) [4]. Gallard et al., on the other hand, derived additional input data (311 features in total) with the use of advanced mathematical computations; however, only the 14 most important characteristics were ultimately selected to train the final model [16].

Clinical adoption of supervised AI models

To be clinically applicable, an S-AI model should require simple, routinely collected input data and have high performance. Moreover, the algorithm should be trained on a large, prospectively observed cohort and predict a valuable primary endpoint.

Schmitz et al. [39] and Bivona et al. [26] proposed the AI models with the best performance (85% accuracy and an AUC of 0.86, respectively). Increasing the CRT prediction accuracy from the recommendation-based 70% to the AI-based 85% would halve the number of potential non-responders (from about 30 to 15 per 100 implantations) and lead to significantly higher clinical and economic effectiveness of CRT. However, these AI models are based on highly sophisticated mathematical computations and genetic analyses and require pre- and post-implantation data that are not routinely measured (e.g., peak VO2 before and 6 months after CRT implantation).

Relatively high-performing AI models trained on routinely collected data (demographic, clinical, echocardiographic, and electrocardiographic parameters) were proposed by Galli et al. [30] (AUCs of 0.84 and 0.81), Tokodi et al. [41] (AUC of 0.80), and Howell et al. [31] (AUC of 0.76), but only the last two are publicly available. Moreover, the presented S-AI models performed significantly better in CRT response classification than the official guidelines or many previously created CRT response scores that were not based on AI algorithms [31, 35, 41, 43].

Cai et al. demonstrated that increasing the number of input data types improves performance [4]. However, the relationship is neither linear nor easy to predict. For example, when guideline-based parameters are enriched with medication, echocardiographic, and demographic data, the AUC increases from 0.56 to 0.69, but the further addition of laboratory parameters boosts the AUC to just 0.70 (+ 0.01). In contrast, when clinical and experimental ECG parameters are added, the AUC increases from 0.70 to 0.73 and 0.76, respectively.
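An incremental analysis of this kind can be sketched as follows; the feature groups and data below are hypothetical placeholders, and the AUC values reported by Cai et al. are not reproduced:

```python
# Sketch of an incremental input-data-type analysis in the spirit of Cai et al. [4]
# (hypothetical feature groups and random data; their reported AUCs will not appear).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
groups = {                                 # hypothetical groups, added cumulatively
    "guideline": rng.normal(size=(n, 3)),
    "+ medication/echo/demographic": rng.normal(size=(n, 10)),
    "+ laboratory": rng.normal(size=(n, 5)),
    "+ clinical": rng.normal(size=(n, 5)),
    "+ experimental ECG": rng.normal(size=(n, 8)),
}
y = rng.integers(0, 2, size=n)             # binary CRT response label

X = None
for name, block in groups.items():
    # Append the next feature group and re-estimate the cross-validated AUC
    X = block if X is None else np.hstack([X, block])
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"{name:32s} cumulative AUC = {auc:.2f}")
```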

Interestingly, Wouters et al. proposed a well-performing single-modality ECG-based model [43] (AUC 0.69, trained on 1306 patients). Performing better than the official guideline criteria while requiring just one modality of routinely collected data, it shows a promising perspective for the clinical implementation of models that are both time-effective in terms of data collection and accurate in CRT response prediction. The model is publicly available [43]. Moreover, when additional clinical data were added to the raw ECG model, the authors noted a relative increase in performance of only 4.3% (AUC from 0.69 to 0.72). This model therefore accepts a small decrease in performance in exchange for a large boost in potential clinical applicability, proving the concept of “simple input, high-performance AI”.

The role of ECG in AI-based predictions

According to the most recent guidelines, a wide QRSd is the primary characteristic associated with higher response rates after CRT implantation [1]. Pre- and post-implantation electrocardiographic data were used in 59% (n = 17) and 6% (n = 2) of AI models, respectively. So far, the most clinically applicable supervised AI model for CRT response prediction is the one proposed by Wouters et al. (AUC 0.69, publicly available) [43]. It processes the raw 12-lead ECG signal with the use of deep learning. Interestingly, the raw ECG signal can also be used in unsupervised phenotyping and can accurately separate groups of patients with significantly different survival probabilities [25]. Thus, AI is able to extract and process information from raw signals without manual preprocessing.
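For illustration, a minimal deep learning model for raw 12-lead ECG input could be structured as in the sketch below; the architecture, layer sizes, and signal dimensions are illustrative assumptions, not the network of Wouters et al.:

```python
# Minimal sketch of a deep learning model for raw 12-lead ECG input
# (illustrative architecture; not the model of Wouters et al. [43]).
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    def __init__(self, n_leads=12, n_samples=5000):  # e.g., 10 s at 500 Hz
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=15, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                  # global pooling over time
        )
        self.classifier = nn.Linear(64, 1)            # logit for CRT response

    def forward(self, x):                             # x: (batch, leads, samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

model = ECGNet()
logits = model(torch.randn(8, 12, 5000))              # batch of 8 hypothetical ECGs
probs = torch.sigmoid(logits)                         # predicted response probability
print(probs.shape)                                    # torch.Size([8, 1])
```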

Nevertheless, in the vast majority of cases, preprocessed data of proven clinical relevance (presence of LBBB, QRSd, QTc duration, etc.) were used as input. Such data were used by the most accurate algorithms and proved to be among the most relevant features in CRT response prediction (Tokodi et al., AUC 0.80 [41]; Cai et al., AUC 0.76 [4]).

Challenging subgroups of CRT recipients

It is well known that one of the most challenging phenotypes among commonly qualified patients is a male with non-LBBB morphology, ICM, and QRSd < 150 ms [1]. To date, no study describes a supervised AI model (CRT response prediction) for these “challenging” subgroups. However, supervised algorithms revealed the previously mentioned characteristics to be negatively associated with CRT response [30, 41, 44]. Moreover, unsupervised algorithms demonstrated that groups of patients matching this phenotype have significantly worse survival and/or CRT response rates [15, 30, 38]. A detailed description of features predictive of CRT outcome is provided in the next section.

Features predictive for CRT outcome

According to the 2021 ESC guidelines, symptomatic patients with HF in SR, a wide QRSd, and reduced LVEF ≤ 35% despite OMT are eligible for CRT [1, 2]. Moreover, several additional factors, including more extensive reverse remodeling of the myocardium, non-ischemic etiology of HF, absence of myocardial scar tissue, female sex, longer QRSd, presence of septal flash and/or apical rocking on echocardiography, as well as lead placement outside of scarred myocardium, are predictive of a favorable CRT response [1, 2].

Based on the identified clusters of CRT patients, the presence of LBBB, a longer QRSd, non-ischemic cardiomyopathy, smaller dilatation, and better LV and RV function were most commonly associated with more favorable outcomes (Fig. 2). In addition, many clinical, demographic, electrocardiographic, and echocardiographic characteristics identified by the supervised AI models were predictive of CRT response. This emphasizes the need for a holistic approach to CRT responder selection, which should include an evaluation of a wide range of cardiovascular disease risk factors. Moreover, the fact that the AI models identified the same predictors of (non-)response to CRT as those endorsed by the most recent guidelines supports the robustness and credibility of the use of AI in this field (Fig. 2).

Future directions

Since many AI algorithms can predict CRT response with over 80% accuracy, while guideline-based CRT predictions report about 70% overall accuracy, it would be beneficial to conduct an RCT to prove the potential superiority of AI algorithms over the official guidelines. Without robust, prospective validation in a real-world clinical setting, an algorithm’s safety and readiness for implementation in patient care remain unknown [19]. A double-blind two-arm RCT is recommended (control group: patients selected according to the recent guidelines on CRT; experimental group: patients selected according to the classification outcome of the AI model; primary endpoint: (non-)response to CRT defined by death/HF hospitalization/no improvement in LVEF) [54]. Such an approach minimizes potential bias and rigorously examines the cause-effect relationship between intervention and outcome [54].

There is also some indication that AI models can achieve over 90% accuracy [27, 36], but these studies have so far been based on small groups of patients, and the reliability of the proposed S-AI models must be validated on larger cohorts. In addition, only three AI models used deep learning for CRT response prediction, while all the other AI models were based on non-deep learning algorithms. As deep learning has changed the landscape of many challenges in computer vision, e.g., image classification [55], it is possible that such methods could boost performance in the field of CRT response prediction, too. Moreover, many of the best-performing algorithms are based on highly sophisticated imaging techniques that are not routinely performed and that generate large amounts of data not collected for every patient [26, 39].

In general, electrocardiography is considered weaker in terms of clinical relevance, which influenced the class of this recommendation in the latest guidelines [1]. Interestingly, although the raw electrocardiographic signal has rarely been analyzed using machine learning, it shows considerable promise in CRT response prediction [25].

Research on highly accurate (≥ 90%) AI models trained on large amounts of input data is highly desirable; it should lead to more robust models with a higher chance of generalizing to other cohorts. On the other hand, research on models requiring a small amount of routinely collected data (demographic, clinical, electrocardiographic, and echocardiographic) is also needed. Such models would probably be less accurate than more advanced models based on, for example, genetic information [39], but they would be useful for preliminary screening of patients likely eligible for CRT and for qualification for the additional diagnostic procedures required by more sophisticated AI models to make the final decision on CRT implantation. Moreover, just 11 out of 20 analyzed S-AI models were shared publicly and free of charge. Thus, the accessibility of AI models should be improved, too.

According to the current state of knowledge, the importance of many characteristics (for example, post-implantation lead location or the type of input data used) that could theoretically improve the accuracy of CRT response prediction remains unclear and requires future investigation. Apart from the AI algorithm’s training and validation, its clinical applicability and performance depend on many aspects, such as inclusion/exclusion criteria and cohort size. Thus, this remains a complex and as yet unsolved matter.

Finally, due to the large variety of the reviewed studies (various inclusion/exclusion criteria, outcome measures, cohort sizes, types of AI models used, etc.) as well as inconsistency in reporting, it is not possible to provide robust recommendations for future AI research in CRT response prediction. The reviewed studies nevertheless lay the cornerstone for the high-performance, data-driven medicine of the future. We look forward to the synergy of machine-aided data analysis and human-driven interpretation of the outcomes.

Limitations

It is worth mentioning that information about the guideline-recommended inclusion criteria [1, 2] (wide QRSd, NYHA class II–IV, reduced LVEF, OMT) was not available for about 20 to 30% of the analyzed AI models. For example, a few authors reported that patients were qualified for AI model development based solely on “CRT implantation” [4, 24, 40, 41] or “CRT implantation and QRSd ≥ 120 ms” [35], which are very broad criteria, encompassing not only patients with dyssynchrony but also other conditions such as sinus node dysfunction or atrioventricular blocks [1]. This issue reduced the quality of the evidence synthesis, increased the potential ROB, raised applicability concerns, and should be avoided in future manuscripts. In addition, some studies were based on small groups of patients [27, 33, 36, 37]. Thus, although their reported accuracy may be high, such models tend to overfit the data owing to small sample sizes, which reduces their generalization capability and potential applicability in a clinical setting [19]. Almost all algorithms were evaluated using internal validation methods, most commonly a tenfold or fivefold cross-validation protocol. Only four studies used an additional internal hold-out test dataset [4, 31, 32, 40]. Furthermore, only one AI algorithm was validated externally [42], which raises concerns about the validity of AI in real-world clinical environments [19].

Moreover, due to the large variety of the studied cohorts (mean age, share of male patients, number of included patients), as well as the aforementioned incompleteness of information regarding the investigated cohorts, it was not possible to perform a robust meta-analysis of the reviewed studies. Furthermore, the reporting of multiple outcomes with different measures and follow-up durations makes direct comparison of the studies problematic, creates opportunities for “cherry-picking” of primary endpoints, and reduces the applicability of the results in clinical practice [56].

In addition, sometimes only scarce data on prediction quality were available (AUC or accuracy without the corresponding sensitivity and specificity of the AI model). Thus, some studies were analyzed based on just one of the metrics, and direct comparisons of the results were not possible. Moreover, the accuracy and reliability of machine learning models are highly dependent on the quality and completeness of both the training and validation data used [57]. To directly compare the performance of different AI models, as well as to compare their performance with the accuracy of recommendation-based CRT response rates, one would need to train and evaluate them on exactly the same data.

Finally, many algorithms, especially state-of-the-art deep learning models, are “black boxes,” and the explainability of their prediction mechanisms remains limited. As the way the model determines its output is not transparent, it remains under debate whether such algorithms should be implemented in daily clinical practice [19].

Clinical implications

Unsupervised AI models were able to identify clusters of patients with significantly different rates of primary outcome events (death, heart failure events). In comparison to the guideline-based CRT response prediction accuracy of 70%, supervised AI models trained on cohorts of > 100 patients achieved up to 85% accuracy and an AUC of up to 0.86 in CRT response prediction for echocardiographic and clinical outcomes, respectively. Thus, AI models seem to be a valuable tool for phenotyping patients eligible for CRT implantation and predicting potential responders. In addition, AI algorithms demonstrated that a holistic approach to CRT responder selection must be adopted, including evaluation of a wide range of cardiovascular disease risk factors. However, all of these findings must be validated in RCTs. Moreover, forthcoming studies should be reported more thoroughly, to make more robust literature synthesis and meta-analysis possible.

Conclusion

AI models seem to be an accurate and clinically applicable tool for phenotyping patients eligible for CRT implantation and predicting potential responders (Fig. 4). A synergy of machine-aided data analysis and human-driven interpretation of the results seems possible. In the future, AI may help increase CRT response rates to over 80% and improve clinical decision-making and patient prognosis, including a reduction in mortality rates. However, these hypotheses must be validated in randomized controlled trials.