1 Introduction

1.1 Background

Since 1980, the number of diabetes mellitus cases, more commonly known simply as diabetes, has nearly doubled, making it the most common noncommunicable disease (NCD) in the world. Diabetes is a chronic condition that occurs when the pancreas cannot produce sufficient insulin or when the body cannot effectively use the insulin it produces [1]. The World Health Organisation (WHO) has found that both the prevalence of diabetes and the number of associated fatalities are rising globally [2]. The WHO also predicts that by 2030, diabetes will be the seventh leading cause of death if the number of diabetic individuals continues to rise [3]. Diabetes has no known cure. Environmental and genetic factors are presumed to play important roles in its development; a family history of diabetes, ethnicity, age, physical inactivity, poor diet, excess weight and smoking are among the associated risk factors. Type 1 and type 2 diabetes are the two subtypes of the disease [4]. Diabetes can affect both old and young people [5,6,7]. Fortunately, for the majority of diabetic patients, complications can be readily managed or prevented, but only if the illness is identified and treated early. Early prediction of diabetes risk can decrease the prevalence of diabetes and its complications, thus helping to conserve national medical resources. An individual's susceptibility to, or risk of, diabetes must therefore be estimated. Early detection of diabetes and other chronic conditions reduces medical expenses and the likelihood of developing more severe health problems. Conclusions must consequently be drawn immediately and accurately from measurable medical indicators, especially in emergency cases where patients cannot communicate or are unconscious. This research was conducted to help clinicians improve their decision making regarding medical treatments for high-risk patients.
Incidentally, almost all NCD cases are misdiagnosed, especially since patients manifest few symptoms in the early stages of the disease. This situation demonstrates the difficulty of ensuring early identification and diagnosis. More alarmingly, the failure to diagnose diabetes early is associated with the occurrence of kidney disease [2]. One advantage of treating patients when they first develop diabetes is the ability to avoid costly treatments as the disease progresses later in life. Meanwhile, the lack of medical professionals in underserved areas, such as remote and rural villages, worsens the problem. In such circumstances, the Internet of Medical Things (IoMT) and machine learning (ML) models can be combined to build prediction tools that guide decision making in a more timely and effective manner, thereby aiding healthcare workers in the early diagnosis and detection of diabetes [8, 9].

1.2 Research problem and objective

Predicting a person’s risk of and susceptibility to a chronic condition, such as diabetes, is a crucial undertaking. Early detection of chronic conditions reduces the need for expensive medical care and lowers the likelihood of developing more severe health issues. As patients have minimal symptoms in the early stages of the disease and the majority of NCD cases remain undetected, guaranteeing early identification and diagnosis is extremely difficult. One of the benefits of treating patients early in the course of their NCDs is that they can avoid costly therapies when the illness worsens later in life. However, the management of chronic illness is complicated by the dearth of medical professionals in underserved areas, such as distant and rural communities. In these situations, Internet of Medical Things (IoMT) and machine learning (ML) models can be used to offer healthcare practitioners the prediction tools needed to guide their decision making more effectively and in a timely manner, eventually assisting the early identification and diagnosis of NCDs [10, 11]. The aim of this study is to develop and present an e-diagnosis IoMT system for predicting diabetes. The primary contribution of the proposed design is to enable identification and diagnosis in the early stages of disease amongst patients living in underserved areas, such as distant rural communities, by using IoMT and ML models. In this manner, healthcare practitioners can utilise prediction tools that support more effective and timely decision making, thereby assisting in the early identification and diagnosis of NCDs. The proposed design offers the following advantages: (1) prediction of an individual’s risk of diabetes on the basis of several risk factors, (2) provision of an initial diagnosis to clinicians and (3) feedback to patients on their doctor’s recommendations for exercise, blood glucose testing and diet.
The general layout of the proposed e-diagnostic system for predicting diabetes in an IoMT setting is shown in Fig. 1. The system comprises several essential components that work together to enable real-time diabetes monitoring and prediction: IoMT devices, data pre-processing, feature extraction to obtain pertinent features from the pre-processed data, and decision support built on the prediction results of the ML models, which underpin clinical decision making and patient feedback. On the basis of the prediction results, the system also gives patients feedback, advising them on preventive measures, lifestyle modifications or the need for additional medical consultation. To determine which ML classifier attains superior results, this study evaluated common ML algorithms using various metrics, namely, precision, accuracy, F1 score, recall and area under the curve (AUC). A popular benchmark is the PIMA Indian Diabetes Dataset (PIDD), which contains data on PIMA Indian females aged 21 and older [12]. The PIDD was utilised as the dataset in this research.

Fig. 1

e-Diagnostic system in IoMT

2 Related work

IoMT is the application of the Internet of Things (IoT) in the medical field. By using networking technologies, IoMT can connect medical equipment and its applications with healthcare IT systems [13]. This innovative development has altered the medical field through novel remote healthcare systems, improving perception, social benefits and the reliable detection of illnesses. Recent advancements in IoT computing have also eased the burden of accomplishing clinical goals, such as updating patient data, identifying appropriate medical instruments, performing remedies and implementing medical orders [14]. The development of IoMT has led to massive changes in disease management, thereby improving disease treatment and diagnostic approaches; it has also helped reduce healthcare costs and minimise mistakes. Such transformation has dramatically affected healthcare quality for frontline healthcare professionals and patients. Furthermore, IoMT can be considered a driving force for medical professionals, researchers, patients and insurers, enabling many use cases, such as telemedical support, drug management, data insight generation, patient tracking and operational enhancement [15]. Emerging approaches in data science, which can clarify common problems, also serve other scientific areas, including medicine. Medical professionals and researchers have utilised various artificial intelligence (AI) and ML approaches, such as support vector machines (SVMs), artificial neural networks (ANNs), gradient-boosted decision trees (DTs) and naïve Bayes (NB) classifiers, to predict diabetes. A. Rajagopal et al. [16] used a customised hybrid model composed of an ANN and genetic algorithms to develop a framework for diabetes prediction. They devised an approach to more effectively detect visible relational patterns amongst variables and attained a prediction accuracy of 80%.

Hanaa Salem et al. [17] developed a classifier by using the fuzzy K-nearest neighbour (KNN) algorithm and modified the membership function on the basis of uncertainty theory. Grid search was applied to attain optimal values when tuning the fuzzy KNN (TFKNN) method based on uncertainty membership. Their algorithm was superior to the other trained and assessed classifiers, including KNN, NB, fuzzy KNN and DT; the accuracy of TFKNN reached 90.63%. Marmik Shrestha et al. [18] developed a scheme to improve the processing time of diabetes prediction. Their proposed system utilised deep learning with SVMs combined with a radial basis function and a long short-term memory layer, attaining a mean accuracy of 86.31%. Hafsa Binte Kibria et al. [19] proposed ‘explainable AI’, an interpretable diabetes detection method composed of six ML algorithms: ANN, random forest (RF), SVM, AdaBoost, XGBoost and logistic regression (LR). The algorithms were combined with an ensemble classifier for diabetes diagnosis. Each ML algorithm was analysed using Shapley additive explanations (SHAP), which were represented in various graph types to assist physicians in understanding the model predictions. The balanced accuracy of the proposed weighted ensemble model reached 90%. Amidst the recent technological development of IoMT, research on traditional methods has remained at the forefront. Yifei Su et al. [20] used ML to investigate age adaptation for diabetes risk prediction by focussing on linear regression, logistic regression, polynomial regression, neural networks (NNs), SVM, RF and XGBoost. By applying feature compensation and soft-decision threshold adjustment, they found that performance could be improved with the compensated features. The best accuracy of 78.8% was attained by logistic regression.

Muhammad Exell Febrian et al. [21] used ML to develop an AI model for diabetes prediction and death prevention. Supervised ML classifiers (NB versus KNN) were compared by primarily focussing on different health features in a dataset. On the basis of their experimental results and algorithm evaluation, NB outperformed KNN with an accuracy of 76.07%. Victor Chang et al. [10] achieved better accuracy with NB in contrast to the algorithms mentioned in other studies. A 70:30 split ratio was utilised, and features were selected via principal component analysis (PCA) and K-means clustering. The performances of the algorithms were first measured on all dataset features; then, datasets with three and five features were compared. The DT, RF and NB algorithms were used for classification. The NB model worked efficiently with more finely tuned selected features for binary classification, attaining a performance score of 86.15%. Subhash Chandra Gupta et al. [22], whose research aim was to comparatively analyse the performances of different ML classification algorithms, utilised four models (KNN, DT, RF and SVM) in their experiments. The hyperparameters were then tuned to improve the performances of the aforementioned models; the highest accuracy was 88.61%, achieved by the RF classifier. Khoula Al Said et al. [23] applied seven ML algorithms (KNN, SVM, NB, DT, RF, ANN and linear discriminant analysis) and compared the most commonly utilised PIMA datasets with their own dataset. The RF and DT models performed better on their dataset than the other algorithms, providing 98.38% accuracy on the Oman data, whereas the SVM model performed best on the PIMA dataset, achieving a performance score of 78.4%.

Several ML models have been used previously to predict diabetes, but further work is needed to improve accuracy and generalise the results to different populations. In contrast to the models in other studies, the Hyper AdaBoost model used in this work achieves exceptionally high accuracy. Dealing with imbalanced datasets is a challenge in medical data analysis, especially diabetes prediction, as imbalance can result in predictions that are skewed towards the majority class. Whilst some current approaches attempt to address this problem, the Synthetic Minority Over-sampling Technique (SMOTE) is used in this study to successfully balance the dataset. By demonstrating how ML models can be implemented in IoMT platforms for improved healthcare delivery, this study fills a gap in the seamless integration of advanced ML models with IoMT platforms for real-time diabetes prediction, through the development of an e-diagnostic system designed for IoMT environments. The strengths and weaknesses of each previous work in relation to our study are thoroughly evaluated in Table 1.

Table 1 Comparative analysis of previous works highlights the technical depth

3 Methodology

This research aimed to examine the PIMA dataset by using a robust ML model combined with IoMT. The PIMA dataset contains 768 rows (268 diabetic cases and 500 nondiabetic cases) and 9 columns and poses a binary classification problem. Figure 2 shows a diagram of the sequential pre-processing of the training data. Outliers were initially searched for in the original training data (Fig. 3). Presumably, outliers in the analysed features of blood pressure, BMI, glucose, DPF, insulin and age could be explained by other underlying factors. Given that the aim was to minimise the negative effect caused by outliers, the data needed to be normalised. Furthermore, as the dataset was not particularly large, the unnecessary removal of rows could be avoided. The PIMA dataset contains an imbalanced distribution of classes (Fig. 4), with significantly more data points for nondiabetic patients (majority class) than for diabetic patients (minority class). If skewed data are not corrected beforehand, the performance of the classifier model is affected: the majority of predictions are correct for the majority class, but minority-class features are discarded as data noise. Thus, the Synthetic Minority Over-sampling Technique (SMOTE) was used to overcome the imbalance between the negative and positive classes. Then, hyper feature selection was conducted by combining recursive feature elimination (RFE) and random forest (RF). Feature importance analysis was performed to identify the most relevant features required to solve the PIMA classification problem. Different metrics (accuracy, F1 score, precision, recall and AUC) were used to evaluate the performance of each classification model and further validate the classification findings. Finally, the data were divided into a training set comprising 70% of the overall dataset and a test set comprising the remaining 30%. The ML models and the data pre-processing were implemented in Python. Jupyter Notebooks, which offer an interactive interface for code execution, visualisation and documentation, were used for the simulation study.

Fig. 2

Flowchart of the proposed method

Fig. 3

Visualisation of outliers for all features in the PIMA dataset

Fig. 4

Class distribution of the PIMA dataset

3.1 Dataset description

The PIMA diabetes dataset is a compilation of information on women from the PIMA group near Phoenix, USA, who have undergone medical testing. This population has been a focus of study because type 2 diabetes is especially prevalent amongst its female patients. The PIMA dataset is open source and can be downloaded for free from Kaggle [24]. As shown in Fig. 4, the dataset is imbalanced and contains medical test results from 768 patients, 500 of whom are nondiabetic and 268 of whom are diabetic. In other words, the ‘majority class’ is nondiabetic (i.e. negative), whereas the ‘minority class’ is diabetic (i.e. positive). PIMA is a binary classification dataset: ‘1’ denotes patients with diabetes, whereas ‘0’ denotes patients without diabetes.

The columns in the PIMA dataset are as follows:

  1. Pregnant: Number of pregnancies

  2. BP: Blood pressure

  3. Skin: Skin fold thickness (mm)

  4. Glucose: Plasma glucose concentration

  5. Insulin: Serum insulin (mu U/mL)

  6. DPF: Diabetes pedigree function

  7. BMI: Body mass index

  8. Age: Age in years

  9. Label (0, 1): 0 is nondiabetic; 1 is diabetic
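As a small illustration, the columns above map directly onto a pandas DataFrame. The two rows below are made-up values; in practice the full CSV from Kaggle would be loaded with `pd.read_csv`, and the column labels here follow the list above rather than the exact Kaggle headers.

```python
import pandas as pd

# Two illustrative (made-up) rows with the nine columns listed above.
df = pd.DataFrame(
    [[6, 72, 35, 148, 0, 0.627, 33.6, 50, 1],
     [1, 66, 29, 85, 0, 0.351, 26.6, 31, 0]],
    columns=["Pregnant", "BP", "Skin", "Glucose", "Insulin",
             "DPF", "BMI", "Age", "Label"])

# On the full dataset, this check reveals the 500/268 class imbalance.
print(df["Label"].value_counts().to_dict())
```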

3.2 Oversampling and normalisation

Imbalanced data collection frequently constrains learning models [25]. The most typical problem with medical records is imbalanced data, which occurs when the number of patients is smaller than the number of nonpatients, thus posing a substantial barrier in ML calculations. In the PIMA diabetes data analysed in this study, the nonpatient records constitute a significant majority class, which biases classification accuracy towards the nonpatient data. This problem was solved by oversampling with SMOTE (Fig. 5). The SMOTE method [26] generates a new minority sample by interpolating between a randomly selected minority sample and a homogeneous neighbouring sample, thus increasing the discovery rate of the minority class. The differing distributions of the feature values in the PIMA dataset introduce noise into the classification performance [27]; the dataset should therefore be normalised to the homogeneous range [0, 1]. The following formula is used to compute normalisation:

$$f_{i}^{{{\text{new}}}} = \frac{{f_{i}^{{{\text{old}}}} - \min \left( F \right)}}{{\max \left( F \right) - \min \left( F \right)}}.$$
(1)
Fig. 5

PIMA class distribution with SMOTE

Here, fi is the current value of the feature, whilst min(F) and max(F) are the minimum and maximum values of feature F, respectively.
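The two pre-processing steps can be sketched as follows. `min_max_normalise` implements Eq. (1), and `smote_like_sample` shows only the core interpolation idea of SMOTE; real SMOTE (e.g. imblearn's implementation) interpolates towards one of the k nearest minority neighbours rather than an arbitrary minority point.

```python
import numpy as np

def min_max_normalise(values):
    """Eq. (1): map each feature value into the range [0, 1]."""
    f = np.asarray(values, dtype=float)
    return (f - f.min()) / (f.max() - f.min())

def smote_like_sample(x_minority, rng):
    """One synthetic minority sample: interpolate between two minority
    points (the core SMOTE idea, simplified; true SMOTE picks one of the
    k nearest neighbours as the second point)."""
    i, j = rng.choice(len(x_minority), size=2, replace=False)
    gap = rng.random()  # interpolation factor in [0, 1)
    return x_minority[i] + gap * (x_minority[j] - x_minority[i])

glucose = [85, 148, 183, 89, 137]
norm = min_max_normalise(glucose)
print(norm)  # smallest value maps to 0.0, largest to 1.0

minority = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
synthetic = smote_like_sample(minority, np.random.default_rng(0))
print(synthetic)
```

Because the synthetic point is a convex combination of two existing minority points, it always lies within the range of the original minority data, which is what keeps SMOTE samples plausible.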

3.3 Feature selection via wrapper methods

In this study, samples were created with and without characteristic data to determine the influence of feature selection. The purpose of feature selection is to identify the most significant characteristics in the PIMA dataset. Furthermore, feature selection contributes to more accurate prediction by removing or down-weighting less significant information, thus saving training time and improving learning performance [27]. The present study used RFE, which iteratively trains a model and uses the feature weighting as a selection criterion; the least important feature is removed in each iteration. The objective of RFE is to select features by recursively examining feature groupings as they decrease in size.

The RFE evaluation function was used to compare the expected value with the best previously stored optimum value, allowing the utility of each subset derived from the search to be determined (Fig. 6). The candidate key component was replaced whenever a higher value was observed [28]. Suitable termination conditions were considered during the search to avoid an infinite loop. In general, the search and evaluation functions used for the subset are among the variables that influence the choice of termination conditions. A search technique for the appropriate subset of features can hasten the identification of the best features during subset selection. An improved classification performance of the selected feature subset implies an enhanced scoring function, which may increase the algorithm’s performance. The termination condition may also be a discriminating value exceeding a specific threshold or the number of search iterations reaching a specified limit. On the basis of the evaluation function, the stopping criterion may be either finding the best solution or failing to achieve a higher evaluation value whilst increasing or decreasing the number of candidate feature subsets [29].
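A sketch of the RFE-with-RF combination, assuming scikit-learn; synthetic data replaces the PIMA features, and the five retained features mirror the five-feature subset reported later.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Recursively drop the feature ranked least important by the RF
# until five features remain.
selector = RFE(estimator=RandomForestClassifier(random_state=0),
               n_features_to_select=5)
selector.fit(X, y)
print("kept feature indices:",
      [i for i, keep in enumerate(selector.support_) if keep])
```

`selector.transform(X)` would then yield the reduced feature matrix used for training.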

Fig. 6

Feature selection process

3.4 Feature importance analysis

Feature significance shows how input attributes contribute to predicting a target variable. The feature significance operation assigns a rating to the input features, providing important insights into the prediction (classification) model. Ranking feature significance ultimately improves the efficacy and efficiency of the predictive modelling project [30]. In this study, the SHAP summary plot was utilised. Figure 7 shows the SHAP summary plot and its two primary benefits: feature ranking and the influence of each feature. Features are listed in descending order of importance along the y-axis, with their influence shown along the x-axis. Positive SHAP values imply a positive correlation. The importance of the features for any prediction or classification model can easily be determined by sorting them in descending order, with the most significant feature at the top. Figure 7 shows a bar plot of the top eight characteristics, with ‘glucose’ at the top rank. The other prominent characteristics are ‘age’, ‘BMI’ and ‘DPF’.
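Reproducing the SHAP summary plot of Fig. 7 requires the shap package; as a dependency-light sketch of the same idea (scoring each input feature and sorting the scores in descending order), impurity-based RF importances can stand in for mean SHAP values. The feature names and data below are illustrative, so the resulting ranking will not match Fig. 7.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["Glucose", "Age", "BMI", "DPF", "BP",
                 "Skin", "Insulin", "Pregnant"]
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

rf = RandomForestClassifier(random_state=1).fit(X, y)

# Sort features by importance, highest first, as in a SHAP bar plot.
ranking = sorted(zip(feature_names, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:>8}: {score:.3f}")
```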

Fig. 7

SHAP analysis of PIMA features

3.5 Machine learning

In IoMT, ML algorithms are used to create complicated models and extract medical information, consequently providing new insights to clinicians and specialists [30, 31]. In clinical practice, predictive ML models can highlight improved rules for patient care decision making. In this study, the following ML models were utilised to classify the PIMA dataset.

3.5.1 Random forest

RF, an extension of DT, is composed of several single DTs, each providing a prediction for a specific class. The category with the most votes across the forest determines the final prediction of the RF classifier. In Fig. 8, four of the six single DTs in the forest yield a prediction of 1, whereas the remaining two yield 0; the resulting RF prediction is therefore 1. The classifier’s strong performance can be attributed to the trees in the forest being largely uncorrelated with one another; in other words, the judgements made collectively are superior to the decisions made individually [32].
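The voting mechanism of Fig. 8 can be made concrete with a hand-rolled forest of six trees, each trained on a bootstrap resample. Note that scikit-learn's own RandomForestClassifier averages class probabilities instead of hard-voting, so this sketch follows the description in the text rather than the library internals.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=3)
rng = np.random.default_rng(3)

# Six trees, each fitted on a bootstrap resample of the training data.
trees = []
for _ in range(6):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(
        DecisionTreeClassifier(max_depth=3, random_state=0).fit(X[idx], y[idx]))

# Each tree casts one vote; the majority class is the forest's prediction.
sample = X[:1]
votes = [int(tree.predict(sample)[0]) for tree in trees]
majority = max(set(votes), key=votes.count)
print("votes:", votes, "-> forest prediction:", majority)
```

The bootstrap resampling is what decorrelates the trees: each one sees a slightly different view of the data, so their collective vote is more robust than any single tree.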

Fig. 8

Example of how an RF model makes predictions

3.5.2 Decision tree

DTs [33] are an essential and commonly used learning method for solving regression and classification issues. A DT is composed of decision nodes (i.e. to test the value of an attribute), edges (which link with the next node) and leaf nodes (i.e. to predict the conclusion), as shown in Fig. 9. Each feature of a dataset is considered a node in the DT, with a particular and unique node serving as the root node. Here, a single node was used to explain the DT process. Then, a tree was constructed to fulfil the research requirements and decision-making process.

Fig. 9

DT structure [33]

3.5.3 Extra trees

The extra trees (ET) algorithm [34] creates several DTs, each built from the entire training dataset rather than a bootstrap sample. The algorithm selects a split rule for each node on the basis of a partially random cut point and a randomly selected subset of K features. A random splitting operation divides the parent node into two child nodes, and this procedure is repeated for each child node until a leaf node is reached; nodes lacking child nodes are referred to as leaf nodes. The predictions from all the trees are pooled by majority vote to determine the outcome.

3.5.4 Light gradient boosting machine (LGBM)

LGBM is a technique of learning from multiple classifiers during the training process by adjusting sample weights; it linearly combines the classification results to improve the classification performance [35]. Friedman introduced the gradient boosting machine in 2001 with the aim of reducing the model’s loss function. In each training iteration, the loss of the current model is estimated, and a new tree is trained against the negative gradient of that loss. The new regression tree is then added to the ensemble, and the residual values are updated. The algorithm repeats this training process until the user-defined limit is reached.

3.5.5 AdaBoost

AdaBoost [34] combines multiple ‘weak classifiers’ into a single classifier called a ‘strong classifier’. The weak learners are commonly shallow trees known as ‘decision stumps’. The approach begins by assigning the same weight to each data item; incorrectly classified points are then given additional weight, so subsequent models place more importance on the points with greater weights. The models are trained iteratively until the error is minimised.
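A minimal AdaBoost sketch with scikit-learn; the default base learner is already a depth-1 tree, i.e. a decision stump, and the re-weighting of misclassified points happens inside `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=5)

# 50 boosting rounds; each round up-weights the points the previous
# stumps misclassified before fitting the next stump.
ada = AdaBoostClassifier(n_estimators=50, random_state=5).fit(X, y)
print("training accuracy:", round(ada.score(X, y), 3))
```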

3.5.6 Hyper AdaBoost

The performance of the aforementioned algorithms (the RF, DT, ET and LGBM classifiers) was evaluated in terms of recall, precision, accuracy and F1 score on the PIMA dataset. The homogeneous ‘hyper’ AdaBoost models were then created by pairing an AdaBoost classifier separately with each of the other classifiers as its base learner and measuring the results against the metrics mentioned above. The best-performing model would then be suggested for use in the early identification of diabetes.
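One such pairing, AdaBoost with an ET base learner (the AdaBoost-ET configuration evaluated later), can be sketched as below; scikit-learn is assumed, and since the keyword for the base learner changed from `base_estimator` to `estimator` in version 1.2, the sketch accepts either.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=7)
base = ExtraTreesClassifier(n_estimators=10, random_state=7)

# Boost a small extra-trees forest instead of the default decision stump.
try:
    hyper = AdaBoostClassifier(estimator=base, n_estimators=10,
                               random_state=7)
except TypeError:  # scikit-learn < 1.2 uses the base_estimator keyword
    hyper = AdaBoostClassifier(base_estimator=base, n_estimators=10,
                               random_state=7)
hyper.fit(X, y)
print("training accuracy:", round(hyper.score(X, y), 3))
```

Any base learner that supports sample weights can be boosted this way, which is what allows the four single classifiers to be fused with AdaBoost.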

4 Results

Despite the larger and more sophisticated diabetes datasets that researchers currently utilise, the PIDD continues to be a standard for diabetes classification studies. This section shows the results achieved on the PIMA diabetes dataset described in Sect. 3.1. In this study, a novel ML model was utilised to overcome the problems of the PIMA diabetes dataset. Firstly, during pre-processing, the data were verified to contain no missing or duplicate values. However, the dataset presented an imbalance issue, which could lead to improper classification and consequently influence the outcomes. Thus, the SMOTE oversampling approach was adopted to resolve the imbalanced dataset issue. RFE and RF were also used to minimise the number of features in the PIMA dataset. In this study, four single classifiers (RF, DT, ET and LGBM) and four hyper classifiers (i.e. single classifiers fused with the ensemble AdaBoost model) were used to diagnose diabetes. The effectiveness of the proposed method was examined by splitting the data 70:30 into train and test sets. All features were normalised before they were applied to the classifiers.

4.1 Hyperparameter tuning

The structure of a classifier can be configured through its hyperparameters [28]; that is, different hyperparameter values yield different model results. The process of assigning values to a classifier’s hyperparameters to determine the best result is called hyperparameter tuning. In this study, grid search was used to tune the hyperparameters, and the resulting values were applied to all classifiers. The optimal parameters are shown in Table 2.
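Grid search can be sketched with scikit-learn's `GridSearchCV`; the grid below is illustrative only, and the actual tuned ranges and selected values are those reported in Table 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=9)

# Illustrative grid: every combination is scored by 3-fold cross-validation.
grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=9),
                      param_grid=grid, cv=3, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_,
      "best CV accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the classifier refitted on all the data with the winning combination.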

Table 2 Optimal hyperparameters of each ML model

4.2 Evaluation metrics

This work adopted different measures to assess the efficacy of each built predictive model based on several supervised ML algorithms. For the evaluation metrics, a confusion matrix was utilised to accurately predict the evaluated outcomes. The classification metrics are shown in Table 3.

Table 3 Confusion matrix values

Accuracy, recall, precision and F1 score were calculated according to Eqs. (2)–(5), respectively [38], allowing the performance of the models to be assessed. Additionally, as discussed in Sect. 4.1, confusion matrices for the classifiers and the Pearson correlation coefficient were utilised.

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FP}} + {\text{TN}} + {\text{FN}}}}$$
(2)
$${\text{Recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(3)
$${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$
(4)
$$F1{\text{ - Score}} = 2* \frac{{{\text{Recall}}*{\text{Precision}}}}{{{\text{Recall}} + {\text{Precision}}}}$$
(5)

4.3 Models evaluation

Figure 10 shows the overall results on the dataset before pre-processing. Four evaluation metrics were used: accuracy, precision, recall and F1 score. Each classifier was trained on 70% of the data and tested on the remaining 30%. The ET model provided the best evaluation result for the original data, with an accuracy of 77%, utilising 8 features and 768 total health records (500 nondiabetic and 268 diabetic patient records).

Fig. 10

Original results of the PIMA dataset

After applying SMOTE to balance the dataset, the total number of patient records reached 1000. Moreover, hyper feature selection via RFE and RF was applied to determine the optimal number of features. After applying RFE and RF, two groups of features presented the highest influence on classification performance. The first group of features contained glucose, BP, BMI, DPF and age (Fig. 11). All models were trained and tested with these features. AdaBoost-ET was selected as the best model, reaching an accuracy of 85%.

Fig. 11

PIMA results based on five features

The second group of features included glucose, BMI and age (Fig. 12). All models were trained and tested with these features. The optimal evaluation performance was achieved by AdaBoost-RF, reaching an accuracy of 78%.

Fig. 12

PIMA results based on three features

Figures 10, 11 and 12 show all of the ML models trained and tested with the default parameters. ML algorithms depend on hyperparameter tuning of the classifier to achieve optimal performance in classification problems. Therefore, a grid search was performed to tune the parameters for the optimal execution of each classification algorithm (Table 2). With a 70:30 partition, the optimal performance after tuning the hyperparameters with all features was obtained by the AdaBoost-ET model, with an accuracy of 92% (Fig. 13). With an 80:20 partition, the AdaBoost-ET model likewise performed best, with an accuracy of 91% (Fig. 14).

Fig. 13

Evaluation results based on the hyperparameters with 70:30 partition

Fig. 14

Evaluation results based on the hyperparameters with 80:20 partition

The receiver operating characteristic (ROC) curve was used to evaluate and analyse the performance of the classifiers and identify the model that best improves class prediction. The ROC curve plots the TP rate against the FP rate (Fig. 15). An AUC score of 0.8 to 0.9 was considered good, and a score higher than 0.9 was considered excellent. The AUC of the AdaBoost-ET model reached 0.94, outperforming the other models. Nonetheless, the AUC results of the other models were good, all exceeding 0.8.
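A minimal sketch of the ROC/AUC computation with scikit-learn, using synthetic stand-in data and a default AdaBoost classifier rather than the tuned AdaBoost-ET model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=11)

model = AdaBoostClassifier(random_state=11).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

# roc_curve returns the FP and TP rates at every score threshold;
# auc integrates the curve into a single number.
fpr, tpr, _ = roc_curve(y_te, scores)
print("AUC:", round(auc(fpr, tpr), 3))
```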

Fig. 15

ROC analysis

5 Discussion and comparison

The proposed Hyper AdaBoost model addresses critical issues in diabetes prediction in an IoMT context, such as imbalanced datasets and constrained feature selection techniques. The model balanced the dataset with SMOTE and ensured that only the most relevant features were used for prediction by combining RFE and RF for hyper feature selection. This method lowers computing costs and improves the accuracy of the model, as shown in Figs. 13 and 14, making it appropriate for integration with IoMT platforms for diabetes prediction and real-time monitoring. An accuracy of 92% on the PIMA Indian Diabetes Dataset demonstrates how well the model performs, indicating its resilience and potential generalisability to other populations. To maximise the model's performance, the simulation parameters, including a 70:30 train–test split, the ideal hyperparameters found by grid search (Table 2) and the assessment metrics (accuracy, precision, recall, F1 score and AUC), were carefully selected. As a result, the Hyper AdaBoost model addresses the demand for sophisticated ML techniques in healthcare by offering a viable tool for the early diagnosis and management of diabetes in IoMT settings.

The best-performing methods were selected for the binary classification of the PIMA dataset. Nine other studies that used the same PIDD were identified for comparison. On this basis, eight models were evaluated: four traditional techniques (RF, DT, ET and LGBM) and four homogeneous ensemble techniques (AdaBoost-RF, AdaBoost-DT, AdaBoost-ET and AdaBoost-LGBM). The models were compared in terms of their performance with and without feature selection. The results obtained in the present work were then compared with those of the earlier studies (Table 4).

Table 4 Performance comparison of the results of this work and those of previous studies

All the studies listed in Table 4 represent recent works that used the PIMA dataset, as discussed in Sect. 2. A notable limitation of those studies is that they did not balance the dataset classes. By contrast, the current work addressed this problem by using SMOTE, as described in the previous sections. The accuracy obtained in this work (92%) was higher than that of previous studies, which can be attributed to tuning the hyperparameters of the AdaBoost-ET model with all features.

6 Conclusion

This study presented hyper-ML and traditional ML classification models that are appropriate for electronic diagnostic systems in the IoMT environment. The present work provided a highly explainable and accurate ensemble model by using SMOTE and pre-processing evaluation. In addition, the feature selection applied to the PIDD maintained an acceptable standard for diabetes detection. The proposed fine-tuned classifiers were comprehensively evaluated against recent works in the literature; the comparative results are given in Table 4. In particular, state-of-the-art methods were compared with the proposed grid-tuned hyperparameter model based on four single classifiers (RF, DT, ET and LGBM) and four hyper classifiers, in which each single classifier was fused with the AdaBoost ensemble for diabetes diagnosis. The comparison focused on the precision, recall, F1 score, accuracy and AUC metrics for the PIDD. Table 4 demonstrates the importance and impact of pre-processing efficiency on classification performance.

On the basis of the experimental results and algorithm evaluation, this study searched for outliers in the original training data. To minimise the negative effect of outliers, the data were first normalised. Furthermore, as the dataset was not particularly large, the unnecessary removal of rows was avoided. The PIDD contained an imbalanced class distribution, with significantly more data points for nondiabetic patients (majority class) than for diabetic patients (minority class); therefore, SMOTE was used to balance the negative and positive classes. To identify the most important and relevant features for the PIMA classification problem, a hyper feature selection method combining RFE and RF based on feature importance was used to save training time.
Finally, a comparative analysis of the classifiers was performed in three steps. (1) The ET model achieved the best evaluation result on the original data, before pre-processing, with an accuracy of 77% for eight features and a total of 768 health records. (2) AdaBoost-ET was the best model, with an accuracy of 85%, after SMOTE was applied to balance the dataset and five features (glucose, BP, BMI, DPF and age) were selected via hyper feature selection (RFE with RF). For the second group of three features (glucose, BMI and age), AdaBoost-RF obtained the best result, with an accuracy of 78%. (3) A grid search was conducted to tune the parameters and achieve optimal performance, which was obtained by tuning the hyperparameters without feature selection; AdaBoost-ET outperformed the other models with an accuracy of 92%.

Although the study offers several strengths, it also has some limitations, thus opening new opportunities for future work. One of the limitations is the use of a single dataset (i.e. PIDD), which may not be representative of other populations. Future studies may validate the model by using datasets from other populations to test the model’s generalizability. Another limitation is the focus on laboratory findings to diagnose type 2 diabetes. Other factors, such as lifestyle and genetic factors, also contribute to the development of the disease. Future work may integrate additional data sources, such as medical history, dietary intake and physical activity levels, to improve the accuracy of the model.