Abstract

Background. Coronavirus disease (COVID-19) is an infectious illness that spread widely in a short period of time and grew into a pandemic. The shortage of radiologists, inadequate COVID-19 diagnostic procedures, and insufficient medical supplies have all contributed to the resulting loss of life. Deep learning (DL) can be used to detect and classify COVID-19 as an aid to image-based diagnosis. Materials and Methods. This paper proposes an improved deep convolutional neural network (IDConv-Net) to detect and classify COVID-19 using X-ray and computed tomography (CT) images. Before training, preprocessing methods such as filtering, data normalization, categorical variable encoding, and data augmentation were applied to increase the effectiveness of detection and classification. The deep CNN then extracts essential features, which allows the model to identify the patterns and relationships that are crucial to the classification task and thus supports more precise and useful diagnoses. The experiments were implemented in Python with Keras, using TensorFlow as the backend. Results. The proposed IDConv-Net was tested using chest X-ray and CT images collected from hospitals in Sao Paulo, Brazil, and from online databases. IDConv-Net achieved training and testing accuracies of 99.53% and 98.41%, respectively, on CT images and 97.49% and 96.99%, respectively, on X-ray images. The area under the curve (AUC) was 0.954 for X-ray and 0.996 for CT images, indicating excellent performance. Conclusion. The findings confirm that IDConv-Net outperforms existing COVID-19 detection and classification models, exceeding current state-of-the-art models by 2.25% for X-ray and 2.81% for CT images.
Additionally, IDConv-Net trains significantly faster than current transfer learning models.

1. Introduction

The worldwide outbreak of coronavirus disease (COVID-19) continues to harm people's lives and health [1]. COVID-19 is a highly infectious disease with limited and only partly effective treatment options [2]. It is caused by infection with the SARS-CoV-2 virus and is transmitted through respiratory droplets released when an infected person talks, coughs, or sneezes. The virus can also spread when a person touches a contaminated surface or object and then touches their mouth, nose, or eyes [3]. The large number of COVID-19 patients has overburdened the healthcare systems of many countries. Since December 2019, about 347.49 million people have been diagnosed with COVID-19 and 5.60 million have died, and the numbers of cases and deaths continue to rise each day. According to a report [4], as of 12 September 2022, a total of 613,958,298 COVID-19 cases had been reported worldwide, of whom 6,516,913 had died and 592,777,665 had recovered. For example, 1,075,668 of 97,095,092 patients had died in America, 684,914 of 34,574,765 in Brazil, 528,165 of 44,500,580 in India, and 29,334 of 2,014,887 in Bangladesh [4].

Late detection of COVID-19 allows the virus to attack the lungs and damage the tissues of the infected patient [5]. The lungs and respiratory system are particularly susceptible organs in which the COVID-19 virus can easily proliferate. The resulting damage causes the air sacs to fill with fluid [6, 7], so the patient has difficulty breathing and taking in oxygen. It is therefore necessary to determine the degree of lung injury rapidly and precisely in order to save patients and reduce fatalities [8]. Moreover, early COVID-19 detection can save the patient's life and help stop the spread of the disease. A parenteral COVID-19 vaccine capable of inducing a potent, long-lasting immune response involving neutralizing antibodies and T cells should offer a significant level of protection [9, 10]. Different vaccine platforms and strategies have advantages and disadvantages from an immunological perspective. Overall, COVID-19 vaccines have significantly changed the pandemic's trajectory and reduced mortality [9, 11].

One of the diagnostic methods used for detecting COVID-19 is real-time reverse transcription polymerase chain reaction (RT-PCR), the technique recommended by the WHO for identifying the presence of the virus that causes COVID-19 [12]. However, RT-PCR takes from a few hours to two days to produce test results. The technique is also complex, expensive, and manual, and it is not available everywhere; its cost and limited availability affect many developing and underdeveloped nations [13]. Further, RT-PCR testing requires laboratory kits, which many nations found difficult to produce or procure during the outbreak [14]. Several investigations have also noted the reduced sensitivity of the RT-PCR test: reported sensitivities range from 71% to 98%, which lowers the detection accuracy for COVID-19 cases [15].

Another approach is medical imaging, which plays a critical role in COVID-19 detection and management. In particular, chest X-rays and computed tomography (CT) scans have been used to detect and monitor COVID-19 patients. These images are helpful for several reasons: they visualize lung abnormalities, help confirm the diagnosis, allow the severity of the disease to be assessed, and make it possible to monitor disease progression [16]. Radiologists use these images to verify a COVID-19 diagnosis manually. However, because radiologists must examine a large number of COVID-19 patients, manual diagnosis is laborious, error-prone, and exhausting, and it requires competent radiologists [17].

Over the years, artificial intelligence (AI) has shown potential in the field of medical imaging. Deep learning (DL) is an effective tool for analyzing medical imaging data because it can automatically identify patterns and features in large datasets without requiring manual feature engineering [18]. DL also has the potential to improve the speed, accuracy, and accessibility of COVID-19 diagnosis, which can help control the spread of the virus and improve patient outcomes. A significant amount of research has examined medical imaging for COVID-19 detection, applying machine learning (ML) and DL methods such as convolutional neural networks (CNNs), transfer learning (TL), autoencoders, and ensemble methods, and these methods have shown promise [19]. In particular, deep CNNs have demonstrated potential in COVID-19 detection and classification because they automatically learn hierarchical features from the input images, which also reduces the influence of noise and variation in those images. Transfer learning additionally allows a network to leverage knowledge learned from a large and diverse dataset, which can improve performance on the target task. However, some studies that used CNNs to detect and classify COVID-19, lung cancer, monkeypox, brain stroke, and other conditions reported insufficient accuracy [6, 20, 21]. In some cases, CNN and transfer learning models also require long training times.

Considering these issues, a suitable deep learning framework is needed to help consultants and healthcare staff identify COVID-19 quickly and correctly from X-ray and CT images [22]. This research aims to demonstrate an enhanced deep convolutional neural network-based solution for automatic COVID-19 detection from chest X-ray and CT images. Publicly available COVID-19 radiography datasets are limited in size, whereas deep CNN models are preferably trained on large datasets. Even when overfitting mitigation techniques are used, training DL models on a small dataset can result in overfitting. One of the most critical issues when designing the architecture is therefore limiting the number of trainable parameters. An early-stopping callback can also be employed to avoid overfitting, and data augmentation can address the problems associated with small datasets. When developing deep learning models, overcoming the vanishing gradient problem is crucial, and the accuracy degradation that occurs when training deeper networks must also be addressed.

This paper's primary contributions are as follows:
(i) We propose an improved model, IDConv-Net, that can effectively and precisely differentiate between patients with normal chest conditions and those with COVID-19 using both chest X-ray and CT images.
(ii) The novelty of IDConv-Net is that it can detect and classify disease from different imaging modalities with higher accuracy. Because it uses relatively few layers, the model trains quickly.
(iii) To evaluate the performance of IDConv-Net, we ran the model on X-ray and CT images and achieved the best classification rate compared with other existing detection models. In addition, our model produces fewer missed detections, indicating that it is reliable in detecting COVID-19 on previously unseen data.
(iv) Lastly, the experimental findings support the claim that IDConv-Net outperforms previous state-of-the-art approaches to COVID-19 detection and classification, correctly detecting COVID-19 in both the chest X-ray and CT datasets.

The rest of this paper is organized as follows: Section 2 reviews the related literature, and Section 3 describes the materials and methods in detail. Our proposed IDConv-Net model is elaborated in Section 4, and Section 5 presents the training parameters and performance metrics used for the model. Section 6 presents the study's findings, Section 7 discusses these results, and Section 8 concludes the paper with a summary.

2. Literature Review

To stop the COVID-19 pandemic from spreading, it is essential to identify the virus quickly and precisely. Chest X-ray and CT imaging are available in almost all hospitals worldwide and are the most widely used and economically advantageous medical imaging technologies for evaluating lung problems [22, 23]. Chest X-ray and CT scans can reliably identify lung injury in COVID-19 patients at an early stage [9, 24], indicating both the presence of the infection and its stage [16]. However, the lack of distinctive characteristics and the resemblance between COVID-19 lung lesions and those of other viral diseases make COVID-19 susceptible to misdiagnosis [25]. AI techniques such as ML and DL can therefore help overcome human errors in detecting COVID-19 from X-ray and CT images [26–28]. AI has proven its efficiency in detecting diseases such as cancer, tumors, pneumonia, and COVID-19, and DL-based approaches such as CNNs play a key role in processing medical images, particularly in feature extraction and classification [29]. In [30], Bassi and Attux developed a dense CNN to classify chest X-rays as COVID-19, pneumonia, or normal. They proposed a novel modification of the twice transfer learning technique that concerns the output neurons, and their model performed well in classifying COVID-19. However, larger datasets and clinical investigations were required to guarantee accurate generalization. Agrawal and Choudhary [20] suggested a deep CNN for detecting COVID-19 using two chest X-ray datasets. For image segmentation, they used an encoder-decoder architecture in which the CNN encoder extracts features and passes them to the decoder. Their model achieved high accuracies of 94.4% and 95.2% on the two datasets, respectively.

The authors of [31] introduced COVID-CXNet, which builds on the well-known CheXNet model through transfer learning and uses relevant, meaningful features to detect the novel coronavirus. Purohit et al. [32] designed a CNN-long short-term memory (CNN-LSTM) model that extracts features from raw data hierarchically, and they evaluated its COVID-19 detection performance on several chest X-ray datasets. However, the model requires longer training on larger datasets, and this training time needs to be reduced.

Ayalew et al. [17] presented the DCCNet model for the diagnosis of COVID-19 patients. The authors employed two methods, namely, histograms of oriented gradients (HOG) and CNN, to extract features from input images, and used a support vector machine (SVM) classifier to classify COVID-19. The SVM classifier yielded 99.97% accuracy during training and 99.67% accuracy during testing when combined with CNN and HOG-based features.

Indumathi et al. [33] proposed an ML algorithm to classify and predict COVID-19-affected zones, using the Virudhunagar district's COVID-19 dataset from March to July 2020. Their approach achieved an accuracy of 98.06%, higher than the 95.22% accuracy of the C5.0 algorithm.

Salau [34] used an SVM technique to classify and identify COVID-19 using chest CT data. After extracting features from CT scans using a discrete wavelet transform (DWT) technique, the study built a classification model. The findings demonstrated that the suggested model has a high accuracy of 98.2% in COVID-19 detection.

Chaunzwa et al. [35] used a DL framework to detect lung cancer from CT images. In [36], COVID-19 was identified on CT scans using ML methods; however, the investigation used only 150 CT images. Khan et al. [37] highlighted promising DL research on interpreting radiographic images and on developing DL-based assessment methods for specific COVID-19 variants, such as delta and omicron, together with the challenges ahead. In [6, 21], the authors implemented an SVM technique to identify COVID-19 but achieved a comparatively low recognition rate on 208 test images. In [38], the authors applied ML techniques to detect COVID-19 automatically from X-ray images with improved accuracy. Rahimzadeh and Attar [39] used Xception and ResNet50V2 to identify COVID-19 from X-ray images. In [40], the researchers employed pretrained transfer learning models such as ResNetV2, InceptionV3, and ResNet50 to detect lung disease and COVID-19 from X-ray images; using X-ray data alone, Inception-ResNetV2, ResNet50, and InceptionV3 achieved classification accuracies of 98%, 97%, and 87%, respectively. These experiments used only a small number of X-ray images, and the models were not evaluated on other modalities, such as CT scans.

However, past research has several shortcomings, including insufficient detection accuracy across imaging modalities, small datasets, overfitting, and the use of CNNs without prior image preprocessing. Some works also require prolonged training times. To address these drawbacks, this study applies a number of image preprocessing approaches, and the proposed enhanced model improves detection and classification performance while reducing training time.

3. Materials and Methods

This section describes the methodology proposed for identifying COVID-19. Figure 1 depicts the workflow of the proposed methodology.

3.1. Image Data Acquisition

A dataset is the backbone of any study. We used two types of images: 2D CT and X-ray images. The CT images were acquired with a 64-slice scanner calibrated with the following parameters: two collimation settings, a tube voltage of 120 kilovolts (kV), a section thickness of 5 mm, a slice interval of 5 mm, a pitch of 1.375, and a field of view of 350 × 350 mm. Patients were scanned in the supine position with both arms elevated and were instructed to hold their breath. The images were reconstructed with a slice thickness and increment of 1.5 to 2 mm [41].

We collected 1,252 COVID-19-positive and 1,230 normal CT images from the SARS-CoV-2 CT-scan dataset, whose images were gathered from real patients in hospitals in Sao Paulo, Brazil.

The COVID-19 CT scan image dataset consisted of 7,593 COVID-19 images obtained from 466 patients and 6,893 normal images obtained from 604 patients. We then merged the CT images of both datasets to create a new 2D-CT dataset containing 8,845 COVID-19 and 8,123 normal images, totaling 16,968 images. Similarly, we collected 576 COVID-19-positive and 1,583 normal X-ray images from the COVID-19 X-ray dataset and 3,616 COVID-19-positive and 10,192 normal X-ray images from the COVID-19 Radiography Database, and merged them to create an enlarged X-ray dataset. These merged X-ray and 2D-CT datasets were used to improve the model's performance. Overall, the merged X-ray dataset contains 4,192 COVID-19-positive and 11,775 normal images, and the merged 2D-CT dataset contains 8,845 COVID-19-positive and 8,123 normal CT images. Sample CT and X-ray images are shown in Figures 2 and 3, and Table 1 lists the number of images taken from each source dataset. Each dataset was partitioned into training, validation, and testing sets, as shown in Table 2: 80% of the images were allocated for training, 10% for validation, and 10% for testing.
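A minimal sketch of the 80/10/10 partition, written with NumPy only. It splits each class separately (stratification), which is an assumption: the text gives only the overall ratios. The function name, the seed, and the rounding rule are illustrative choices, so the per-split counts may differ by one or two images from those in Table 2.

```python
import numpy as np


def stratified_split(labels, train=0.8, val=0.1, seed=42):
    """Return (train_idx, val_idx, test_idx), keeping class proportions in each split.

    Hypothetical helper: the ratios follow the text, the seed and rounding are assumptions.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx, test_idx = [], [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)                     # shuffle within the class
        n_train = int(round(train * len(idx)))
        n_val = int(round(val * len(idx)))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])  # the remainder goes to the test set
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)


# Example with the merged X-ray class counts (1 = COVID-19, 0 = normal).
y = np.array([1] * 4192 + [0] * 11775)
tr, va, te = stratified_split(y)
print(len(tr), len(va), len(te))
```

Stratifying keeps the COVID-19 share (about 26% of the X-ray images) roughly equal across the three splits, which matters for a class-imbalanced dataset.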

3.2. Data Preprocessing

Preprocessing is crucial for transforming raw data into a format appropriate for ML or DL approaches. It primarily prepares the source images through normalization, handling of multicollinearity, scaling, shuffling, and data division [42]. Preprocessing also enhances image quality, making an experiment more successful. Moreover, high-dimensional input data are difficult to handle and can cause overfitting and poor results. For this reason, we downsized the images, reducing their dimensionality to lower the computational time and allow quick visualization. Before training the model, string or other non-numeric features must be converted into numeric ones, so we applied data transformation for compatibility [43–45]. We also used feature engineering, which entails selecting the features that are helpful in training a model. Normalization was used to bring the various features onto a comparable scale. As a result, higher learning rates can be used and the model converges more quickly for a given learning rate; normalization also helps stabilize the gradient descent steps.
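The resizing step can be sketched with TensorFlow, the backend named in the abstract. The 128 × 128 target size is hypothetical, because the final image dimension is not stated here, and replicating single-channel X-rays to three channels is an assumption based on the RGB input described in Section 4.1.

```python
import numpy as np
import tensorflow as tf

# Hypothetical target size; the text does not state the final resolution.
TARGET_SIZE = (128, 128)


def prepare_batch(images):
    """Resize a batch of shape (batch, height, width, channels) to TARGET_SIZE with 3 channels."""
    x = tf.cast(images, tf.float32)
    if x.shape[-1] == 1:
        x = tf.image.grayscale_to_rgb(x)  # replicate the single gray channel into RGB
    return tf.image.resize(x, TARGET_SIZE)  # bilinear interpolation by default


batch = np.random.randint(0, 256, size=(2, 300, 280, 1)).astype("uint8")
resized = prepare_batch(batch)
print(resized.shape)
```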

3.3. Normalization of Data

The importance of data normalization for developing accurate predictive models has been analyzed for different ML algorithms, in which it plays a crucial role [46]. The fundamental objective of data normalization is to assure data quality before the data are used in predictive analytics. Various normalization techniques can be used, including min-max normalization, z-score normalization, decimal scaling, and median standardization [47, 48]. The main aims of data normalization are as follows:
(i) It puts all entries and attributes on a uniform scale.
(ii) It makes the relevant information in the dataset more apparent and natural, reducing its size and simplifying its structure so that it is easier to identify, compare, and retrieve.
(iii) It simplifies the numerical data without losing critical characteristics, reducing complexity and making segmentation easier.

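The dataset can be normalized by dividing each image's gray-scale values by 255. However, this study uses z-score normalization [49], which is defined as

$$z_{ij} = \frac{x_{ij} - \mu}{\sigma},$$

where $z_{ij}$ is the z-score normalized weight, $x_{ij}$ is the weight at the $i$th row and $j$th column, $\mu$ is the mean, and $\sigma$ is the standard deviation. These can be expressed as

$$\mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} x_{ij}, \qquad \sigma = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x_{ij} - \mu\right)^{2}},$$

where $M$ and $N$ are the numbers of rows and columns.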

In practice, the values are typically scaled into the [0, 1] range so that all inputs share a common scale.
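The two scalings mentioned in this subsection, z-score normalization and rescaling into [0, 1], can be sketched in NumPy. Computing the mean and standard deviation per image, rather than over the whole training set, is an assumption, as is the small `eps` term that guards against constant images.

```python
import numpy as np


def zscore_normalize(image, eps=1e-8):
    """z = (x - mu) / sigma, with mu and sigma taken over all pixels of one image."""
    x = np.asarray(image, dtype=np.float64)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (divides by the number of pixels)
    return (x - mu) / (sigma + eps)  # eps avoids division by zero for constant images


def minmax_scale(image):
    """Rescale pixel values linearly into the [0, 1] range."""
    x = np.asarray(image, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)  # a constant image has no range to rescale
    return (x - lo) / (hi - lo)


img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
z = zscore_normalize(img)
scaled = minmax_scale(img)
```

For 8-bit images whose values span the full 0 to 255 range, `minmax_scale` gives the same result as dividing by 255.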

3.4. Working with Numerical Data

DL methods operate on numerical values, so we followed several procedures to obtain a numerical value for each extracted feature. We also applied normalization and standardization to improve processing during model training and to support various DL networks. In our experiment, we converted the two class labels, COVID-19 and non-COVID-19, to 0 and 1 using the LabelEncoder class from the scikit-learn library [50].
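A short sketch of the label-encoding step with scikit-learn's `LabelEncoder`; the label strings are assumed for illustration. `LabelEncoder` orders classes alphabetically, so "COVID-19" is encoded as 0 and "Non-COVID-19" as 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label strings for four images.
labels = ["COVID-19", "Non-COVID-19", "COVID-19", "Non-COVID-19"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)  # learns the sorted classes, then maps them to 0..n-1
print(encoder.classes_.tolist(), encoded.tolist())  # -> ['COVID-19', 'Non-COVID-19'] [0, 1, 0, 1]
```

If COVID-19 is meant to be the positive class (label 1) for the sigmoid output, the encoding can be inverted, for example with `1 - encoded`.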

3.5. Data Augmentation

Data augmentation is a powerful technique for improving the accuracy and predictive ability of machine learning models by enlarging a dataset with modified versions of existing training images, which reduces the need to collect more images. It uses techniques such as data warping and oversampling to increase the number of images in a dataset. Nevertheless, overfitting may still appear in the results [51]. To mitigate this problem, we applied flipping, rotation, shearing, mirroring, zooming, fill mode, and channel shifting using principal component analysis to augment the data [52]. The augmentation parameters that we used to increase the number of images are as follows:

Flipping: the image is flipped horizontally and vertically. Flipping rearranges the pixels while preserving the image's attributes. In addition, the image's vertical and horizontal positions are randomly shifted by up to 0.2 of the image height and width.

Rotation: the image is rotated by a random number of degrees, so each rotated copy differs from the others. The rotation range is from -360 to 360 degrees.

Shear: to simulate or correct different perspective angles, the image is sheared along a particular axis using a shear range of approximately 0.4 degrees.

Zoom: the image is randomly zoomed in or out, and pixels are added around the image when it is zoomed out. The zoom range is approximately 0.5.

Fill mode: empty pixels are filled using the default mode, "nearest", which copies the values of the nearest image pixels.

Channel shifting: channel values are randomly shifted to vary the color intensity, with a shift range of 10.
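The settings listed above can be approximated with Keras preprocessing layers, as in the following sketch. This is an assumption about tooling, since the text does not name the augmentation API, and the mapping is partial: shear and PCA-based channel shifting have no direct counterpart among these layers, so shear is omitted and a simple random intensity offset of up to ±10 stands in for channel shifting.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Approximate mapping of the listed settings (assumed, not the authors' exact pipeline).
augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal_and_vertical"),
    keras.layers.RandomTranslation(0.2, 0.2, fill_mode="nearest"),  # shift up to 20% of height/width
    keras.layers.RandomRotation(1.0, fill_mode="nearest"),          # factor 1.0 = up to +/-360 degrees
    keras.layers.RandomZoom(0.5, fill_mode="nearest"),              # zoom in or out by up to 50%
])


def channel_shift(images, max_shift=10.0):
    """Add one random offset in [-max_shift, max_shift] per image, then clip to [0, 255]."""
    offsets = tf.random.uniform([tf.shape(images)[0], 1, 1, 1], -max_shift, max_shift)
    return tf.clip_by_value(images + offsets, 0.0, 255.0)


batch = np.random.uniform(0, 255, size=(4, 64, 64, 3)).astype("float32")
augmented = channel_shift(augment(batch, training=True))  # training=True enables the random ops
print(augmented.shape)
```

Because these layers are active only when called with `training=True`, they can also be placed at the start of a model so that augmentation is applied during training and skipped at inference.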

4. Our Proposed IDConv-Net Model

This section presents the proposed model and its workflow. The proposed IDConv-Net model has five convolutional layers, four max-pooling layers, batch normalization, rectified linear unit (ReLU) activations, dense and dropout layers, and a sigmoid output. Its architecture, shown in Figure 4, consists of the input images, feature extraction layers, and classification layers. The feature extraction layers first extract the critical features from the input images, and the final fully connected layers then perform classification. The model thus acts first as a feature extractor and then as a classifier.

4.1. Feature Extraction

Feature extraction is an important step in ML and DL applications because it can improve the efficiency, accuracy, and interpretability of the subsequent learning algorithms [53]. The feature extraction part of our model consists of five convolutional layers and four max-pooling layers with ReLU activations (see Figure 5). Each of the first four convolutional layers is followed by a batch normalization layer, a ReLU activation layer, and a max-pooling layer, and the last convolutional layer is followed by the flatten and dropout layers, as shown in Figure 4.

The input layer receives chest CT or X-ray images of size $N \times N \times 3$, where $N$ is the image dimension and 3 is the number of RGB channels. The convolutional layer produces feature maps, that is, feature representations of the input images [54]. The input image $X$ is convolved with a set of trainable weights, often called multidimensional filters $F_k$ of size $f \times f \times C$, and biases $b_k$ are added to the result. Assuming there are $K$ filters, the $k$th output of this layer can be written as

$$Y_k(i, j) = \sum_{c=1}^{C}\sum_{u=1}^{f}\sum_{v=1}^{f} F_k(u, v, c)\, X\big((i-1)S + u - P,\ (j-1)S + v - P,\ c\big) + b_k,$$

where $H$, $W$, and $C$ are the height, width, and number of channels of the input; the sum runs over the local region of the input corresponding to the $i$th row and $j$th column of the output; $P$ is the zero-padding size (positions outside the image are treated as zero); and $S$ is the stride in pixels. Each output feature map therefore has spatial size $\lfloor (H - f + 2P)/S \rfloor + 1$ by $\lfloor (W - f + 2P)/S \rfloor + 1$. A batch normalization (BN) layer placed between the convolutional and activation (ReLU) layers improves network training and reduces sensitivity to network initialization [55]. The ReLU activation function, given in Equation (6), keeps only the positive part of its input:

$$f(x) = \max(0, x).$$
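The standard relation between input size n, kernel (or pool) size k, zero padding p, and stride s gives the spatial size of each feature map. The trace below assumes a hypothetical 128 × 128 input and 3 × 3 convolutions with padding 1; the 2 × 2 pooling with stride 2 follows the text.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer: floor((n - k + 2p) / s) + 1."""
    return (n - kernel + 2 * padding) // stride + 1


# Hypothetical trace through four convolution + max-pooling blocks.
size = 128
for block in range(4):
    size = conv_output_size(size, kernel=3, padding=1)  # "same" convolution keeps the size
    size = conv_output_size(size, kernel=2, stride=2)   # 2 x 2 pooling with stride 2 halves it
print(size)  # -> 8
```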

The pooling layers take the maximum value over windows of size (2, 2) with a stride of 2. The first max-pooling layer therefore halves the spatial dimensions of the feature maps before they enter the second convolutional layer. Similarly, the feature maps produced by the second convolutional layer are passed to a second pooling layer with the same pool size and a stride of 2, which halves the dimensions again. Table 3 lists the layers of the proposed IDConv-Net model together with their output sizes. IDConv-Net comprises five convolution layers, four activation layers, and four max-pooling layers. The resulting output features are then passed through a flatten layer, a dense layer, a dropout layer, and a sigmoid activation layer.
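The layer ordering described in this section can be sketched in Keras. Filter counts, kernel sizes, the input size, and the ReLU activations in the fifth convolution and the 1024-unit dense layer are hypothetical (the actual configuration is given in Table 3), so this sketch does not reproduce the parameter counts reported in Section 6. It uses a single sigmoid output unit, matching the single probability and 0.5 threshold described for the classification layer; the output width is therefore an assumption.

```python
from tensorflow import keras

layers = keras.layers


def build_idconv_net_sketch(image_size=128, filters=(32, 64, 128, 128, 128)):
    """Hypothetical IDConv-Net-style model: layer order from the text, sizes assumed."""
    inputs = keras.Input(shape=(image_size, image_size, 3))
    x = inputs
    # Four blocks: convolution -> batch normalization -> ReLU -> 2 x 2 max pooling.
    for n_filters in filters[:4]:
        x = layers.Conv2D(n_filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
    # Fifth convolution (its activation is an assumption), then the classification head.
    x = layers.Conv2D(filters[4], 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # probability of the positive class
    return keras.Model(inputs, outputs, name="idconv_net_sketch")


model = build_idconv_net_sketch()
```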

4.2. Classification

The classification layer is the final layer of the proposed model and produces the network's output in the form of predefined classes. It follows the feature extraction layers, which extract high-level features from the image. As the first step in classification, the output of the feature extraction layers is passed to a flatten layer, which converts it into a one-dimensional vector. This vector then passes through a dense layer with 1024 neurons and a dropout layer. A final dense layer with two neurons and a sigmoid activation function produces the output, which classifies the image as COVID-19 or normal.

The classification layer of a CNN with the sigmoid activation function can be written as

$$\hat{y} = \sigma(\mathbf{W}\mathbf{h} + \mathbf{b}),$$

where $\mathbf{h}$ is the output of the previous layer, $\mathbf{W}$ is the weight matrix of the classification layer, $\mathbf{b}$ is the bias vector, and $\sigma$ is the sigmoid activation function, defined as

$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$

The output of the final layer of the proposed model is passed through the sigmoid function to obtain a value between 0 and 1, which can be interpreted as the probability that the input image belongs to the positive class. With the decision boundary set to 0.5, the input image is classified as positive if the sigmoid output is greater than 0.5 and as negative if it is less than or equal to 0.5. Moreover, a dropout rate of 0.3 is applied after the last convolution layer to reduce overfitting, that is, the gap between training and testing performance.
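The sigmoid decision rule described above can be written in a few lines of NumPy. The function names are hypothetical; the inputs are the pre-activation values of the final layer.

```python
import numpy as np


def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_covid(logits, threshold=0.5):
    """Return (probabilities, decisions); a probability strictly above 0.5 means positive."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return probs, probs > threshold


probs, labels = predict_covid([-2.0, 0.0, 3.0])
print(labels.tolist())  # -> [False, False, True]
```

A logit of exactly 0 gives a probability of 0.5, which the rule assigns to the negative class.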

Input: COVID-19 or non-COVID-19 X-ray or CT image dataset D, with images resized to dimension S.
Feature vector extracted by IDConv-Net: Fv.
1: Initialize Fv = S_p, p = 1.
2: Extract characteristics from each image D(p, 1, 570).
3: Fv(p, 1) = S(x, 1) + Fv(p, 1).
4: Fv = total characteristics extracted by IDConv-Net.
5: Assign Ho = output of the hidden layer and Hd = output of the last hidden layer.
6: Vt(p, 1) = Ho(p, 1) + Hd(p, 1).
7: Ft = output of IDConv-Net through the hidden and FC layers.
8: Training features Ttrain_feature = [Fv, Ft].
9: Test_image = imread(image).
10: Repeat steps 1 and 2 to extract the essential test features (Ttest_feature) from the test set.
11: Outcome(i) = classify(Ttrain_feature, Ttest_feature).
12: Output: True for COVID-19 positive or False for COVID-19 negative.

5. Training and Performance

Training and testing performance depend on the experimental setup and are measured with performance metrics such as precision, recall, F1-score, accuracy, sensitivity, and specificity. This section describes the experimental setup and the performance metrics.

5.1. Experiment Setup

Hyperparameter tuning is an important step in building machine learning models because it selects the hyperparameter values that give the best model performance. To obtain the best performance, we repeatedly fine-tuned the model, optimizing three hyperparameters during the training phase: batch size, number of epochs, and learning rate. Because tuning these parameters manually is time-consuming, we applied grid search to select their best values, using ML frameworks and libraries such as scikit-learn in Python. Table 4 summarizes the initial and optimal parameters found during the experiment: the best batch size is 32, the best number of epochs is 50, and the best learning rate is 0.001 for both datasets.
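A sketch of the grid search over batch size, epochs, and learning rate, using scikit-learn's `ParameterGrid` to enumerate every combination. The candidate values and the `evaluate` callback are hypothetical; in practice `evaluate` would train the model with the given settings and return a validation score. The toy scoring function exists only to make the sketch runnable.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical candidate values; the initial ranges actually used are listed in Table 4.
param_grid = {
    "batch_size": [16, 32, 64],
    "epochs": [30, 50],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def grid_search(evaluate, grid):
    """Evaluate every combination exhaustively and return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for params in ParameterGrid(grid):
        score = evaluate(**params)  # higher is better, e.g. validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(batch_size, epochs, learning_rate):
    """Stand-in for 'train and return validation accuracy' (illustration only)."""
    return -abs(batch_size - 32) - abs(epochs - 50) - 1000 * abs(learning_rate - 1e-3)


best, best_score = grid_search(toy_evaluate, param_grid)
print(best)
```

Grid search trains one model per combination (18 here), so its cost grows multiplicatively with each added hyperparameter.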

Adam (adaptive moment estimation) was used to optimize the proposed IDConv-Net model because it performs consistently in binary image classification [56]. The experiment was conducted on an institutional laptop running Windows 10 with a Core i7 processor and 16 GB of RAM. We also ran the model in Jupyter Notebook and in the Google Colab GPU environment with 12 GB of RAM.

The proposed model was developed and fine-tuned on the chest X-ray and CT image datasets to gain insight into COVID-19 identification. To evaluate the performance of IDConv-Net, we split each dataset into three sections: 80% of the data for training, 10% for validation, and the remaining 10% for testing. Table 2 shows the resulting distribution of the training, validation, and testing sets for both datasets.

5.2. Performance Metrics

Performance measures are crucial for assessing the proposed approach. In this study, we computed precision, recall, F1-score, and accuracy from four quantities: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
(i) TP: the result for a COVID-19 patient is correct; the model detects a positive result for a COVID-19-affected patient.
(ii) TN: the result for a non-COVID-19 patient is correct; the model detects a negative result for a non-COVID-19 patient.
(iii) FP: the result for a non-COVID-19 patient is wrong; the model detects a positive result for a non-COVID-19 patient.
(iv) FN: the result for a COVID-19 patient is wrong; the model detects a negative result for a COVID-19-affected patient.

The performance metrics are computed from these quantities as follows:

Precision (P): it is comprehended as a positive predictive value. It measures the proportion of positive and projected instances out of the total number of cases that are expected to be positive.

Recall (R): also known as sensitivity or the true positive rate, recall is the proportion of actual positive cases that the model correctly identifies:
$$R = \frac{TP}{TP + FN}$$

F1-score (F1): the harmonic mean of precision and recall. It is important in this experiment because it summarizes test accuracy in a single value that balances precision and recall:
$$F_1 = \frac{2 \times P \times R}{P + R}$$

Accuracy (Acc): the ratio of the number of correct predictions to the total number of cases:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

The value of all performance metrics ranges from 0 to 1.
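The metric definitions above translate directly into code. In this sketch (function name hypothetical), returning 0.0 when a denominator is zero is an assumed convention, since the paper does not say how such cases are handled; the counts in the usage line are made up for illustration.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, tn=9, fp=2, fn=1))
```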

6. Results

The study used secondary datasets of two modalities: X-ray and CT images. The X-ray experiment used 15967 images, of which 14370 were used for model training and validation and the remaining 1597 for testing. Similarly, the CT experiment used 17168 images, of which 15471 were used for model training and validation and the remaining 1697 for testing.

First, we ran the experiment five times to optimize the hyperparameters, including node size, batch size, learning rate, and dropout rate. The optimized learning rates were 0.0001 and 0.001 for the X-ray and CT studies, respectively, and the model was trained with the Adam optimizer (momentum of 0.99) using the binary cross-entropy loss. Training was set to 50 epochs; however, the early stopping function, which terminates training once the monitored metric stops improving, ended training after 48 epochs for the CT images and 47 epochs for the X-ray images. The model has 8,088,129 parameters in total, of which 8,004,481 are trainable. Because the model works as a binary classifier, the sigmoid activation function is used in the final layer.
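The early stopping behavior described above can be sketched as follows. The paper does not report which quantity was monitored or the patience setting, so this example assumes validation loss, a hypothetical patience of 3 epochs, and a made-up loss curve. The logic mirrors the `monitor`, `patience`, and `min_delta` arguments of Keras's `tf.keras.callbacks.EarlyStopping`, but it is not that callback's implementation.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3,
                         min_delta: float = 0.0) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs; otherwise all epochs run.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0   # improvement: remember it and reset the counter
        else:
            wait += 1              # no improvement in this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation-loss curve: improves for 5 epochs, then plateaus.
losses = [0.60, 0.45, 0.35, 0.30, 0.28, 0.29, 0.285, 0.30, 0.27]
print(early_stopping_epoch(losses, patience=3))
```

With a patience of 3, training halts at epoch 8, before the late improvement at epoch 9; a larger patience trades extra epochs for a chance to catch such late gains.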

We evaluated the proposed IDConv-Net model separately on the X-ray and CT images. First, we trained the model with X-ray images, using 80% of the data for training, 10% for validation, and the remaining 10% for testing. On the X-ray images, the IDConv-Net model achieved accuracies of 97.49% and 96.99% for training and testing, respectively (see Table 5). The model also achieved a precision of 97.14%, a recall of 91.87%, and an F1-score of 94.43% on the X-ray dataset. The confusion matrix for the X-ray dataset (see Table 6) shows that only 12 of the 419 COVID-19 images and only 36 of the 1178 normal images were misidentified.

In a separate experiment, we trained the IDConv-Net model on the CT image dataset, again using 80% of the data for training, 10% for validation, and the final 10% for testing. On the CT images, the model achieved accuracies of 99.53% and 98.41% for training and testing, respectively. Table 7 compares IDConv-Net with other state-of-the-art methods; the proposed model achieved a precision of 98.64%, a recall of 96.31%, an F1-score of 98.48%, a training accuracy of 99.53%, and a testing accuracy of 98.41%. The confusion matrix for the CT dataset (see Table 8) shows that only 12 of the 885 COVID-19 images and 15 of the 812 normal images were misdetected. The accuracy and confusion matrix demonstrate the model's classification reliability on previously unseen test data.
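The reported metrics can be cross-checked against the confusion-matrix counts given for both datasets. The sketch below assumes that COVID-19 is the positive class, so misidentified COVID-19 images are false negatives and misidentified normal images are false positives; `metrics_from_errors` is a hypothetical helper.

```python
from typing import Dict


def metrics_from_errors(n_covid: int, covid_missed: int,
                        n_normal: int, normal_missed: int) -> Dict[str, float]:
    """Derive confusion-matrix cells from per-class error counts (COVID-19 = positive)."""
    tp, fn = n_covid - covid_missed, covid_missed      # COVID-19 images: found vs missed
    tn, fp = n_normal - normal_missed, normal_missed   # normal images: correct vs false alarms
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# (COVID-19 images, COVID-19 missed, normal images, normal missed) from the Results.
for name, counts in {"X-ray": (419, 12, 1178, 36), "CT": (885, 12, 812, 15)}.items():
    m = metrics_from_errors(*counts)
    print(name, {k: round(100 * v, 2) for k, v in m.items()})
```

Under this convention, the counts reproduce the reported testing accuracy and F1-score for both modalities (96.99% and 94.43% for X-ray; 98.41% and 98.48% for CT). The computed precision and recall are 91.87% and 97.14% for X-ray and 98.31% and 98.64% for CT, which differ from the reported precision and recall values.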

Overall, the proposed model accurately classifies COVID-19 and normal patients from both the X-ray and CT image datasets. On the X-ray images, the model achieved high accuracy with low loss, as shown in Figures 6 and 7. Likewise, in the CT study the model outperformed the existing models with low loss, as shown in Figures 8 and 9.

Moreover, the area under the curve (AUC) summarizes the receiver operating characteristic (ROC) curve and reflects the classifier's ability to distinguish between classes. The horizontal axis (x-axis) represents the false positive rate (FPR), and the vertical axis (y-axis) represents the true positive rate (TPR). The AUC-ROC value indicates the detection performance of the model, with a higher value indicating better performance. Our proposed model achieved AUC-ROC values of 0.954 and 0.966 on the X-ray and CT image datasets, as shown in Figures 10 and 11, respectively.

Training time is another important consideration for deep learning models that detect and classify COVID-19. As Table 9 shows, our proposed IDConv-Net model required significantly less training time than other transfer learning models. Specifically, the training time for the X-ray image dataset was only minutes, while that for the CT image dataset was minutes. These training times were substantially lower than those of existing models, which took about twice as long. Therefore, IDConv-Net can be considered a highly efficient and effective approach for image classification tasks. Figures 12 and 13 show random predictions made by IDConv-Net on test images, comparing the actual and predicted labels together with the model's confidence level.
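AUC-ROC has a convenient equivalent form: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The following sketch uses that pairwise definition; the labels and sigmoid scores are hypothetical, and the quadratic-time loop is used only for clarity (a sort-based version would be preferable for large test sets).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0    # positive ranked above negative
            elif p == n:
                wins += 0.5    # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical sigmoid outputs (probability of COVID-19) for six test images.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10]
print(roc_auc(labels, probs))
```

Here 8 of the 9 positive/negative pairs are ranked correctly (only the positive scored 0.40 falls below the negative scored 0.60), giving an AUC of 8/9.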

7. Discussion

In this study, 2D CT and X-ray images were used. The 2D approach is slice-based: it takes a single slice image as input and produces a score for each slice. In contrast, the 3D approach is volume-based: it takes the entire volume (a sequence of slices) as input and produces a single score per patient. Nevertheless, the 2D approach remains reliable for inspecting important image regions and complex geometry. To process the chest CT images, we applied preliminary filtering to all chest images in the training set to control quality and remove unreadable slices. Before the images were approved for training the IDConv-Net model, two expert physicians graded their diagnoses, and a third expert checked the evaluation set to ensure that there were no grading errors. Furthermore, we applied generalization techniques to improve model performance. We used a grid-search approach to identify the optimal hyperparameters: we defined a set of discrete values for each hyperparameter, evaluated the objective function at each grid point using the corresponding parameter values, and identified the local minimum as the grid point with the lowest objective function value. We also set minimum and maximum values for each hyperparameter; these bounds were determined empirically according to the characteristics of the images and the number of instances. Table 4 lists the initial and optimized parameter settings used in this study.
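The grid-search procedure described above (define discrete candidate values, evaluate the objective at every grid point, keep the lowest) can be sketched in a few lines. The candidate values and the toy objective below are hypothetical stand-ins; the actual settings are those in Table 4, and a real objective would train and validate IDConv-Net at each grid point, which is far more expensive.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple


def grid_search(objective: Callable[..., float],
                grid: Dict[str, Sequence]) -> Tuple[Optional[dict], float]:
    """Evaluate `objective` at every grid point and return the lowest-valued point."""
    names = list(grid)
    best_params, best_value = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        value = objective(**params)          # e.g. validation loss after training
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_validation_loss(learning_rate: float, dropout: float, batch_size: int) -> float:
    """Hypothetical stand-in for 'train the model, then report validation loss'."""
    return ((learning_rate - 0.001) ** 2 * 1e6
            + (dropout - 0.3) ** 2
            + (batch_size - 32) ** 2 / 1e4)


grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "dropout": [0.2, 0.3, 0.5],
    "batch_size": [16, 32, 64],
}
print(grid_search(toy_validation_loss, grid))
```

Grid search only finds the best of the evaluated points, so the result is as good as the chosen candidate values; the cost grows multiplicatively with each added hyperparameter.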

After training, the model can process new data and make accurate predictions. In addition, we used techniques such as data preprocessing and feature extraction to obtain accurate classification.

Table 5 compares the proposed model with state-of-the-art models; we attribute these strong results to image preprocessing techniques such as noise removal, filtering, data transformation, and feature extraction. Furthermore, our model is faster because it uses fewer layers than the state-of-the-art models. Figure 6 shows the training and validation accuracy of the proposed model during training, and Figure 7 shows the training and validation loss. These curves show no sign of overfitting and indicate good model performance. Although training was set to 50 epochs, the early stopping function terminated training after 47 epochs for X-ray and 48 epochs for CT images.

Similarly, for the CT study, Table 7 shows that our proposed model outperformed state-of-the-art models. Figure 8 shows the training and validation accuracy during model training, and Figure 9 shows the corresponding training and validation loss. Both the accuracy and loss curves indicate that the model performs well.

The most significant aim of the study was to increase the accuracy of detection and classification. The goal is to achieve accuracy as close to 100% as possible, because even a few misdiagnoses are costly.

Although similar CNN models (e.g., AlexNet, nCOVnet, MobileNetV2, and ResNetV2) can detect COVID-19, their accuracy is insufficient, and their larger number of hidden layers makes them slower to produce results and more complex. Our proposed IDConv-Net model is significant as a binary classifier: it first acts as a feature extractor and then as a classifier. Moreover, because it has fewer hidden layers, the model is more flexible, less complex, and faster.

In medical imaging, ML and DL models for COVID-19 detection are commonly evaluated with the AUC-ROC metric in addition to accuracy. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) of a binary classifier across decision thresholds, and the AUC-ROC quantifies the model's ability to distinguish between the positive and negative classes. Figures 10 and 11 show the ROC curves of the proposed model for X-ray and CT images, for which it obtained AUC-ROC values of 0.954 and 0.966, respectively. A high AUC-ROC value indicates that the model discriminates well between COVID-19-positive and COVID-19-negative cases, with high sensitivity and specificity. However, AUC-ROC should be combined with other performance metrics and clinical validation to ensure that models are effective and safe for clinical practice.

Overall, the proposed IDConv-Net yields effective results on both the X-ray and CT images; according to Tables 5 and 7, it achieved the best accuracy on the X-ray and CT image datasets, respectively. To avoid overfitting, we applied dropout with a rate of 0.3 in the last convolution layer of the proposed model and used an early stopping function during training. These measures suggest that the model can reliably detect COVID-19 in unseen data. We also performed a qualitative analysis in which IDConv-Net produced predictions on the testing set with confidence levels ranging from 95% to over 99%, indicating that it can accurately classify COVID-19 from both X-ray and CT images. Figures 12 and 13 illustrate the actual and predicted outcomes, together with the confidence level of each identification, for X-ray and CT images, respectively. These results suggest that a deep CNN model can be an effective tool for COVID-19 diagnosis and can potentially assist healthcare professionals in detecting and treating the disease. Moreover, our proposed model detects and classifies COVID-19 in a relatively short time. As shown in Table 9, it achieved outcomes comparable to those of transfer learning models while requiring less training time across both image modalities. We attribute the reduced training time to several factors, including the use of fewer layers in the model architecture and enhanced preprocessing techniques. The more streamlined architecture reduced the computational demands of training while still achieving high performance.
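Dropout with a rate of 0.3 randomly zeroes about 30% of a layer's activations at each training step. The sketch below shows the common "inverted dropout" formulation on a plain list of activations, which is also how Keras's `Dropout` layer behaves: surviving units are scaled by 1/(1 - rate) during training, and the layer is an identity at inference. The function name and the constant input are illustrative only.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], rate: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)             # identity at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
x = [1.0] * 10_000
y = dropout(x, rate=0.3, training=True, rng=rng)
print(sum(v == 0.0 for v in y) / len(y))    # fraction of units dropped, near 0.3
print(sum(y) / len(y))                      # mean activation, near 1.0
```

Because of the rescaling, no adjustment is needed at inference time, when all units are kept.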
Therefore, we infer that the proposed model works appropriately for both datasets and achieves better accuracy than state-of-the-art detection and classification models. Additionally, the predictions of DL models can be understood and interpreted using explainable AI (XAI), a collection of tools and frameworks for this purpose. XAI provides ML techniques that produce more understandable models while preserving high performance (prediction accuracy), enabling human users to comprehend, appropriately trust, and manage the new generation of AI partners. Beyond diagnosis, wearing face masks and washing hands regularly are two important measures that effectively reduce the spread of COVID-19, and low-cost sensor-based hand-washing techniques can contribute further. However, these measures are most effective when combined with other prevention strategies, such as social distancing and avoiding large gatherings [71, 72].

One advantage of the study is that the proposed model consists of fewer layers than other detection models, which reduces its complexity and training time. Another advantage is that it can detect and classify COVID-19 in both data types with high accuracy. The most important advantage is that the model shows no overfitting in the training and testing results on either dataset. In addition, the model's accuracy could be further increased by using balanced datasets. The study also has drawbacks. First, the model is less accurate on X-ray images than on CT images because of the poor resolution of X-rays and the bony structures of the chest; this could be overcome by using high-resolution X-ray images. Second, the model's accuracy may decrease on imbalanced datasets. Third, some of the hundreds of slices in a chest scan, taken from its superior (upper), middle, or inferior (lower) part, do not contain disease features. As a result, the model occasionally misclassifies COVID-19.

8. Conclusion and Recommendation

COVID-19 poses a severe threat to people worldwide. A new variant (e.g., Omicron) could become dangerous and deadly if it recombines with Delta or another lethal variant and then spreads quickly worldwide. Early detection of COVID-19 can therefore limit its spread by allowing affected people to be isolated. Our proposed IDConv-Net can help by detecting and classifying COVID-19 at an early stage. It achieves a training accuracy of 99.53% and a testing accuracy of 98.41% for CT images, and a training accuracy of 97.49% and a testing accuracy of 96.99% for X-ray images. Furthermore, IDConv-Net outperforms currently available COVID-19 detection and classification models and requires less training time than existing models.

Overall, while the proposed model has shown great promise in medical imaging applications, several challenges still need to be addressed to make it more effective and practical in real-world settings. The model is a black box, meaning that it can be difficult to understand how it makes its predictions. In the future, we plan to use Grad-CAM and XAI to make the model more comprehensible and user-friendly for disease diagnosis.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Authors’ Contributions

Md Khairul Islam was responsible for the conceptualization, analysis, data curation, methodology, software, validation, writing—original draft preparation, reviewing, and editing. Md Mahbubur Rahman was responsible for the supervision and writing—reviewing and editing. Md Shahin Ali was responsible for the data curation, software, and writing—reviewing and editing. Md Sipon Miah was responsible for the supervision and writing—reviewing and editing. Md Habibur Rahman was responsible for the supervision and writing—reviewing and editing.

Acknowledgments

We would like to acknowledge the support provided by the Bio-Imaging Research Lab, Department of Biomedical Engineering, Islamic University, Kushtia 7003, Bangladesh, in carrying out our research successfully.