1 Introduction

Earthquakes rank among the most devastating natural disasters, posing severe threats to human life and property [1, 2]. In densely populated urban regions, the risk of building damage or collapse during seismic events is especially grave [3]. Following an earthquake, rapid and accurate assessment of structural damage is paramount for effective emergency response, rescue operations, and subsequent reconstruction [4]. In recent years, remote sensing technologies have become invaluable instruments for detecting and evaluating natural disasters, harnessing diverse data modalities including aerial or satellite images, Lidar, and SAR [5,6,7]. Thus, the precise classification of distinct forms of building damage from remote sensing imagery has become a pressing concern [8].

Various models for construction damage detection have been proposed in the literature; some are discussed below. Roy and Bhaduri [9] utilized DenseNet and Swin-Transformer for damage detection. Their DenseSPH-YOLOv5 model aimed to improve the accuracy and efficiency of damage detection in engineering informatics, using a large-scale road damage dataset (9053 road damage images) and achieving a precision of 89.51%. Seemab et al. [10] presented a method for detecting propagating cracks in reinforced concrete beams using digital image correlation measurements, providing effective crack detection. Zhu and Tang [11] proposed an automated drone-based damage detection approach for hydraulic structures, offering insights into maintenance and repairs by leveraging drone imagery and artificial intelligence techniques. A literature review regarding damage detection is given in Table 1.

Table 1 Literature review based on damage detection

It may be noted from Table 1 that the research gaps are as follows:

  • Explainable models are not much explored.

  • The CNNs employed in such works are either customized versions of VGG or 1D-CNNs. Additionally, while YOLO is a prevalent tool for damage detection, there is a shortage of original CNN models with attention mechanisms.

  • Only two-class datasets, comprising (i) damaged and (ii) non-damaged samples, have been used.

1.1 Motivation and our model

The year 2023 witnessed a series of catastrophic earthquakes in Turkiye, killing over 50,000 people and rendering millions homeless. To alleviate the suffering caused by such tragedies, the foremost priority is to determine the extent of the damage. However, this task can be tedious and time-consuming. To address this issue, we propose an automated damage detection model and have curated a novel dataset comprising five classes, namely (i) Debris, (ii) Damaged Buildings, (iii) Non-Damaged Buildings, (iv) Damaged Highways, and (v) Non-Damaged Highways.

To achieve high classification performance while retaining a lightweight deep learning model, we have modified the popular MobileNetV2 [26]. Since the introduction of vision transformers (ViT) [27], attention blocks have proven effective in achieving high classification performance. Therefore, we have added pooling-based attention blocks to MobileNetV2 to enhance its classification performance. Moreover, we have modified the MobileNetV2 blocks using a ConvNeXt-style strategy, obtaining a more lightweight model. These modifications give rise to AttentionPoolMobileNeXt.

We have proposed a pyramidal deep feature engineering (DFE) model built on the presented AttentionPoolMobileNeXt CNN. Firstly, we trained the proposed CNN on the training dataset. We then used the dropout layer of the pretrained AttentionPoolMobileNeXt as a deep feature extractor; this layer generates 256 features. We applied a four-level multilevel discrete wavelet transform (MDWT) [28] approximation and extracted features from the raw image, the low-low filter band (LL band), and the low-high filter band (LH band). Subsequently, we employed iterative neighborhood component analysis (INCA) [29] as a feature selector to identify the most relevant features, which were classified using a support vector machine (SVM) [30, 31] classifier. The proposed AttentionPoolMobileNeXt alone achieved a testing accuracy of 90.17%, while the DFE model combining MDWT and AttentionPoolMobileNeXt attained a classification accuracy of 97%, demonstrating the effectiveness of coupling deep learning with feature engineering.

We have tried to address the literature gaps by introducing the following:

  • Introduced a novel deep CNN model.

  • Attained interpretable results from the proposed attention model, which concentrates on regions of interest.

  • Compiled a new image dataset comprising five distinct classes.

1.2 Innovations and contributions

The innovations and contributions of our work are given below:

  • Novelties:

  • Collected a novel image dataset dedicated to construction damage detection, addressing the need for specialized data in automated damage detection.

  • Introduced an attention-based CNN architecture named AttentionPoolMobileNeXt, emphasizing its unique design for improved performance.

  • Presented a comprehensive deep feature engineering model by synergizing AttentionPoolMobileNeXt, the MDWT-based approximation inspired by watermarking methods, and machine learning algorithms such as INCA and SVM.

  • Contributions:

  • Proposed a novel automated damage detection model for accurate classification of five classes.

  • Developed a lightweight deep learning model designed for rapid and accurate classification of damages, aiming to address the need for efficiency in real-world scenarios.

  • Achieved an exceptional test classification accuracy of over 90% on the curated dataset, demonstrating the efficacy and reliability of the proposed approaches.

Our work contributes to the field of construction damage detection and classification. By introducing an innovative combination of an attention-based CNN architecture, MDWT-based feature extraction, and machine learning algorithms, it not only addresses the challenges associated with existing methods but also provides comprehensive technical details, emphasizing both methodological and empirical advancements.

2 Dataset

To evaluate our model's performance, we collected a novel image dataset consisting of five distinct classes [32,33,34,35,36]. The dataset's classes are as follows: (1) Debris, (2) Damaged building, (3) Damaged highway, (4) Non-damaged building, and (5) Non-damaged highway. Furthermore, we partitioned the dataset into training and testing subsets. The images are stored in JPEG format (with .jpeg or .jpg extensions) and have varying dimensions. The dataset's attributes are presented in Table 2.

Table 2 Characteristics of the collected image dataset for construction damage classification

As can be observed from Table 2, the construction damage image dataset is inherently imbalanced.

The collected dataset is publicly available on Kaggle, and researchers can download it using the following URL: https://www.kaggle.com/datasets/turkertuncer/damaged-constructions-image-dataset.
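For reproducibility, the sketch below shows one way to load the dataset in Python with torchvision; the folder layout of the Kaggle download and the 224 × 224 resize target are illustrative assumptions rather than documented properties of the dataset.

```python
# A minimal loading sketch; directory names and resize target are assumptions.
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),  # the source images have varying dimensions
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("damaged-constructions/train", transform=tfm)
test_set = datasets.ImageFolder("damaged-constructions/test", transform=tfm)
print(train_set.classes)  # expected: the five damage classes
```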

3 The proposed AttentionPoolMobileNeXt

Our essential objective is to present a novel lightweight deep learning model, termed AttentionPoolMobileNeXt. To achieve this, we have expanded the MobileNetV2 architecture by incorporating two attention blocks, drawing inspiration from PoolFormer [37]. Additionally, we have enhanced the MobileNetV2 blocks in the manner of ConvNeXt, utilizing combinations of convolution + normalization and convolution + activation. To provide a comprehensive understanding of our proposed model, we first give a succinct overview of the MobileNetV2 architecture. We use the attention layers to focus on regions of interest, and we use the ConvNeXt-based strategy to extract meaningful feature maps with fewer learnable parameters than MobileNetV2.

MobileNetV2 leverages convolutions, linear bottlenecks, and inverted residuals [26], employing 1 × 1 and 3 × 3 convolutions for extracting image features. Feature transformation involves bottleneck inputs, 1 × 1 convolution, and depth-wise convolution with 3 × 3 filter sizes. While the architecture integrates addition-based shortcuts reminiscent of Residual Networks to address the vanishing gradient problem, it lacks attention blocks. To address this limitation and enhance classification performance, we have introduced two attention blocks to the MobileNetV2 architecture. We selected the MobileNetV2 architecture for its efficiency; research in this family is ongoing, as evidenced by the development of MobileNetV3 [38].
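To make this block structure concrete, the following PyTorch sketch shows a generic MobileNetV2-style inverted residual block. It illustrates the 1 × 1 expansion, 3 × 3 depthwise convolution, linear bottleneck, and addition-based shortcut described above; it is not the authors' exact ResMoB/MoB implementation, and the expansion factor is an assumption.

```python
# A generic inverted residual block in the MobileNetV2 style (illustrative).
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, expansion=6, stride=1):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),         # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),             # 3x3 depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # linear bottleneck
            nn.BatchNorm2d(out_ch),                           # no activation here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out          # shortcut if shapes match
```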

In this study, we extended the MobileNetV2 architecture by incorporating attention blocks, resulting in the development of AttentionPoolMobileNeXt. This lightweight deep-learning model is designed to enhance classification performance. Our proposed architecture encompasses convolution, residual mobile ConvNeXt blocks (ResMoB), mobile ConvNeXt blocks (MoB), pooling-based attention blocks, and output blocks. To facilitate a clearer understanding of AttentionPoolMobileNeXt, we have provided a schematic overview of the model.

Figure 1 illustrates our utilization of both average pooling and maximum pooling to construct an attention mechanism resembling that of PoolFormer. This strategic implementation has allowed us to introduce a model more lightweight than MobileNetV2. Furthermore, the layer transitions of AttentionPoolMobileNeXt are presented in Table 3.

Fig. 1 Schematic representation of the presented AttentionPoolMobileNeXt. Herein, Conv.: convolution, BN: batch normalization, Avg. Pool: average pooling, Max Pool: maximum pooling, GAP: global average pooling

Table 3 Transition list of the presented AttentionPoolMobileNeXt

Table 3 outlines the architecture and operations performed at various stages of the proposed AttentionPoolMobileNeXt.
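As a concrete illustration of the pooling-based attention of Fig. 1, the sketch below implements a PoolFormer-style token mixer that combines average and maximum pooling. The exact fusion and placement used in AttentionPoolMobileNeXt may differ; the pooling window and the averaging of the two pooled maps are assumptions.

```python
# A PoolFormer-inspired pooling attention block (illustrative sketch).
import torch.nn as nn

class PoolAttention(nn.Module):
    def __init__(self, channels, pool_size=3):
        super().__init__()
        pad = pool_size // 2                       # keep spatial size unchanged
        self.avg = nn.AvgPool2d(pool_size, stride=1, padding=pad)
        self.max = nn.MaxPool2d(pool_size, stride=1, padding=pad)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Average the avg- and max-pooled maps, subtract the identity as in
        # PoolFormer, and add the result back as a residual refinement.
        mixed = 0.5 * (self.avg(x) + self.max(x))
        return x + self.norm(mixed - x)
```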

4 The proposed deep feature engineering model

This research introduces a novel DFE model based on the presented AttentionPoolMobileNeXt, aimed at enhancing test classification performance. This pyramidal DFE model comprises three pivotal phases: (i) deep feature extraction utilizing MDWT and the pretrained AttentionPoolMobileNeXt, (ii) INCA-based feature selection, and (iii) classification using an SVM. To assess the effectiveness of our model, we apply it to the test images and present the results. Figure 2 illustrates a block diagram of the developed deep feature engineering model based on AttentionPoolMobileNeXt, offering clearer insight into its architectural design.

Fig. 2 Block diagram of the developed deep feature engineering model based on the recommended AttentionPoolMobileNeXt. Herein, MDWT: multilevel discrete wavelet transform, B: wavelet band, f: individual features, INCA: iterative neighborhood component analysis

The steps of the developed deep feature engineering model are presented as follows.

  • Step 1: Read each image from the collected test image dataset.

  • Step 2: Apply a four-level MDWT to each image. In this step, we have used the LL and LH filter bands to extract features. Moreover, the ‘haar’ mother wavelet function has been used to obtain the band images.

    $$\left[LL_{1},LH_{1},HL_{1},HH_{1}\right]=\delta (Im)$$
    (1)
    $$\left[LL_{k},LH_{k},HL_{k},HH_{k}\right]=\delta \left(LL_{k-1}\right),\quad k\in \{2,3,4\}$$
    (2)

where \(LL, LH, HL\), and \(HH\) are the low-low, low-high, high-low, and high-high filter bands, \(\delta(\cdot)\) denotes the DWT function, and \(Im\) is the raw image. We have used the four-level MDWT to obtain the \(LL\) and \(LH\) bands. Since MDWT generates floating-point bands, we normalized them to obtain valid images.
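A minimal Python sketch of this step with PyWavelets is given below; the mapping of pywt's detail tuple onto the paper's LH band naming and the min-max normalization are assumptions.

```python
# Four-level 'haar' DWT keeping the LL and LH bands as 8-bit images (Eqs. 1-2).
import numpy as np
import pywt

def mdwt_bands(img, levels=4):
    bands = []
    ll = img.astype(np.float64)
    for _ in range(levels):
        ll, (lh, hl, hh) = pywt.dwt2(ll, "haar")    # one decomposition level
        for band in (ll, lh):                       # keep LL and LH only
            lo, hi = band.min(), band.max()
            norm = (band - lo) / (hi - lo + 1e-12)  # floating point -> [0, 1]
            bands.append((norm * 255).astype(np.uint8))
    return bands  # [LL1, LH1, LL2, LH2, LL3, LH3, LL4, LH4]
```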

  • Step 3: Create features from the raw image and the wavelet filter bands. We have used the pretrained AttentionPoolMobileNeXt CNN, taking its dropout layer as the feature extractor, and generated 256 features from each generated image.

$$f\left(j\right)=\alpha \left(Im\right),\quad j\in \{1,2,\dots ,256\}$$
(3)
$$f\left(j+h\times 256\right)=\alpha \left(LL_{t}\right),\quad t\in \left\{1,2,3,4\right\},\ h\in \left\{2,4,6,8\right\}$$
(4)
$$f\left(j+256\times \left(2h-1\right)\right)=\alpha \left(LH_{t}\right)$$
(5)

Herein, \(f\) defines the feature vector and \(\alpha(\cdot)\) is the pretrained AttentionPoolMobileNeXt. In this phase, we generated nine feature vectors and merged them; in total, 2304 (= 256 × 9) features have been extracted per image.
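The sketch below mirrors Eqs. (3)-(5): 256 dropout-layer activations per input and nine inputs per image yield one 2304-dimensional vector. Here `features_256` stands in for the pretrained network truncated at its dropout layer and `preprocess` for the input pipeline; both are assumptions, not the authors' API.

```python
# Concatenating deep features from the raw image and its 8 wavelet bands.
import torch

@torch.no_grad()
def image_feature_vector(features_256, raw, wavelet_bands, preprocess):
    feats = []
    for im in [raw] + list(wavelet_bands):        # 1 raw + 8 band images
        x = preprocess(im).unsqueeze(0)           # image -> 1xCxHxW tensor
        feats.append(features_256(x).squeeze(0))  # 256 activations per input
    return torch.cat(feats)                       # shape: (2304,)
```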

  • Step 4: Apply the iterative feature selector to the generated 2304 features. INCA was proposed by Tuncer et al. [29] in 2020. It is an improved neighborhood component analysis (NCA) [39] that selects the most relevant feature vector from the created set of features and automatically determines the optimal number of features. Specifically, the qualified feature indexes are first generated using the NCA feature selector. We then loop over candidate subset sizes, iteratively selecting features and using a classifier to compute a loss value for each selected feature vector. The feature vector with the minimum misclassification rate is chosen as the final feature vector. The mathematical formulation of this feature selector is provided below.

$$id1=NCA(f,y)$$
(6)
$$s^{r-sv+1}\left(dim,i\right)=f\left(dim,id1\left(i\right)\right),\quad i\in \left\{1,2,\dots ,r\right\},\ dim\in \left\{1,2,\dots ,NI\right\},\ r\in \left\{sv,sv+1,\dots ,fv\right\}$$
(7)
$$loss\left(r\right)=C\left(s^{r},y\right)$$
(8)

Herein, \(id1\) represents the qualified indexes generated by the NCA (\(NCA(\cdot)\)) feature selector, \(y\) is the actual output, \(s\) implies the selected feature vector, \(sv\) defines the start value of the loop, and \(fv\) is the final value of the loop. \(NI\) is the number of images. We have calculated the loss values (\(loss\)) of the selected feature vectors by deploying a classifier (\(C(\cdot)\)). Using the calculated loss value array, the most suitable feature vector is selected as follows.

$$id2=\underset{r}{\mathrm{argmin}}\ loss\left(r\right)$$
(9)
$$selfeat=s^{id2}$$
(10)

where \(id2\) is the index of the minimum loss value and \(selfeat\) is the final selected feature vector.

In this work, we calculated the loss values employing an SVM classifier, and the 116 most relevant features were selected.
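A Python sketch of the INCA loop in Eqs. (6)-(10) follows. `rank_features` stands in for the NCA feature weighting (MATLAB's fscnca), and the sv/fv range is an illustrative assumption; the source does not report these loop bounds.

```python
# Iterative NCA-style selection: sweep subset sizes, keep the lowest-loss one.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def inca(X, y, rank_features, sv=100, fv=500):
    idx = rank_features(X, y)                    # Eq. (6): qualified indexes
    losses, subsets = [], []
    for r in range(sv, fv + 1):                  # Eq. (7): candidate vectors
        subset = idx[:r]
        acc = cross_val_score(SVC(kernel="poly", degree=2),
                              X[:, subset], y, cv=10).mean()
        losses.append(1.0 - acc)                 # Eq. (8): misclassification
        subsets.append(subset)
    return subsets[int(np.argmin(losses))]       # Eqs. (9)-(10)
```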

  • Step 5: Feed the 116 selected features to the SVM classifier with 10-fold cross-validation. SVMs have long been recognized as one of the preeminent shallow classifiers; to this end, we sought to combine the power of SVMs with that of INCA. We tuned the parameters of the SVM by deploying Bayesian optimization [40]. The hyperparameters utilized for the SVM are outlined below: Kernel: Polynomial; Kernel Scale: 1; Standardize: True; Polynomial Order: 2; Box Constraint: 983.4589707080628; Coding: One-vs-All; Validation: 10-Fold Cross-Validation.

We have dubbed this particular instantiation of the SVM the Quadratic SVM.

$$pr=SVM(selfeat,y)$$
(11)

Herein, \(pr\) denotes the prediction vector obtained by applying the SVM classifier.
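A scikit-learn sketch of this Quadratic SVM with the stated hyperparameters is given below; the mapping of MATLAB's kernel scale and polynomial kernel offset onto sklearn's gamma and coef0 is an approximation, not an exact equivalence.

```python
# Quadratic SVM (polynomial order 2) with one-vs-all coding and 10-fold CV.
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

quadratic_svm = make_pipeline(
    StandardScaler(),                              # Standardize: True
    OneVsRestClassifier(SVC(kernel="poly", degree=2,
                            gamma=1.0, coef0=1.0,   # Kernel Scale: 1 (approx.)
                            C=983.4589707080628)),  # Bayesian-optimized box constraint
)
# pr = cross_val_predict(quadratic_svm, selfeat, y, cv=10)  # Eq. (11)
```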

5 Experimental results

Our work introduces two novel contributions to the field: the proposed CNN and a deep feature engineering model. We partitioned the dataset into separate training and testing sets and present the results of our experiments in this section.

5.1 Setup

To implement our proposed AttentionPoolMobileNeXt model, we utilized a personal computer (PC) equipped with an NVIDIA GeForce 2070 graphics processing unit (GPU), 64 GB of memory, a 3.6 GHz processor, and the Windows 11 operating system. We employed the MATLAB programming environment, leveraging both the Deep Network Designer and Classification Learner toolboxes to create our proposed models.

In our approach, we first trained the AttentionPoolMobileNeXt model on the training dataset, thereby obtaining a pretrained AttentionPoolMobileNeXt. The training options for the model were an initial learning rate of 0.005, a maximum of 20 epochs, and a mini-batch size of 32. We split the training data into training and validation sets at an 80:20 ratio.
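For readers re-implementing the setup outside MATLAB, the stated options translate roughly as below; the optimizer family (SGD) is an assumption, since the source reports only the learning rate, epoch count, batch size, and split ratio.

```python
# Training options of Section 5.1 expressed in PyTorch (illustrative).
import torch

def training_setup(model: torch.nn.Module):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005)  # initial LR 0.005
    options = {"max_epochs": 20, "mini_batch_size": 32,
               "train_val_split": 0.8}                         # 80:20 split
    return optimizer, options
```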

5.2 Results

This section presents the classification results obtained using our proposed models. Our first step was to train the AttentionPoolMobileNeXt model; the classification curves for this model during the training phase are provided in Fig. 3.

Fig. 3 Training and validation curves obtained for the proposed AttentionPoolMobileNeXt with the damaged construction dataset

Figure 3 provides evidence of AttentionPoolMobileNeXt's remarkable training accuracy, which reached 100%. In addition, the final validation accuracy was 97.35%. To evaluate the model's performance further, we utilized the test dataset and generated a confusion matrix, as depicted in Fig. 4.

Fig. 4 Confusion matrix obtained for the proposed method. * 1: Debris, 2: Damaged building, 3: Damaged highway, 4: Non-damaged building, 5: Non-damaged highway

As depicted in Fig. 4, the AttentionPoolMobileNeXt model exhibited a test classification accuracy of 90.17%. To enhance this performance, we introduced a DFE model that improved the classification capabilities of AttentionPoolMobileNeXt. In this DFE model, we extracted features from various sources, including the dropout layer of the pretrained AttentionPoolMobileNeXt, raw image data, and the LL and LH wavelet bands of the images.

Our deep feature extractor generated 256 features from each input; with nine inputs, a total of 2304 features were extracted from each image. Employing INCA as a feature selector, we identified the 116 most valuable of the initial 2304 features. In the final phase, an SVM was employed for classification, and our developed DFE model achieved an impressive 97% classification accuracy.

The confusion matrix for our AttentionPoolMobileNeXt-based model is presented in Fig. 5.

Fig. 5 Confusion matrix obtained for the proposed DFE model based on AttentionPoolMobileNeXt

As shown in Fig. 5, our deep feature engineering model achieved a classification accuracy of 97%.

To comprehensively evaluate the classification performance of our model, we employed commonly used metrics such as accuracy, recall, precision, and F1-score. The details of these metrics are given below [41]:

  • Classification accuracy: It is the ratio of correctly predicted instances to the total instances in the dataset. A high accuracy indicates the overall correctness of the model's predictions. However, it may not be suitable for imbalanced datasets. Therefore, we need to use other metrics.

  • Recall: It measures the ability of a model to capture all the relevant instances and is termed class-wise classification accuracy. High recall implies fewer instances of the positive class being overlooked, which is crucial when false negatives are costly.

  • Precision: It assesses the accuracy of positive predictions made by the model. High precision indicates that a positive prediction by the model is likely to be accurate, minimizing false positives.

  • F1-score: It is the harmonic mean of precision and recall, providing a balanced metric. F1-score considers both false positives and false negatives, making it a suitable metric when there is an imbalance between classes.

While accuracy provides an overall view, recall, precision, and F1-score offer insights into specific aspects of a model's performance, particularly when dealing with imbalanced datasets or scenarios where certain types of errors are more critical than others.
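For completeness, the standard definitions of these metrics in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are:

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN},\quad Recall=\frac{TP}{TP+FN},\quad Precision=\frac{TP}{TP+FP},\quad F1=\frac{2\times Precision\times Recall}{Precision+Recall}$$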

The calculated test results are summarized in Table 4.

Table 4 Performance measures (%) obtained for the presented models

Table 4 reveals that the proposed models achieved test classification accuracies of 90.17% and 97%. Notably, the non-damaged buildings class emerged as the best-performing class for the DFE model, exhibiting a remarkable 99.50% recall. In addition, AttentionPoolMobileNeXt demonstrated exceptional recall of 100% for the damaged building and non-damaged building classes. However, both models exhibited poor performance on the debris class, which emerged as the worst-performing class.

Precision metrics reveal that the AttentionPoolMobileNeXt-based DFE model excels in precision across various categories, showcasing notable improvements in Damaged highway, Non-damaged building, and Non-damaged highway classifications. These enhancements emphasize the model's increased accuracy in correctly predicting positive instances, minimizing false positives.

Analyzing the F1-Score, a metric that balances precision and recall, the AttentionPoolMobileNeXt-based DFE model consistently exhibits improvements across different classes and overall performance. Particularly noteworthy are the substantial improvements in Debris, Damaged building, and Non-damaged building classifications, emphasizing the model's improved balance between precision and recall.

The class-specific performance analysis further underscores the AttentionPoolMobileNeXt-based DFE model's proficiency in classifying Debris, Damaged buildings, and Non-damaged buildings, achieving high recall, precision, and F1-Score. Notable improvements are also observed in the Damaged highway classification, showcasing the DFE model's effectiveness across diverse construction damage categories.

The overall performance metrics, including accuracy, recall, precision, and F1-Score, collectively demonstrate the superiority of the AttentionPoolMobileNeXt-based DFE model over the baseline AttentionPoolMobileNeXt model. These findings underscore the efficacy and reliability of the deep feature engineering approach in advancing construction damage classification.

5.3 Explainable results

In our study, the application of AttentionPoolMobileNeXt, coupled with the Gradient-weighted Class Activation Mapping (Grad-CAM) method [42, 43], has provided valuable insights in the domain of construction damage detection. Figure 6 visually represents our model's capability in accurately identifying damaged areas, emphasizing the role of attention blocks within AttentionPoolMobileNeXt.

Fig. 6 Heat map images obtained using the Grad-CAM technique for different images

It may be noted from Fig. 6 that the embedded attention mechanisms effectively focus on key features indicative of construction damage. Grad-CAM highlights the specific regions in the images where AttentionPoolMobileNeXt's attention is concentrated, offering a visual interpretation of the decision-making process. This transparency contributes to AttentionPoolMobileNeXt's interpretability, a crucial aspect for instilling trust in our findings.

The attention blocks play a crucial role in enabling the model to discern intricate patterns and subtleties associated with different forms of construction damages. This meticulous attention to relevant features enhances the overall classification accuracy of our model in identifying damaged areas.

Additionally, the Grad-CAM visualization acts as an interpretability tool and a window into the internal workings of AttentionPoolMobileNeXt during classification. This helps to grasp the rationale behind specific decisions, providing valuable insights for further refinement of the model and for practical application in real-world scenarios.
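A minimal PyTorch sketch of the Grad-CAM computation is shown below; the choice of `target_layer` and the single-image input shape are assumptions, and this is a generic implementation rather than the authors' exact code.

```python
# Generic Grad-CAM: weight a conv layer's activation maps by pooled gradients.
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.eval()
        score = model(x)[0, class_idx]      # logit of the class of interest
        model.zero_grad()
        score.backward()
    finally:
        h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over the gradients
    cam = F.relu((w * acts["a"]).sum(dim=1))       # weighted activation sum
    return cam / (cam.max() + 1e-12)               # normalized heat map
```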

The combined use of AttentionPoolMobileNeXt and Grad-CAM yields both high classification accuracy and interpretability. The visual representations in Fig. 6 attest to the effectiveness of the attention mechanisms in enhancing model performance and offer a transparent view into the decision-making process, promoting trust in and understanding of the model's predictions.

6 Discussions

We have introduced a novel attention-based CNN by adapting MobileNetV2, termed AttentionPoolMobileNeXt. This model represents a cutting-edge approach to deep feature engineering for image classification. The primary objective of AttentionPoolMobileNeXt is to explore the classification outcomes achieved through attention mechanisms in conjunction with a lightweight network. We deploy this network with a humanitarian focus, particularly in response to the profound impact of the 2023 seismic events in Turkiye. These earthquakes underscore the critical need for swift damage detection to assist disaster-stricken communities, considering the time-intensive nature of manual assessment. Therefore, an automated damage detection model is imperative, and leveraging the capabilities of deep learning stands out as one of the most effective approaches to address this urgent requirement.

We formulate damage detection as a computer vision problem and gather an image dataset from open-source image datasets. Our presented AttentionPoolMobileNeXt and AttentionPoolMobileNeXt-based deep feature engineering models achieve classification accuracies of 90.17% and 97%, respectively. In the presented DFE model, we generate deep features from the raw image and its low-pass filter bands. INCA, an iterative feature selector, is then used to select the most relevant features, and an SVM is employed to classify them.

In the following (Fig. 7), we provide a detailed analysis of these features.

Fig. 7 Number of features selected in various wavelet bands

Figure 7 reveals that out of the 116 selected features, 68 were generated from the raw images, while the other 48 were generated from the wavelet bands. Notably, the LL bands proved more beneficial (36 of the selected features) than the LH bands (12 of the selected features). Furthermore, Fig. 7 demonstrates that all inputs contributed to obtaining the 97% accuracy.

Fig. 8 Classification accuracy obtained for various classifiers

We utilized SVM both for selecting the most relevant feature vector in INCA and for obtaining the classification results. In selecting the appropriate classifier, we conducted tests using decision tree (DT), linear discriminant (LD), k-nearest neighbors (k-NN), artificial neural network (ANN), bagged tree (BT), and SVM classifiers. The results of these tests are illustrated in Fig. 8.

Fig. 9 The comparative results. *Mob: MobileNetV2, Effb0: EfficientNetb0, IncV3: InceptionV3, IncResNetV2: InceptionResNetV2 and AttPoolMob: AttentionPoolMobileNeXt

It can be noted from Fig. 8 that the best-performing classifier is SVM, which attained a 97% classification accuracy. Additionally, LD attained a 96.17% classification accuracy on our selected features. In contrast, the worst-performing classifier is DT, which achieved an accuracy of 91.33%.

The comparative results are presented in Table 5.

Table 5 Comparison of our work with other state-of-the-art techniques

The information from Table 5 highlights that Liu et al. [47] achieved the closest result to our method, attaining a 98% accuracy. It is crucial to note, however, that Liu et al. [47] utilized a two-class dataset, while our study employed a more diverse five-class construction dataset. This distinction underscores the complexity of our dataset, and despite this increased challenge, we achieved a commendable 97% classification accuracy.

Furthermore, we employed our dataset to benchmark our model against well-established CNNs: MobileNetV2, ResNet50, DarkNet53, Xception, EfficientNetb0, DenseNet201, InceptionV3, and InceptionResNetV2. The test classification accuracies of these CNNs were compared with that of our proposed AttentionPoolMobileNeXt, and the outcomes are presented in Fig. 9. To obtain reliable test classification accuracies and facilitate a fair comparison, we applied the same DFE approach to these CNNs as to our model. This comprehensive evaluation provides a clear perspective on the performance of our model relative to widely recognized CNN architectures.

Figure 9 indicates that our model achieved the highest test classification accuracy of 97% on our curated dataset. In comparison, MobileNetV2, the CNN that inspired ours, attained a test classification accuracy of 92%, while DenseNet201 performed best among the remaining CNNs with 95.83%. Moreover, our proposed AttentionPoolMobileNeXt is the lightest among them, with only approximately 1 million learnable parameters.

The findings, advantages, and limitations of our proposed method are given below.

  • Findings:

  • The presented AttentionPoolMobileNeXt demonstrates a proficient ability to accurately identify areas affected by construction damage (Fig. 6).

  • The attention blocks within AttentionPoolMobileNeXt focus on salient features of construction damage and help in recognizing intricate patterns associated with diverse forms of damage.

  • Grad-CAM provides transparency in the decision-making process by highlighting the specific regions where the model's attention is concentrated. This visual interpretation enhances the model's explainability, fostering confidence in its predictions.

  • The developed DFE model increased the test accuracy from 90.17% to 97%.

  • Merits:

  • Collected a diverse image dataset involving five classes for automatic construction damage detection.

  • Proposed the AttentionPoolMobileNeXt model by incorporating two pooling functions and two attention blocks into MobileNetV2 to obtain high classification performance.

  • The developed AttentionPoolMobileNeXt is a lighter model than MobileNetV2, as it uses only about 1 million learnable parameters.

  • The presented AttentionPoolMobileNeXt reached higher classification performance than other commonly known CNNs (see Fig. 9).

  • Both the developed CNN- and DFE-based models have demonstrated superior classification performance.

  • The generated models outperformed existing models, highlighting their potential usefulness in practical applications.

  • Limitations:

  • Although we tested our model on a sizeable dataset, additional evaluation on other datasets could further validate its performance. In this work, we focused on the series of earthquakes that occurred in Turkiye; the model still needs to be validated on datasets obtained from other earthquake sites.

7 Conclusions

The impact of natural disasters on human lives is significant, and detecting the damage caused by these disasters is crucial. However, this process can be time-consuming, particularly for large-scale disasters. To address this issue, we propose a novel damage detection model to assist civil and construction engineers in identifying areas of damage more efficiently.

Our proposed model is based on an attention-based CNN called AttentionPoolMobileNeXt. To evaluate the effectiveness of our approach, we acquired a new image dataset consisting of five classes. Our AttentionPoolMobileNeXt model achieved 90.17% accuracy, while the AttentionPoolMobileNeXt-based DFE model reached an even higher 97% accuracy. These results demonstrate the effectiveness of our developed models for construction damage detection.

Our future work will focus on collecting more diverse and comprehensive construction damage datasets. We also aim to develop a more efficient attention CNN to achieve higher classification performance with fewer parameters than current lightweight CNNs.