1 Introduction

Earthquakes rank among the most devastating natural disasters, posing severe threats to human life and property [1, 2]. In densely populated urban regions, the risk of building damage or collapse during seismic events is especially grave [3]. Following an earthquake, rapid and accurate assessment of structural damage is paramount for effective emergency response, rescue operations, and subsequent reconstruction [4]. In recent years, remote sensing technologies have become invaluable instruments for detecting and evaluating natural disasters, harnessing diverse data modalities including aerial or satellite images, Lidar, and SAR [5,6,7]. Thus, the precise classification of distinct forms of building damage from remote sensing imagery has become a pressing concern [8].

Various models for construction damage detection have been proposed in the literature; some are discussed below. Roy and Bhaduri [9] utilized DenseNet and Swin-Transformer for damage detection. Their DenseSPH-YOLOv5 model aimed to improve the accuracy and efficiency of damage detection in engineering informatics, using a large-scale road damage dataset (9053 road damage images) and achieving a precision of 89.51%. Seemab et al. [10] presented a method for detecting propagating cracks in reinforced concrete beams using digital image correlation measurements, providing effective crack detection. Zhu and Tang [11] proposed an automated drone-based damage detection approach for hydraulic structures, offering insights into maintenance and repairs by leveraging drone imagery and artificial intelligence techniques. A literature review regarding damage detection is given in Table 1.

Table 1 Literature review based on damage detection

It may be noted from Table 1 that the research gaps are as follows:

  • Explainable models are not much explored.

  • The CNNs employed in such works are either customized versions of VGG or 1D-CNNs. Additionally, while YOLO is a prevalent tool for damage detection, there is a shortage of original CNN models with attention mechanisms.

  • Only two-class datasets, comprising (i) damaged and (ii) non-damaged samples, have been used.

1.1 Motivation and our model

The year 2023 witnessed a series of catastrophic earthquakes in Turkiye, killing over 50,000 people and rendering millions homeless. To alleviate the suffering caused by such tragedies, the foremost priority is to determine the extent of the damage. However, this task can be tedious and time-consuming. To address this issue, we propose an automated damage detection model and have curated a novel dataset comprising five classes, namely (i) Debris, (ii) Damaged Buildings, (iii) Non-Damaged Buildings, (iv) Damaged Highways, and (v) Non-Damaged Highways.

To achieve high classification performance while retaining a lightweight deep learning model, we have modified the popular MobileNetV2 [26]. Since the introduction of vision transformers (ViT) [27], attention blocks have proven effective in achieving high classification performance. Therefore, we have added pooling-based attention blocks to MobileNetV2 to enhance its classification performance. Moreover, we have modified the MobileNetV2 blocks using a ConvNeXt-style strategy, obtaining a more lightweight model. These modifications give rise to AttentionPoolMobileNeXt.

We have proposed a pyramidal deep feature engineering (DFE) model built on the presented AttentionPoolMobileNeXt CNN. Firstly, we trained the proposed CNN on the training dataset. We then used the dropout layer of the pretrained AttentionPoolMobileNeXt as a deep feature extractor; this layer generates 256 features. We applied a four-level multilevel discrete wavelet transform (MDWT) [28] approximation and extracted features from the raw image, the low-low filter band (LL band), and the low-high filter band (LH band). Subsequently, we employed iterative neighborhood component analysis (INCA) [29] as a feature selector to identify the most relevant features, which were classified using a support vector machine (SVM) [30, 31] classifier. The proposed AttentionPoolMobileNeXt alone achieved a testing accuracy of 90.17%, while the DFE model combining MDWT and AttentionPoolMobileNeXt attained a classification accuracy of 97%, demonstrating the effectiveness of coupling deep learning with feature engineering.

We have tried to address the literature gaps by introducing the following:

  • Introduced a novel deep CNN model.

  • Attained interpretable results from the proposed attention model, which concentrates on regions of interest.

  • Compiled a new image dataset comprising five distinct classes.

1.2 Innovations and contributions

The innovations and contributions of our work are given below:

  • Novelties:

  • Collected a novel image dataset dedicated to construction damage detection, addressing the need for specialized data in automated damage detection.

  • Introduced an attention-based CNN architecture named AttentionPoolMobileNeXt, emphasizing its unique design for improved performance.

  • Presented a comprehensive deep feature engineering model by synergizing AttentionPoolMobileNeXt, the MDWT-based approximation inspired by watermarking methods, and machine learning algorithms such as INCA and SVM.

  • Contributions:

  • Proposed a novel automated damage detection model for accurate classification of five classes.

  • Developed a lightweight deep learning model designed for rapid and accurate classification of damages, aiming to address the need for efficiency in real-world scenarios.

  • Achieved an exceptional test classification accuracy of over 90% on the curated dataset, demonstrating the efficacy and reliability of the proposed approaches.

Our work contributes to the field of construction damage detection and classification. By introducing an innovative combination of an attention-based CNN architecture, MDWT-based feature extraction, and machine learning algorithms, it not only addresses the challenges associated with existing methods but also provides comprehensive technical details, emphasizing both methodological and empirical advancements.

2 Dataset

To evaluate our model's performance, we collected a novel image dataset consisting of five distinct classes [32,33,34,35,36]. The dataset's classes are as follows: (1) Debris, (2) Damaged building, (3) Damaged highway, (4) Non-damaged building, and (5) Non-damaged highway. Furthermore, we partitioned the dataset into training and testing subsets. The images are stored in JPEG format (with .jpeg or .jpg extensions) and have varying dimensions. The dataset's attributes are presented in Table 2.

Table 2 Characteristics of the collected image dataset for construction damage classification

As can be observed from Table 2, the construction damage image dataset is inherently imbalanced.

The collected dataset is publicly available on Kaggle, and researchers can download it using the following URL: https://www.kaggle.com/datasets/turkertuncer/damaged-constructions-image-dataset.
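For reproducibility, the sketch below shows one way to load the dataset in Python with torchvision; the folder layout of the Kaggle download and the 224 × 224 resize target are illustrative assumptions rather than documented properties of the dataset.

```python
# A minimal loading sketch; directory names and resize target are assumptions.
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),  # the source images have varying dimensions
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("damaged-constructions/train", transform=tfm)
test_set = datasets.ImageFolder("damaged-constructions/test", transform=tfm)
print(train_set.classes)  # expected: the five damage classes
```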

3 The proposed AttentionPoolMobileNeXt

Our essential objective is to present a novel lightweight deep learning model, termed AttentionPoolMobileNeXt. To achieve this, we have expanded the MobileNetV2 architecture by incorporating two attention blocks, drawing inspiration from PoolFormer [37]. Additionally, we have enhanced the MobileNetV2 blocks in the manner of ConvNeXt, utilizing combinations of convolution + normalization and convolution + activation. To provide a comprehensive understanding of our proposed model, we first give a succinct overview of the MobileNetV2 architecture. We use the attention layers to focus on regions of interest, and we use the ConvNeXt-based strategy to extract meaningful feature maps with fewer learnable parameters than MobileNetV2.

MobileNetV2 leverages convolutions, linear bottlenecks, and inverted residuals [26], employing 1 × 1 and 3 × 3 convolutions for extracting image features. Feature transformation involves bottleneck inputs, 1 × 1 convolution, and depth-wise convolution with 3 × 3 filter sizes. While the architecture integrates addition-based shortcuts reminiscent of Residual Networks to address the vanishing gradient problem, it lacks attention blocks. To address this limitation and enhance classification performance, we have introduced two attention blocks to the MobileNetV2 architecture. We selected the MobileNetV2 architecture for its efficiency; research in this family is ongoing, as evidenced by the development of MobileNetV3 [38].
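To make this block structure concrete, the following PyTorch sketch shows a generic MobileNetV2-style inverted residual block. It illustrates the 1 × 1 expansion, 3 × 3 depthwise convolution, linear bottleneck, and addition-based shortcut described above; it is not the authors' exact ResMoB/MoB implementation, and the expansion factor is an assumption.

```python
# A generic inverted residual block in the MobileNetV2 style (illustrative).
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, expansion=6, stride=1):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),         # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),             # 3x3 depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # linear bottleneck
            nn.BatchNorm2d(out_ch),                           # no activation here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out          # shortcut if shapes match
```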

In this study, we extended the MobileNetV2 architecture by incorporating attention blocks, resulting in the development of AttentionPoolMobileNeXt. This lightweight deep-learning model is designed to enhance classification performance. Our proposed architecture encompasses convolution, residual mobile ConvNeXt blocks (ResMoB), mobile ConvNeXt blocks (MoB), pooling-based attention blocks, and output blocks. To facilitate a clearer understanding of AttentionPoolMobileNeXt, we have provided a schematic overview of the model.

Figure 1 illustrates our utilization of both average pooling and maximum pooling to construct an attention mechanism resembling that of PoolFormer. This strategic implementation has allowed us to introduce a model more lightweight than MobileNetV2. Furthermore, the layer transitions of AttentionPoolMobileNeXt are presented in Table 3.

Fig. 1 Schematic representation of the presented AttentionPoolMobileNeXt. Herein, Conv.: convolution, BN: batch normalization, Avg. Pool: average pooling, Max Pool: maximum pooling, GAP: global average pooling

Table 3 Transition list of the presented AttentionPoolMobileNeXt

Table 3 outlines the architecture and operations performed at various stages of the proposed AttentionPoolMobileNeXt.
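As a concrete illustration of the pooling-based attention of Fig. 1, the sketch below implements a PoolFormer-style token mixer that combines average and maximum pooling. The exact fusion and placement used in AttentionPoolMobileNeXt may differ; the pooling window and the averaging of the two pooled maps are assumptions.

```python
# A PoolFormer-inspired pooling attention block (illustrative sketch).
import torch.nn as nn

class PoolAttention(nn.Module):
    def __init__(self, channels, pool_size=3):
        super().__init__()
        pad = pool_size // 2                       # keep spatial size unchanged
        self.avg = nn.AvgPool2d(pool_size, stride=1, padding=pad)
        self.max = nn.MaxPool2d(pool_size, stride=1, padding=pad)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Average the avg- and max-pooled maps, subtract the identity as in
        # PoolFormer, and add the result back as a residual refinement.
        mixed = 0.5 * (self.avg(x) + self.max(x))
        return x + self.norm(mixed - x)
```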

4 The proposed deep feature engineering model

This research introduces a novel DFE model based on the presented AttentionPoolMobileNeXt, aimed at enhancing test classification performance. This pyramidal DFE model comprises three pivotal phases: (i) deep feature extraction utilizing MDWT and the pretrained AttentionPoolMobileNeXt, (ii) INCA-based feature selection, and (iii) classification using an SVM. To assess the effectiveness of our model, we apply it to the test images and present the results. Figure 2 illustrates a block diagram of the developed deep feature engineering model based on AttentionPoolMobileNeXt, offering clearer insight into its architectural design.

Fig. 2 Block diagram of the developed deep feature engineering model based on the recommended AttentionPoolMobileNeXt. Herein, MDWT: multilevel discrete wavelet transform, B: wavelet band, f: individual features, INCA: iterative neighborhood component analysis

The steps of the developed deep feature engineering model are presented as follows.

  • Step 1: Read each image from the collected test image dataset.

  • Step 2: Apply a four-level MDWT to each image. In this step, we have used the LL and LH filter bands to extract features. Moreover, the ‘haar’ mother wavelet function has been used to obtain the band images.

    $$\left[LL_{1},LH_{1},HL_{1},HH_{1}\right]=\delta (Im)$$
    (1)
    $$\left[LL_{k},LH_{k},HL_{k},HH_{k}\right]=\delta \left(LL_{k-1}\right),\quad k\in \{2,3,4\}$$
    (2)

where \(LL, LH, HL\), and \(HH\) are the low-low, low-high, high-low, and high-high filter bands, \(\delta(\cdot)\) denotes the DWT function, and \(Im\) is the raw image. We have used the four-level MDWT to obtain the \(LL\) and \(LH\) bands. Since MDWT generates floating-point bands, we normalized them to obtain valid images.
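A minimal Python sketch of this step with PyWavelets is given below; the mapping of pywt's detail tuple onto the paper's LH band naming and the min-max normalization are assumptions.

```python
# Four-level 'haar' DWT keeping the LL and LH bands as 8-bit images (Eqs. 1-2).
import numpy as np
import pywt

def mdwt_bands(img, levels=4):
    bands = []
    ll = img.astype(np.float64)
    for _ in range(levels):
        ll, (lh, hl, hh) = pywt.dwt2(ll, "haar")    # one decomposition level
        for band in (ll, lh):                       # keep LL and LH only
            lo, hi = band.min(), band.max()
            norm = (band - lo) / (hi - lo + 1e-12)  # floating point -> [0, 1]
            bands.append((norm * 255).astype(np.uint8))
    return bands  # [LL1, LH1, LL2, LH2, LL3, LH3, LL4, LH4]
```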

  • Step 3: Create features from the raw image and the wavelet filter bands. We have used the pretrained AttentionPoolMobileNeXt CNN, taking its dropout layer as the feature extractor, and generated 256 features from each generated image.

$$f\left(j\right)=\alpha \left(Im\right),\quad j\in \{1,2,\dots ,256\}$$
(3)
$$f\left(j+h\times 256\right)=\alpha \left(LL_{t}\right),\quad t\in \left\{1,2,3,4\right\},\ h\in \left\{2,4,6,8\right\}$$
(4)
$$f\left(j+256\times \left(2h-1\right)\right)=\alpha \left(LH_{t}\right)$$
(5)

Herein, \(f\) defines the feature vector and \(\alpha(\cdot)\) is the pretrained AttentionPoolMobileNeXt. In this phase, we generated nine feature vectors and merged them; in total, 2304 (= 256 × 9) features have been extracted per image.
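The sketch below mirrors Eqs. (3)-(5): 256 dropout-layer activations per input and nine inputs per image yield one 2304-dimensional vector. Here `features_256` stands in for the pretrained network truncated at its dropout layer and `preprocess` for the input pipeline; both are assumptions, not the authors' API.

```python
# Concatenating deep features from the raw image and its 8 wavelet bands.
import torch

@torch.no_grad()
def image_feature_vector(features_256, raw, wavelet_bands, preprocess):
    feats = []
    for im in [raw] + list(wavelet_bands):        # 1 raw + 8 band images
        x = preprocess(im).unsqueeze(0)           # image -> 1xCxHxW tensor
        feats.append(features_256(x).squeeze(0))  # 256 activations per input
    return torch.cat(feats)                       # shape: (2304,)
```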

  • Step 4: Apply the iterative feature selector to the generated 2304 features. INCA was proposed by Tuncer et al. [29] in 2020. It is an improved neighborhood component analysis (NCA) [39] that selects the most relevant feature vector from the created set of features and automatically determines the optimal number of features. Specifically, the qualified feature indexes are first generated using the NCA feature selector. We then loop over candidate subset sizes, iteratively selecting features and using a classifier to compute a loss value for each selected feature vector. The feature vector with the minimum misclassification rate is chosen as the final feature vector. The mathematical formulation of this feature selector is provided below.

$$id1=NCA(f,y)$$
(6)
$$s^{r-sv+1}\left(dim,i\right)=f\left(dim,id1\left(i\right)\right),\quad i\in \left\{1,2,\dots ,r\right\},\ dim\in \left\{1,2,\dots ,NI\right\},\ r\in \left\{sv,sv+1,\dots ,fv\right\}$$
(7)
$$loss\left(r\right)=C\left(s^{r},y\right)$$
(8)

Herein, \(id1\) represents the qualified indexes generated by the NCA (\(NCA(\cdot)\)) feature selector, \(y\) is the actual output, \(s\) implies the selected feature vector, \(sv\) defines the start value of the loop, and \(fv\) is the final value of the loop. \(NI\) is the number of images. We have calculated the loss values (\(loss\)) of the selected feature vectors by deploying a classifier (\(C(\cdot)\)). Using the calculated loss value array, the most suitable feature vector is selected as follows.

$$id2=\underset{r}{\mathrm{argmin}}\ loss\left(r\right)$$
(9)
$$selfeat=s^{id2}$$
(10)

where \(id2\) is the index of the minimum loss value and \(selfeat\) is the final selected feature vector.

In this work, we calculated the loss values employing an SVM classifier, and the 116 most relevant features were selected.
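A Python sketch of the INCA loop in Eqs. (6)-(10) follows. `rank_features` stands in for the NCA feature weighting (MATLAB's fscnca), and the sv/fv range is an illustrative assumption; the source does not report these loop bounds.

```python
# Iterative NCA-style selection: sweep subset sizes, keep the lowest-loss one.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def inca(X, y, rank_features, sv=100, fv=500):
    idx = rank_features(X, y)                    # Eq. (6): qualified indexes
    losses, subsets = [], []
    for r in range(sv, fv + 1):                  # Eq. (7): candidate vectors
        subset = idx[:r]
        acc = cross_val_score(SVC(kernel="poly", degree=2),
                              X[:, subset], y, cv=10).mean()
        losses.append(1.0 - acc)                 # Eq. (8): misclassification
        subsets.append(subset)
    return subsets[int(np.argmin(losses))]       # Eqs. (9)-(10)
```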

  • Step 5: Feed the 116 selected features to the SVM classifier with 10-fold cross-validation. SVMs have long been recognized as one of the preeminent shallow classifiers; to this end, we sought to combine the power of SVMs with that of INCA. We tuned the parameters of the SVM by deploying Bayesian optimization [40]. The hyperparameters utilized for the SVM are outlined below: Kernel: Polynomial; Kernel Scale: 1; Standardize: True; Polynomial Order: 2; Box Constraint: 983.4589707080628; Coding: One-vs-All; Validation: 10-Fold Cross-Validation.

We have dubbed this particular instantiation of the SVM the Quadratic SVM.

$$pr=SVM(selfeat,y)$$
(11)

Herein, \(pr\) denotes the prediction vector obtained by applying the SVM classifier.
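A scikit-learn sketch of this Quadratic SVM with the stated hyperparameters is given below; the mapping of MATLAB's kernel scale and polynomial kernel offset onto sklearn's gamma and coef0 is an approximation, not an exact equivalence.

```python
# Quadratic SVM (polynomial order 2) with one-vs-all coding and 10-fold CV.
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

quadratic_svm = make_pipeline(
    StandardScaler(),                              # Standardize: True
    OneVsRestClassifier(SVC(kernel="poly", degree=2,
                            gamma=1.0, coef0=1.0,   # Kernel Scale: 1 (approx.)
                            C=983.4589707080628)),  # Bayesian-optimized box constraint
)
# pr = cross_val_predict(quadratic_svm, selfeat, y, cv=10)  # Eq. (11)
```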

5 Experimental results

Our work introduces two novel contributions to the field: the proposed CNN and a deep feature engineering model. We partitioned the dataset into separate training and testing sets and present the results of our experiments in this section.

5.1 Setup

To implement our proposed AttentionPoolMobileNeXt model, we utilized a personal computer (PC) equipped with an NVIDIA GeForce 2070 graphics processing unit (GPU), 64 GB of memory, a 3.6 GHz processor, and the Windows 11 operating system. We employed the MATLAB programming environment, leveraging both the Deep Network Designer and Classification Learner toolboxes to create our proposed models.

In our approach, we first trained the AttentionPoolMobileNeXt model on the training dataset, thereby obtaining a pretrained AttentionPoolMobileNeXt. The training options for the model were an initial learning rate of 0.005, a maximum of 20 epochs, and a mini-batch size of 32. We split the training data into training and validation sets at an 80:20 ratio.
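For readers re-implementing the setup outside MATLAB, the stated options translate roughly as below; the optimizer family (SGD) is an assumption, since the source reports only the learning rate, epoch count, batch size, and split ratio.

```python
# Training options of Section 5.1 expressed in PyTorch (illustrative).
import torch

def training_setup(model: torch.nn.Module):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005)  # initial LR 0.005
    options = {"max_epochs": 20, "mini_batch_size": 32,
               "train_val_split": 0.8}                         # 80:20 split
    return optimizer, options
```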

5.2 Results

This section presents the classification results obtained using our proposed models. Our first step was to train the AttentionPoolMobileNeXt model; the classification curves for this model during the training phase are provided in Fig. 3.

Fig. 3 Training and validation curves obtained for the proposed AttentionPoolMobileNeXt with the damaged construction dataset

Figure 3 provides evidence of AttentionPoolMobileNeXt's remarkable training accuracy, which reached 100%. In addition, the final validation accuracy was 97.35%. To evaluate the model's performance further, we utilized the test dataset and generated a confusion matrix, as depicted in Fig. 4.

Fig. 4 Confusion matrix obtained for the proposed method. * 1: Debris, 2: Damaged building, 3: Damaged highway, 4: Non-damaged building, 5: Non-damaged highway

As depicted in Fig. 4, the AttentionPoolMobileNeXt model exhibited a test classification accuracy of 90.17%. To enhance this performance, we introduced a DFE model that improved the classification capabilities of AttentionPoolMobileNeXt. In this DFE model, we extracted features from various sources, including the dropout layer of the pretrained AttentionPoolMobileNeXt, raw image data, and the LL and LH wavelet bands of the images.

Our deep feature extractor generated 256 features from each input; with nine inputs, a total of 2304 features were extracted from each image. Employing INCA as a feature selector, we identified the 116 most valuable of the initial 2304 features. In the final phase, an SVM was employed for classification, and our developed DFE model achieved an impressive 97% classification accuracy.

The confusion matrix for our AttentionPoolMobileNeXt-based model is presented in Fig. 5.

Fig. 5 Confusion matrix obtained for the proposed DFE model based on AttentionPoolMobileNeXt

As shown in Fig. 5, our deep feature engineering model achieved a classification accuracy of 97%.

To comprehensively evaluate the classification performance of our model, we employed commonly used metrics such as accuracy, recall, precision, and F1-score. The details of these metrics are given below [41]:

  • Classification accuracy: It is the ratio of correctly predicted instances to the total instances in the dataset. A high accuracy indicates the overall correctness of the model's predictions. However, it may not be suitable for imbalanced datasets. Therefore, we need to use other metrics.

  • Recall: It measures the ability of a model to capture all the relevant instances and is termed class-wise classification accuracy. High recall implies fewer instances of the positive class being overlooked, which is crucial when false negatives are costly.

  • Precision: It assesses the accuracy of positive predictions made by the model. High precision indicates that a positive prediction by the model is likely to be accurate, minimizing false positives.

  • F1-score: It is the harmonic mean of precision and recall, providing a balanced metric. F1-score considers both false positives and false negatives, making it a suitable metric when there is an imbalance between classes.

While accuracy provides an overall view, recall, precision, and F1-score offer insights into specific aspects of a model's performance, particularly when dealing with imbalanced datasets or scenarios where certain types of errors are more critical than others.
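For completeness, the standard definitions of these metrics in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are:

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN},\quad Recall=\frac{TP}{TP+FN},\quad Precision=\frac{TP}{TP+FP},\quad F1=\frac{2\times Precision\times Recall}{Precision+Recall}$$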

The calculated test results are summarized in Table 4.

Table 4 Performance measures (%) obtained for the presented models

Table 4 reveals that the proposed models achieved test classification accuracies of 90.17% and 97%. Notably, the non-damaged buildings class emerged as the best-performing class for the DFE model, exhibiting a remarkable 99.50% recall. In addition, AttentionPoolMobileNeXt demonstrated exceptional recall of 100% for the damaged building and non-damaged building classes. However, both models exhibited poor performance on the debris class, which emerged as the worst-performing class.

Precision metrics reveal that the AttentionPoolMobileNeXt-based DFE model excels in precision across various categories, showcasing notable improvements in Damaged highway, Non-damaged building, and Non-damaged highway classifications. These enhancements emphasize the model's increased accuracy in correctly predicting positive instances, minimizing false positives.

Analyzing the F1-Score, a metric that balances precision and recall, the AttentionPoolMobileNeXt-based DFE model consistently exhibits improvements across different classes and overall performance. Particularly noteworthy are the substantial improvements in Debris, Damaged building, and Non-damaged building classifications, emphasizing the model's improved balance between precision and recall.

The class-specific performance analysis further underscores the AttentionPoolMobileNeXt-based DFE model's proficiency in classifying Debris, Damaged buildings, and Non-damaged buildings, achieving high recall, precision, and F1-Score. Notable improvements are also observed in the Damaged highway classification, showcasing the DFE model's effectiveness across diverse construction damage categories.

The overall performance metrics, including accuracy, recall, precision, and F1-Score, collectively demonstrate the superiority of the AttentionPoolMobileNeXt-based DFE model over the baseline AttentionPoolMobileNeXt model. These findings underscore the efficacy and reliability of the deep feature engineering approach in advancing construction damage classification.

5.3 Explainable results

In our study, the application of AttentionPoolMobileNeXt, coupled with the Gradient-weighted Class Activation Mapping (Grad-CAM) method [42, 43], has provided valuable insights in the domain of construction damage detection. Figure 6 visually represents our model's capability in accurately identifying damaged areas, emphasizing the role of attention blocks within AttentionPoolMobileNeXt.

Fig. 6 Heat map images obtained using the Grad-CAM technique for different images

It may be noted from Fig. 6 that the embedded attention mechanisms effectively focus on key features indicative of construction damage. Grad-CAM highlights the specific regions in the images where AttentionPoolMobileNeXt's attention is concentrated, offering a visual interpretation of the decision-making process. This transparency contributes to AttentionPoolMobileNeXt's interpretability, a crucial aspect for instilling trust in our findings.

The attention blocks play a crucial role in enabling the model to discern intricate patterns and subtleties associated with different forms of construction damages. This meticulous attention to relevant features enhances the overall classification accuracy of our model in identifying damaged areas.

Additionally, the Grad-CAM visualization acts as an interpretability tool and a window into the internal workings of AttentionPoolMobileNeXt during classification. This helps to grasp the rationale behind specific decisions, providing valuable insights for further refinement of the model and for practical application in real-world scenarios.
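A minimal PyTorch sketch of the Grad-CAM computation is shown below; the choice of `target_layer` and the single-image input shape are assumptions, and this is a generic implementation rather than the authors' exact code.

```python
# Generic Grad-CAM: weight a conv layer's activation maps by pooled gradients.
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.eval()
        score = model(x)[0, class_idx]      # logit of the class of interest
        model.zero_grad()
        score.backward()
    finally:
        h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over the gradients
    cam = F.relu((w * acts["a"]).sum(dim=1))       # weighted activation sum
    return cam / (cam.max() + 1e-12)               # normalized heat map
```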

The combined use of AttentionPoolMobileNeXt and Grad-CAM yields both high classification accuracy and interpretability. The visual representations in Fig. 6 attest to the effectiveness of the attention mechanisms in enhancing model performance and offer a transparent view into the decision-making process, promoting trust in and understanding of the model's predictions.

6 Discussions

We have introduced a novel attention-based CNN by adapting MobileNetV2, termed AttentionPoolMobileNeXt. This model represents a cutting-edge approach to deep feature engineering for image classification. The primary objective of AttentionPoolMobileNeXt is to explore the classification outcomes achieved through attention mechanisms in conjunction with a lightweight network. We deploy this network with a humanitarian focus, particularly in response to the profound impact of the 2023 seismic events in Turkiye. These earthquakes underscore the critical need for swift damage detection to assist disaster-stricken communities, considering the time-intensive nature of manual assessment. Therefore, an automated damage detection model is imperative, and leveraging the capabilities of deep learning stands out as one of the most effective approaches to address this urgent requirement.

We formulate damage detection as a computer vision problem and gather an image dataset from open-source image datasets. Our presented AttentionPoolMobileNeXt and AttentionPoolMobileNeXt-based deep feature engineering models achieve classification accuracies of 90.17% and 97%, respectively. In the presented DFE model, we generate deep features from the raw image and its low-pass filter bands. INCA, an iterative feature selector, is then used to select the most relevant features, and an SVM is employed to classify them.

In the following (Fig. 7), we provide a detailed analysis of these features.

Fig. 7 Number of features selected in various wavelet bands

Figure 7 reveals that out of the 116 selected features, 68 were generated from the raw images, while the other 48 were generated from the wavelet bands. Notably, the LL bands proved more beneficial (36 of the selected features) than the LH bands (12 of the selected features). Furthermore, Fig. 7 demonstrates that all inputs contributed to obtaining the 97% accuracy.

Fig. 8 Classification accuracy obtained for various classifiers

We utilized SVM both for selecting the most relevant feature vector in INCA and for obtaining the classification results. In selecting the appropriate classifier, we conducted tests using decision tree (DT), linear discriminant (LD), k-nearest neighbors (k-NN), artificial neural network (ANN), bagged tree (BT), and SVM classifiers. The results of these tests are illustrated in Fig. 8.

Fig. 9 The comparative results. *Mob: MobileNetV2, Effb0: EfficientNetb0, IncV3: InceptionV3, IncResNetV2: InceptionResNetV2 and AttPoolMob: AttentionPoolMobileNeXt

It can be noted from Fig. 8 that the best-performing classifier is SVM, which attained a 97% classification accuracy. Additionally, LD attained a 96.17% classification accuracy on our selected features. In contrast, the worst-performing classifier is DT, which achieved an accuracy of 91.33%.

The comparative results are presented in Table 5.

Table 5 Comparison of our work with other state-of-the-art techniques

The information from Table 5 highlights that Liu et al. [47] achieved the closest result to our method, attaining a 98% accuracy. It is crucial to note, however, that Liu et al. [47] utilized a two-class dataset, while our study employed a more diverse five-class construction dataset. This distinction underscores the complexity of our dataset, and despite this increased challenge, we achieved a commendable 97% classification accuracy.

Furthermore, we employed our dataset to benchmark our model against well-established CNNs: MobileNetV2, ResNet50, DarkNet53, Xception, EfficientNetb0, DenseNet201, InceptionV3, and InceptionResNetV2. The test classification accuracies of these CNNs were compared with that of our proposed AttentionPoolMobileNeXt, and the outcomes are presented in Fig. 9. To obtain reliable test classification accuracies and facilitate a fair comparison, we applied the same DFE approach to these CNNs as to our model. This comprehensive evaluation provides a clear perspective on the performance of our model relative to widely recognized CNN architectures.

Figure 9 indicates that our model achieved the highest test classification accuracy of 97% on our curated dataset. In comparison, MobileNetV2, the CNN that inspired ours, attained a test classification accuracy of 92%, while DenseNet201 performed best among the remaining CNNs with 95.83%. Moreover, our proposed AttentionPoolMobileNeXt is the lightest among them, with only approximately 1 million learnable parameters.

The findings, advantages, and limitations of our proposed method are given below.

  • Findings:

  • The presented AttentionPoolMobileNeXt demonstrates a proficient ability to accurately identify areas affected by construction damage (Fig. 6).

  • The attention blocks within AttentionPoolMobileNeXt focus on salient features of construction damage and help in recognizing intricate patterns associated with diverse forms of damage.

  • Grad-CAM provides transparency in the decision-making process by highlighting the specific regions where the model's attention is concentrated. This visual interpretation enhances the model's explainability, fostering confidence in its predictions.

  • The developed DFE model increased the test accuracy from 90.17% to 97%.

  • Merits:

  • Collected a diverse image dataset involving five classes for automatic construction damage detection.

  • Proposed the AttentionPoolMobileNeXt model by incorporating two pooling functions and two attention blocks into MobileNetV2 to obtain high classification performance.

  • The developed AttentionPoolMobileNeXt is a lighter model than MobileNetV2, as it uses only about 1 million learnable parameters.

  • The presented AttentionPoolMobileNeXt reached higher classification performance than other commonly known CNNs (see Fig. 9).

  • Both the developed CNN- and DFE-based models have demonstrated superior classification performance.

  • The generated models outperformed existing models, highlighting their potential usefulness in practical applications.

  • Limitations:

  • Although we tested our model on a sizeable dataset, additional evaluation on other datasets could further validate its performance. In this work, we focused on the series of earthquakes that occurred in Turkiye; the model still needs to be validated on datasets obtained from other earthquake sites.

7 Conclusions

The impact of natural disasters on human lives is significant, and detecting the damage caused by these disasters is crucial. However, this process can be time-consuming, particularly for large-scale disasters. To address this issue, we propose a novel damage detection model to assist civil and construction engineers in identifying areas of damage more efficiently.

Our proposed model is based on an attention-based CNN called AttentionPoolMobileNeXt. To evaluate the effectiveness of our approach, we acquired a new image dataset consisting of five classes. Our AttentionPoolMobileNeXt model achieved 90.17% accuracy, while the AttentionPoolMobileNeXt-based DFE model reached an even higher 97% accuracy. These results demonstrate the effectiveness of our developed models for construction damage detection.

Our future work will focus on collecting more diverse and comprehensive construction damage datasets. We also aim to develop a more efficient attention CNN to achieve higher classification performance with fewer parameters than current lightweight CNNs.