1 Introduction

Permafrost degradation caused by climate warming and thermal changes in soil often leads to a thickening of the active layer and the development of thermokarst landforms (Zhao et al. 2020), which has become a major concern on the Qinghai–Tibet Plateau (QTP) in China. With 69.2% of the global permafrost area existing in Asia, the QTP currently has the largest permafrost area at the highest elevation on this continent (Zhao et al. 2020). Thermokarst landforms occur when ground ice melts as a result of increases in air and ground temperatures, resulting in karst formations that cause irreversible subsidence and changes to the landscape. In 2002, Niu et al. reported the study on slope types and stability of typical slopes in the Fenghuo Mountain area of the QTP (Niu et al. 2002). As temperatures warm in the summer, ground ice may melt, causing the turf and soil to lose support, collapse, and slide downhill. The resulting landslide areas repeatedly melt and slide downward, causing each slump to expand (Koven et al. 2011). The degradation of permafrost leads to the decomposition of organic substances in frozen soil by microorganisms, producing greenhouse gases such as methane and carbon dioxide that affect the formation of underground ice and organic carbon and ultimately affect regional hydrology, ecology, carbon cycles, and global warming (Peng et al. 2018). Therefore, the phenomenon of thaw slumping has received extensive attention.

In recent years, researchers have expressed a growing interest in studying thaw slumping using field observations, satellite imagery, and image-based simulations (Aas et al. 2019). Understanding the distribution and evolution of thermokarst landforms is crucial for comprehending their impact on surface hydrological processes, geomorphological processes, carbon exchange, vegetation, and ecosystems at the local, regional, and global scales. Mapping the distribution of thaw slumping is essential for understanding carbon exchange and the degradation of permafrost at the local and regional levels. However, due to the diverse types and characteristics of thermokarst landforms (Huang et al. 2018), the mapping of thaw slumping from remote sensing imagery alone can be challenging, particularly in areas with high elevations, harsh environments, and complex geology that is difficult to interpret. On-site investigations of thermokarst landforms can be challenging as well, and large-scale, all-weather data collection and analysis are difficult to achieve. Field measurement and manual identification of thermokarst landscapes based on remote sensing imagery are time-consuming. Therefore, in this study, we proposed using deep learning to automatically map thaw slumping based on high-resolution remote sensing imagery.

Thermokarst landforms exhibit irregular shapes and sizes that are distinct from those of lakes or ponds; this makes it challenging to create digital maps automatically. Edge detection methods can be categorized into three types: traditional edge operator methods, learning-based methods, and deep learning. The objective of edge detection is to extract salient edges of objects from natural images while preserving the elements of the image and ignoring insignificant details. The extraction and combination of multi-scale features are essential for many vision-based tasks. Multi-scale representations can be developed by constructing an image pyramid from multiple rescaled images (Pinheiro and Collobert 2013). This can be done by computing features at each scale independently (Farabet et al. 2013), or by using the output of one scale as the input for the next scale (Eigen and Fergus 2014).

Network cascades can provide an effective scheme for various vision-based applications, such as classification (Murthy et al. 2016), detection (Li et al. 2015), and semantic segmentation (Li et al. 2017). The width of thaw slumping in the QTP ranges from a few decimeters to tens of meters, while the length ranges from a few to hundreds of meters. Therefore, data availability and the need to detect specific types of terrain limit automatic identification studies to small areas. In summary, non-lake and pond thermokarst landforms pose a challenge to automatic mapping because of their sizes and irregular shapes. Edge detection methods can extract salient edges of objects from natural images, while multi-scale features are crucial for various vision-based tasks. The use of network cascades can provide a useful scheme for many vision-based applications; deep learning has been used to detect objects, map land cover, and identify small-scale thaw slumping. However, few studies have targeted the mapping of large-scale thaw slumping.

By applying an edge detection method and using a supervised deep learning algorithm based on a bi-directional cascade network (BDCN), we were able to map thaw slumping terrain from high-resolution GaoFen-1 (GF-1) satellite remote sensing imagery with a panchromatic resolution of 1 m. There were two objectives in this study: (1) to evaluate the effectiveness of using a BDCN in identifying and mapping thaw slumping from high-resolution images using different backbone models; and (2) to analyze the mechanism behind the development of thaw slumping in the study area of the Beilu River Basin on the QTP.

This study provides insights into the use of advanced deep learning techniques for the detection of thaw slumping, particularly in areas with high elevations, harsh environments, and complex geology that is difficult to interpret. The proposed approach is expected to contribute to the development of more efficient and accurate methods for mapping thaw slumping and other thermokarst landforms, which is crucial for understanding carbon exchange and the degradation of permafrost at the local and regional scales.

2 Study Area and Data Preprocessing

This section provides the details of the study area, including aspects such as topography and climate. It also outlines the data sources and information used in this work, including remote sensing image data, DEM data, and so on.

2.1 Study Area

The study area (34.797–34.930° N, 92.796–92.998° E) is situated in the Beilu River region on the eastern part of the QTP, Qinghai Province, China with an area of 272.39 km2 and an average elevation of approximately 4500 m (Fig. 1). The majority of the area falls within the high-elevation alpine region of China, with a mean annual air temperature (MAAT) of – 4 °C. The coldest temperatures occur in January, averaging between − 2 and – 15 °C, while the warmest temperatures occur in July, with an average range of 8–18 °C. The air temperature decreases gradually with increasing elevation, displaying evident vertical zonality. The region is a representative example of a permafrost area. Geographically, the Beilu River is situated within the permafrost regions of the QTP hinterland, known for its ice-rich composition and high ground temperatures (Luo et al. 2015). With climate change leading to changes in vegetation cover and increased active layer thickness, the area has become highly sensitive to permafrost degradation, resulting in a high potential for thaw slumping.

Fig. 1
figure 1

Topographical and geological settings of the study area on the Qinghai–Tibet Plateau, China. The study area is traversed by the Qinghai–Tibet Railway from the Hoh Xil station to Fenghuoshan station in Qinghai Province. The topographical map was generated using 30-m digital elevation model (DEM) data from the Shuttle Radar Topography Mission (SRTM) v3 project and hillshade data derived from the DEM. The yellow star indicates the location of landslide (A) selected for visual interpretation in Sect. 4.2. a.s.l. = above sea level

2.2 Data Preprocessing

The present study used high-resolution GF-1 remote sensing images from 2020 as the original image data for deep learning analysis. Vegetation coverage analysis was conducted using high-resolution multispectral imaging from GF-1, while slope analysis of the target area was performed based on the Shuttle Radar Topography Mission (SRTM) V3 digital elevation model (DEM) data. The DEM data, jointly measured by the US National Aeronautics and Space Administration (NASA) and the US National Bureau of Image and Mapping of the Department of Defense, have a spatial resolution of 30 m and cover a latitude and longitude range of 33–37° N and 90–94° E, respectively.

Based on the data obtained from the Sentinel-1 satellite, this study used the Short Baseline Subset of the Interferometric Synthetic Aperture Radar (SBAS-InSAR) to detect settlements (Tizzani et al. 2007) caused by thaw slumping. The Sentinel-1 satellite, equipped with a C-band synthetic aperture radar and consisting of two satellites, has a revisit period of 12 days. The study selected 130 interferometric broadband mode and vertical-vertical polarization mode scenes from single-look complex image data spanning 5 years, from 2017 to 2021. Spatiotemporal baseline was set to improve the consistency of deformation calculations within the study area. To satisfy coherence, SAR images were divided into several subsets, and appropriate spatiotemporal baseline thresholds were calculated for each image and the main image. The mean time baseline between interference pairs was 12.46 days, while the mean spatial baseline was 85.22 m.

3 Method

This study proposed a roadmap for the delineation of thaw slump zones and uses deep learning algorithms to automatically identify areas with thaw slumping. The roadmap includes an overview of current methods used to identify and characterize thaw slumps, as well as a proposed framework for their division into zones based on their geomorphology and thermal characteristics. The potential for employing deep learning algorithms to identify thaw slumps using remote sensing data is discussed, which can greatly improve the detection and monitoring. The proposed roadmap can aid in better understanding and management of thaw slumps, and help mitigate their negative impacts. The study used multi-source data, including Sentinel-1 satellite data to obtain InSAR settlement, GF-2 satellite data to obtain normalized difference vegetation index (NDVI), and DEM elevation for slope, to identify areas susceptible to thaw slumping. The hidden danger areas were extracted using GF-1 high-resolution remote sensing imagery, while a manual interpretation of thaw slumping in the area was conducted to construct training samples for a deep learning model. The model was trained to automatically extract the boundaries of the thaw slumping. The detailed methodology is outlined in a flowchart in Fig. 2.

Fig. 2
figure 2

Flowchart of the methodology for the division of thaw slumping-prone regions and deep learning automatic identification of those regions. DEM = digital elevation model, GaoFen-1 and Gaofen-2 are Chinese satellites, InSAR = interferometric synthetic aperture radar, NDVI = Normalized Difference Vegetation Index

3.1 Multi-factor Data Partitioning of Suspected Thaw Slumping Areas

The SBAS technique was used with InSAR data to estimate surface deformation over time. This technique involves using multiple interferograms to derive phase differences, which are then processed to obtain the rate of deformation of the surface. By combining multiple interferograms with different perpendicular baselines at different spatial resolutions, SBAS-InSAR can effectively reduce atmospheric and temporal decorrelations, and allow researchers to obtain high-precision deformation measurements over large areas. The InSAR data processing software GIAnt, developed by the California Institute of Technology, was selected for data import and processing. The commonly used method for estimating the deformation time series is the singular value decomposition approach. The accuracy of deep learning identification and classification was comprehensively compared by combining spatiotemporal baseline analysis with time series surface subsidence analysis and data related to the active layer thickness and slope in the study area.

The D-InSAR time series analysis method known as SBAS-InSAR was first proposed by Berardino et al. (2002). Assuming that N + 1 SAR impact maps covering the same study area are used, they can be sorted as follows in chronological order:

$$\begin{array}{c}t ={\left[{t}_{0},{t}_{1},{t}_{2},\cdots ,{t}_{M-1}\right]}^{T}\end{array}$$
(1)

In this case, N interference pairs are generated from M pairs of multi-parallax differential interference pairs obtained through the maximum spatiotemporal baseline threshold (Wang et al. 2018), and M satisfies

$$\begin{array}{c}\frac{M}{2}\le n\le \frac{M\left(M-1\right)}{2}\end{array}$$
(2)

The generated multi-parallax differential interferogram was filtered and phase unwrapped, and the appropriate reference points were selected in the unwrapped differential interferogram by using coherence (Hu et al. 2019).

By using the singular value decomposition method, a generalized inverse matrix of the coefficient matrix B can be obtained, so that the minimum norm solution of the data vector can be obtained. Finally, the speed for each period gives the amount of deformation for the period (Zhang et al. 2019).

The PyAPS (Jolivet et al. 2012) module in InSAR was used for implementing the weather model-based interferometric phase delay corrections. For a given dataset, we selected grid points overlapping with the spatial coverage of the SAR scene. Atmospheric variables were provided at precise pressure levels. Then, the delay function was computed on each of the selected grid points as a function of height. The basic formula for SBAS-InSAR can be expressed as

$$\begin{array}{c}\Delta {\varphi }_{ij}=\frac{4\pi }{\lambda }\left[\left(\frac{\Delta {r}_{ij}}{{L}_{ij}}-{\delta }_{ij}\right)\right]\end{array}$$
(3)

where \(\Delta {\varphi }_{ij}\) is the interferometric phase difference between two SAR images i and j, λ is the radar wavelength, \(\Delta {r}_{ij}\) is the change in distance between the two images along the line-of-sight direction, \({L}_{ij}\) is the perpendicular baseline between the two images, and \({\delta }_{ij}\) is the topographic phase contribution caused by the variation of the satellite elevation angle and the surface topography. Finally, we generated a 5-year time series map of topographic deformation and uplift changes in the target area. We selected the areas with a cumulative deformation of more than 80 mm as the region prone to experiencing thaw slumping.

The study employed the fourth (red) and eighth (near-infrared) bands of GF-1 imagery to compute the NDVI, which serves as a quantifiable measure of vegetation density. Areas with dense vegetation have high NDVI values, while barren areas have low values, approaching − 1 (Anyamba and Tucker 2005). Due to the occurrence of vegetation growth in the Beilu River Basin in summer, the utilization of NDVI becomes crucial for identifying potential slump areas within the study area. This study investigated the relationship between NDVI and thaw slumping using remote sensing data that include statistical data that are sensitive to thaw slumping. There is a significant correlation between NDVI values greater than 0.15 and thaw slumping, highlighting the influential role of vegetation cover in the initiation and progression of thaw slumps. This study selected the slope range of 3–10° as the area most prone to thaw slumping. We processed all data in both radar and geocoded domains and determined the geometric intersection of three input features. We exported layers that met the following criteria: deformation greater than 80 mm, NDVI values greater than 0.15, and overlapping features with slopes between 3 and 10°. This resulted in a new layer representing the suspected areas for thaw slumping.

3.2 Identification of Thaw Slumps Using Deep Learning Algorithms

This section provides a comprehensive overview of the process employed for thaw slump identification using deep learning. It presents the selection of the deep learning neural network model, its characteristics, and the process employed for model training. Three distinct backbone structures were simultaneously trained to determine the optimal model.

3.2.1 Bi-directional Cascade Network Used for Perceptual Edge Detection

The study area has significant fluctuations in terrain and elevation. To prepare for deep learning, fine geometric correction and orthorectification were performed on the original GF-1 remote sensing images, and the geographic coordinate system was converted to a projection coordinate system. The study area samples were manually marked by visual interpretation before using the images. The metadata format was set to categorical slices, and the data rotation angle was adjusted to 30° to capture more training samples from multiple image angles for data augmentation (Bo and Lane 2015). A bi-directional cascade network structure was used, where the output of each layer was supervised by scale-specific edge labels learned by the network itself. A structural equation module with an Atrous spatial pyramid pooling (ASPP) structure was employed to generate multi-scale features used to enhance the feature input of each layer (Lian et al. 2021). Finally, the multi-scale features were fused for deep learning. The ASPP consists of a 1 × 1 convolution, a pooling pyramid, and ASPP pooling (three layers), where the rate of each layer of the collection pyramid can be customized to achieve free multi-scale feature extraction (Qayyum et al. 2020).

In this study, we used the difference between Atrous and normal convolutional layers, which represents the dilation rate, to control the padding and dilation during the convolution process. By applying different padding and dilation rates, we were able to extract multi-scale information from the sensory fields. First, the input image was simplified to a 1 × 1 convolutional layer. Then, a merged pyramid was constructed, and void convolutional layers were superimposed on the given expansion coefficient rate to extract features of different scales. Finally, the ASPP layer was superimposed and output, and a convolved operation was applied to obtain the result. Figure 3 shows that the BDCN consists of multiple incremental detection (ID) blocks, each of which is learned through different supervisions inferred by a bi-directional cascade structure. The ID block is the basic component of the BDCN. Each ID block was trained with layer-specific supervision inferred by the bi-directional cascade structure to discover edges at the appropriate scale. The predictions of the ID blocks were then integrated into the result.

Fig. 3
figure 3

The overall architecture of a bi-directional cascade network (BDCN). The incremental detection (ID) block is a fundamental component of a BDCN, which is trained using layer-specific supervisions inferred by a bi-directional cascade structure. The purpose of this structure is to train each ID block to spot edges at the appropriate scale. The final prediction is obtained by fusing the outputs of the ID blocks

3.2.2 Training Model

To ensure accurate results, our experiment involved 20 forward computations and backpropagation iterations to fit and converge the data. We limited the number of training samples per iteration to two for optimal performance and speed. Additionally, we implemented an automatic stopping mechanism to prevent unnecessary iterations and improve efficiency. We also used frozen models, whereby only the head network was trained and the weights in the backbone layers remained in their original design, resulting in significant computational and time savings.

For our model, we employed the BDCN architecture described earlier, with transfer learning using preconfigured neural networks such as VGG16, ResNet101, and ResNet152. As the number of iterations and the optimization of the number of samples per batch increased, the number of weight update iterations increased, and the loss function curve moved from an open unfit state to the optimal fit state. We used the gradient descent method to optimize the learning rate and achieve rapid loss function descent and convergence.

4 Results

This section presents the experimental findings in a systematic manner. We initiated the process by extracting vulnerability characteristics to facilitate the mapping of hazard-prone regions. Subsequently, deep learning was employed to identify thaw slumping in the designated area, followed by the verification of recognition accuracy through the use of the intersection over union method.

4.1 Susceptible Thaw Slumping Areas

In the deep learning experiment, we used three band (RGB) images for deep learning to verify the accuracy of the thaw slumping landform samples obtained. However, using a single image to draw the thaw slumping landforms resulted in poor feedback and many false positives and negatives. This could be caused by the presence of many other landforms or disturbances, as well as erosional features such as collapsed ground and exposed soil. To address this issue, we also attempted to integrate multi-source data to generate a distribution map of areas with hidden dangers that were suspected to have experienced thaw slumping.

To improve the accuracy of extraction of thaw slumping, we extracted vulnerability characteristics to allow us to map areas prone to such hazards. The factors we considered included the slope of DEM inversion, NDVI, and long time series data regarding land surface deformation. By integrating multiple hazard-prone features, areas prone to slumping were extracted, and thaw slumping identification by deep learning was performed through these areas, achieving large-scale identification with a higher level of accuracy (Fig. 4).

Fig. 4
figure 4

Spatial distribution of the four factors relevant to thaw slumping in the study area. a Areas where the normalized difference vegetation index (NDVI) value exceeds 0.15, b areas where the deformation amount exceeds 70 mm, c areas with slopes of 3–10°, and d suspected landslide areas fused with multiple factors

For potential slump area 1 (Fig. 4a), areas with NDVI values above 0.15, indicating the presence of relatively dense vegetation, were chosen. For potential slump area 2 (Fig. 4b), areas with a total settlement of more than 70 mm, as indicated by settlement data obtained by SBAS-InSAR, were selected. For potential slump area 3 (Fig. 4c), the slope of each pixel was identified using DEM data, with slopes from 3 to 10° selected. For potential slump area 4 (Fig. 4d), areas of a suspected landslide were fused with multiple factors.

4.2 Accuracy of Thaw Slumpings Identification

After analyzing the training samples and the deep learning package, and assessing the training accuracy, we optimized the loss function by adjusting the learning rate α to improve the overall accuracy. We then used the deep learning package to automatically identify areas with thaw slumping in the study area. Nevertheless, since the automatically identified areas were not appropriately labeled or designated, we manually selected a typical thaw slumping area for visual interpretation, comprehensive analysis, and evaluation. The selected area was on a shady slope of the Beilu River Basin, which encompasses a very large landslide with an area of approximately 43,000 m2 (Fig. 5g). The specific location is shown as a yellow star indicating the location of landslide (A) in Fig. 1. The loss functions of the three neural networks during training and verification are shown in Fig. 5b, d and f.

Fig. 5
figure 5

Models identifying the contour of the same landslide: a ResNet101, c VGG16, and e ResNet152; Loss function diagram of each backbone model: b ResNet101, d VGG16, and f ResNet152, g GaoFen-1 thaw slumping image; and h Field survey photograph (by Huarui Zhang). The location of the specific landslide (A) is indicated by a yellow star in Fig. 1

The area depicted in Fig. 5g includes a large-scale landslide that is easily identified. During the field survey, the authors captured a site photo (Fig. 5h) and noted that the landslide has been ongoing for over 5 years, with its slow progress attributed to gravity. The three backbone models detected landslides in this area, but with differences in their recognition ranges due to varying accuracies. As illustrated in Fig. 5, ResNet101 exhibited the highest recognition matching degree, with an area overlap rate of 84.59%. The VGG and ResNet156 models had area overlap rates of 83.81 and 80.68%, respectively. However, determining the optimal model based on individual comparisons was inconclusive, and the overall area identified by the model must be compared to the self-determined suspected landslide area.

This study explored the effectiveness of three deep learning methods—VGG16, ResNet101, and ResNet152—in detecting and identifying the boundary of thaw slumping in the study area. The experiments were conducted 3 times, one for each of the three models, and suspected thaw slump regions were used to verify the accuracy of the methods. We calculated the intersection over union (IoU) (Nowozin 2014) value of each method to compare their performances. Additionally, we calculated the area of the thaw slumping region using both visual interpretation and automatic recognition methods. The IoU value was calculated as follows:

$$IoU\left(A,B\right)=\frac{Area\left(A\cap B\right)}{Area\left(A\cup B\right)}$$
(4)

where \(A\) represents the visual interpretation areas, and \(B\) represents the model-detected thaw slumping areas. The IoU value was calculated to evaluate the agreement between the automatic recognition area and the visual interpretation area of thaw slumping. The IoU value ranged from 0 to 1, with a value of 1 indicating complete overlap between the actual and verification areas. Results with an IoU value greater than 0.5 were considered reasonable. Table 1 shows that all three models achieved an IoU value greater than 0.5, with the ResNet101 model achieving the highest value of 0.85. This indicates that the verification area of this model highly overlaps with the actual area, making it the most effective method among the three.

Table 1 Intersection over union (\(IoU\)) values using varying combinations of training models

Table 2 presents the results of thawed collapses with an area larger than 1000 m2, which can be combined with the comprehensive analysis presented in Table 1. The results indicate that there was little difference in the three backbone models in terms of sample training accuracy, and their sample training data were effective. However, the IoU values of the models indicate that ResNet101 provides better coverage in the entire research range of data. Figure 6 illustrates the recognition of thaw slumping by the ResNet101 model.

Table 2 Interpretation of remote sensing data for thaw slumping
Fig. 6
figure 6

Thaw slumping detected by the ResNet101 model

5 Discussion

Deep learning methods offer significant advantages, particularly in their ability to automatically learn features. This stands in contrast to traditional automatic mapping methods that rely on artificially designed features, which require considerable data and expertise to design and to select the best combination of methods. It is important to note that the specific mechanisms and factors contributing to thaw slumping can vary depending on local geology, geomorphology, climate conditions, and land use practices. Therefore, a detailed study of the Beilu River Basin study area would provide more site-specific information about the development of thaw slumping in that particular region.

In this study, we focused on preparing training images and the corresponding network architecture for verification in line with BDCN requirements. However, we did not optimize the accuracy of the design feature range or fusion of multi-hidden areas, which could be done in later research. This highlights the black-box nature of deep learning, where the learned features are unknown, and the output factors are uncontrollable. While our three models achieved high validation accuracy, they also generated many false positives when identifying thaw slumping. Therefore, it was necessary to screen the identification results by reviewing suspected hidden danger areas. Unfortunately, the deep learning method used in this study also did not achieve full coverage of thaw slumping.

To overcome this limitation, we used different neural network configurations with the same training data, resulting in similar recognition results. Although the overall accuracy was not significantly different, ResNet101 proved to be the best model in terms of the integrity of landslide extraction and IoU. The advantages of ResNet101 will be particularly useful in future efforts to increase the range of automatic extraction.

Improving the identification of thaw slumping hazards is crucial for preventing future disasters and protecting human lives and property. From a theoretical perspective, in the future, researchers can focus on developing more accurate and efficient methods for identifying thaw slumping incidents. For instance, they can explore the use of remote sensing techniques such as using satellite imagery, light detection and ranging, or aerial photography to detect changes in the land surface and monitor the progress of thaw slumping. Additionally, deep learning algorithms was employed by researchers to improve the accuracy of landslide detection. This approach can help automate the process of identifying potential slumping areas and provide early warnings to prevent or mitigate disasters.

6 Conclusion

This study utilized deep learning algorithms to automatically map areas susceptible to thaw slumping using high-resolution remote sensing images. To quantify deformation rates in the Beilu River Basin of the Qinghai–Tibet Plateau in China, this study introduced SBAS-InSAR technology. Integration of slope and NDVI data enabled the identification of suspected thaw slumping regions. Employing three distinct neural networks, intelligent recognition of areas with potential thaw slumping was executed. Comparative analysis pointed out the most efficient depth model for landslide identification. A comprehensive thaw slumping distribution map was generated for the study area. Despite the need for further enhancement in recognition accuracy, our approach successfully delineated thaw slumping areas in the Beilu River Basin. Among the backbone models, the ResNet101 model within the BDCN framework exhibited the highest recognition accuracy. This, coupled with pivotal factors such as slope, vegetation, and deformation, significantly contributed to the efficacy of image-based thaw slumping analysis. This work underscores its potential by demonstrating the feasibility of employing deep learning algorithms for the automatic mapping of thaw slumping landforms based on remote sensing imagery.