Abstract

A synthetic aperture radar (SAR) automatic target recognition (ATR) method is developed based on two-dimensional variational mode decomposition (2D-VMD). 2D-VMD decomposes the original SAR image into multiscale components that depict the time-frequency properties of the target. Because the original image and its 2D-VMD components are highly correlated, multitask sparse representation is chosen to represent them jointly. The target label of the test sample is then determined from the resulting reconstruction errors of the different classes. The moving and stationary target acquisition and recognition (MSTAR) dataset is used to set up the standard operating condition (SOC) and several extended operating conditions (EOCs), including configuration variants, depression angle variances, noise corruption, and partial occlusion, to test and validate the proposed method. The results confirm the effectiveness and robustness of the proposed method compared with several state-of-the-art SAR ATR references.

1. Introduction

With the publication of the famous moving and stationary target acquisition and recognition (MSTAR) dataset, automatic target recognition (ATR) of synthetic aperture radar (SAR) images has drawn intensive attention from researchers all over the world [1]. As a supervised classification problem, the way the references are built plays an important role in the recognition methods. Accordingly, the existing SAR ATR methods can be divided into template-based and model-based ones. The former stores SAR images of the targets of interest under different conditions, e.g., view angles, backgrounds, and resolutions, to establish the template set. The test sample is then compared with the templates from different classes to determine the target label. The semi-automated image intelligence processing (SAIP) program [2–4] provided an embryo for template-based SAR ATR methods, which used correlation filters to evaluate the similarities between the test and template samples. The model-based way describes the targets of interest using CAD models, global scattering center models, etc. The MSTAR program [5–7] started the research on model-based SAR ATR, in which the CAD models were processed by high-frequency electromagnetic calculation tools to predict target signatures at different views, backgrounds, etc. Later, parametric global scattering center models were built in inverse and forward ways to potentially replace the complex CAD models [8–12], and some recent works based on the global scattering center model confirmed their validity [13–15].

Similar to traditional pattern recognition problems, e.g., face recognition and fingerprint recognition, a concrete SAR ATR algorithm (whether template-based or model-based) can generally be described as a two-phase procedure that performs feature extraction and classification sequentially. In the feature extraction phase, various kinds of features are used to describe the target characteristics, including the geometrical shape, intensity distribution, and electromagnetic scattering. Target region, contour, and shadow are typical geometrical features. In [16–20], the Zernike (including modified ones), Chebyshev, and Krawtchouk moments were used to depict the target regions. Park et al. designed several discrimination features based on the binary regions [21]. Ding et al. proposed a region matching method for SAR target recognition [22], which was further improved by Cui et al. by introducing the Euclidean distance transform [23]. The dominant scattering area, which records the locations of strong scattering centers, was generated and processed by morphological filters for SAR target recognition [24]. Anagnostopoulos used the elliptical Fourier series (EFS) to describe the distribution of target outlines [25]. Zhu et al. used the Gaussian mixture model (GMM) to model the outline points [26]. A partial outline matching algorithm was developed in [27] to properly handle occlusion problems in SAR ATR. Papson and Narayanan applied the target shadow to SAR ATR and demonstrated its validity [28]. Chang and You constructed information-decoupled components based on the target region and shadow [29]. The intensity distributions of SAR images are usually described by transformation features obtained via mathematical projection or signal processing techniques. Mishra used (kernel) principal component analysis ((K)PCA) and linear discriminant analysis (LDA) for SAR image feature extraction and target recognition [30, 31].
The nonnegative matrix factorization (NMF) was adopted by Cui et al. for SAR ATR [32]. Manifold learning algorithms using local embeddings were also validated in SAR target recognition with good performance [33–35]. Signal processing algorithms, including wavelet analysis [36, 37], the monogenic signal [38, 39], and the bidimensional empirical mode decomposition (BEMD) [40], were validated as useful for SAR feature extraction. Dong et al. decomposed SAR images based on the monogenic signal and classified the multiscale features for SAR ATR [38, 39]. Chang et al. developed a SAR ATR method based on BEMD [40]. Different from optical images, SAR images contain the electromagnetic characteristics of the targets such as scattering centers [41–48] and polarizations [49, 50]. In [42–45], the attributed scattering centers were used for SAR target recognition based on different matching algorithms. In addition, they were validated as robust to noise, either in high-resolution range profiles (HRRP) [47] or SAR images [48]. In the classification phase, decision rules are developed for the extracted features. In general, most present SAR ATR methods directly make use of achievements in the field of pattern recognition, including the nearest neighbor (NN) [30], support vector machine (SVM) [51, 52], adaptive boosting (AdaBoost) [53], and sparse representation-based classification (SRC) [54–56]. In [51], SVM was first used for SAR ATR by Zhao and Principe, and it became the most prevalent classifier in this field afterwards. Sun developed AdaBoost for SAR ATR based on the traditional boosting technique [53]. Following its successful application in face recognition [54], Thiagarajan et al. introduced SRC to SAR target recognition [55]. To handle multiple sparse representation problems together, the multitask sparse representation was used to classify multiple views, features, resolutions, etc. [57–64]. Zhang et al.
developed the multiview SAR ATR method based on the joint sparse representation, which was further enhanced in [57]. Liu and Yang classified the features extracted by PCA, KPCA, and NMF using multitask compressive sensing [60]. The multiresolution representations were generated and classified by joint sparse representation by Zhang [62]. Deep learning theory triggered a surge of progress in the pattern recognition field [65–67]. In [68], Zhu et al. conducted a survey of the deep learning methods in remote-sensing applications, including SAR ATR. Among the deep learning-based SAR ATR methods, the convolutional neural network (CNN) is the most widely used. A simple but effective CNN was developed in [71] with good performance. The all-convolutional network (A-ConvNet) was developed by Chen et al., which greatly enhanced the training efficiency and classification accuracy as reported in [72]. The famous Res-Net was modified for SAR ATR in [73]. With fast progress in deep learning, more networks were applied to SAR ATR, including the enhanced squeeze and excitation network (ESENet) [74], gradually distilled CNN [75], cascade coupled CNN [76], multistream CNN [77], and generative adversarial network (GAN) [78]. In addition, some networks were specifically developed to handle nuisance conditions such as noise corruption and rotation [79, 80]. As a data-driven classifier, the performance of CNN is highly related to the amount and coverage of the available training samples. In this sense, some works used transfer learning and data augmentation algorithms to enhance the classification capability of the trained networks [81–85]. Considering the various extended operating conditions (EOCs) [7] in SAR ATR, the adaptivity of deep learning methods may sometimes be limited. As a remedy, they were combined with other classifiers to enhance the final performance. Wagner combined CNN with SVM, in which SVM performed as the output classifier [86]. Kechagias-Stamatis et al.
fused CNN with sparse coding to combine their merits [87]. Cui et al. updated CNN with SVM to enhance the performance under limited training samples [88]. A hierarchical decision fusion strategy was developed in [89] to fuse the decisions from CNN and attributed scattering center matching.

In this study, the two-dimensional variational mode decomposition (2D-VMD) is used to extract features from SAR images with application to SAR ATR. Dragomiretskiy and Zosso first developed the variational mode decomposition (VMD) [90] in 2014. As an adaptive and nonrecursive signal decomposition algorithm, VMD was validated to achieve better effectiveness and robustness than similar algorithms such as wavelet analysis and EMD [91–93]. Specifically, related works demonstrated that the VMD algorithm is less sensitive to noise corruption than the EMD-based decomposition algorithms because Wiener filtering is used to update the decomposed components directly in the Fourier domain. Further, the two authors extended VMD to two dimensions, resulting in 2D-VMD [94], which can directly decompose 2D matrices such as images [95–97]. After the decomposition, the original images are represented by multiscale modes reflecting separate spectral bands, which have specific directional and oscillatory characteristics. In this sense, the decomposed components from 2D-VMD can effectively describe the rich time-frequency properties of the objects in the original images and thus help the interpretation. This paper develops a SAR ATR method based on the features extracted by 2D-VMD. The original SAR image and the components decomposed by 2D-VMD are jointly represented based on multitask sparse representation [98, 99]. The decompositions and the original image are actually correlated, so the multitask sparse representation can effectively improve the overall reconstruction precision. Finally, based on the reconstruction errors, the target label can be determined. The MSTAR dataset is used to set up different experimental conditions, including the standard operating condition (SOC) and EOCs, to investigate the performance of the proposed method. The results confirm its effectiveness and robustness.

The remaining parts of this paper are organized as follows. Section 2 introduces the basics of 2D-VMD and its suitability for SAR image feature extraction. Section 3 describes the principle and procedure of the proposed method. Experiments are conducted in Section 4 to evaluate the proposed method and compare it with several state-of-the-art SAR ATR methods. Conclusions are drawn in Section 5 based on the quantitative results and analysis.

2. Two-Dimensional Variational Mode Decomposition (2D-VMD)

A natural signal can always be decomposed into components at different frequencies. Similarly, 2D signals such as images can be decomposed into multiple frequential components, which reflect different shapes or orientations. In previous works, several image decomposition algorithms were developed, including wavelet analysis, the monogenic signal, and BEMD. VMD was proposed by Dragomiretskiy and Zosso and is capable of decomposing a multicomponent signal into several narrow-band components with specific sparsity properties [90]. As reported, VMD could achieve better effectiveness and robustness than wavelets, EMD, etc. [91–93]. 2D-VMD is a general extension of VMD [94], which was developed for processing images [95–97]. With K components to be decomposed, 2D-VMD is formulated as the following constrained variational problem:

\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \nabla \left[ u_{\mathrm{AS},k}(\mathbf{x})\, e^{-j\langle \omega_k, \mathbf{x}\rangle} \right] \right\|_2^2 \right\} \quad \text{s.t.} \quad \forall \mathbf{x}:\ \sum_{k} u_k(\mathbf{x}) = f(\mathbf{x}), \qquad (1)

where u_{\mathrm{AS},k} denotes the 2D analytic signal of the kth mode u_k with a single-sided spectrum and \omega_k represents a reference direction vector in the frequency domain, which separates the spectrum plane into two half-planes; therein, one half-plane is set to zero. f(\mathbf{x}) is the input image. The objective function minimizes the summation of the modes' bandwidths, each measured as the squared L2-norm of the gradient of its directional 2D analytic signal over only the half-space frequencies.

To render the problem unconstrained, a quadratic penalty and a Lagrangian multiplier are used to enforce the constraint fidelity. The augmented Lagrangian is formulated as follows:

L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \nabla \left[ u_{\mathrm{AS},k}(\mathbf{x})\, e^{-j\langle \omega_k, \mathbf{x}\rangle} \right] \right\|_2^2 + \left\| f(\mathbf{x}) - \sum_{k} u_k(\mathbf{x}) \right\|_2^2 + \left\langle \lambda(\mathbf{x}),\ f(\mathbf{x}) - \sum_{k} u_k(\mathbf{x}) \right\rangle. \qquad (2)

So, the unconstrained problem is the saddle-point problem

\min_{\{u_k\},\{\omega_k\}} \max_{\lambda} L(\{u_k\},\{\omega_k\},\lambda),

where \lambda and \alpha are the Lagrangian multiplier and the balancing parameter of the data-fidelity constraint, respectively, \{u_k\} is the set of intrinsic modes, and \{\omega_k\} includes the center frequencies related to the corresponding modes. The alternate direction method of multipliers (ADMM) can be adopted to solve the problem in equation (2). In the frequency domain, the mode estimate from the optimization problem is a Wiener filter updated as follows:

\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k}\hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,|\omega - \omega_k^{n}|^2}, \quad \forall \omega \in \Omega_k := \{\omega : \langle \omega, \omega_k \rangle \geq 0\}, \qquad (3)

where \hat{u}_k, \hat{f}, and \hat{\lambda} correspond to the Fourier transforms of u_k, f, and \lambda, respectively, and n is the iteration index. The center frequencies are updated in a similar way to VMD, but the domain of integration is the half-plane \Omega_k:

\omega_k^{n+1} = \frac{\int_{\Omega_k} \omega\, |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}{\int_{\Omega_k} |\hat{u}_k^{n+1}(\omega)|^2\, d\omega}, \qquad (4)

which actually represents the first moment of the mode's power spectrum on the half-plane \Omega_k. The Lagrangian multiplier is updated by standard gradient ascent with a fixed time step \tau as follows:

\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right). \qquad (5)

Based on the above steps, the input image can be decomposed into multiple components from different frequencies. More details of the 2D-VMD algorithm and implementations are in [94].
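The decomposition steps above can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation of [94]: the function name `vmd2d`, the parameter defaults, the center-frequency initialization, and the treatment of the half-plane spectrum are all assumptions made for brevity.

```python
import numpy as np

def vmd2d(f, K=3, alpha=1000.0, tau=0.1, n_iter=50):
    """Simplified 2D-VMD sketch: ADMM-style updates in the Fourier domain.
    Parameters and initialization are illustrative assumptions."""
    H, W = f.shape
    # centered frequency grids in cycles/pixel
    wy = np.fft.fftshift(np.fft.fftfreq(H))[:, None]
    wx = np.fft.fftshift(np.fft.fftfreq(W))[None, :]
    f_hat = np.fft.fftshift(np.fft.fft2(f))
    u_hat = np.zeros((K, H, W), dtype=complex)
    lam_hat = np.zeros((H, W), dtype=complex)
    # spread the initial center frequencies over distinct directions
    omega = [(0.25 * np.cos(np.pi * k / K), 0.25 * np.sin(np.pi * k / K))
             for k in range(K)]
    for _ in range(n_iter):
        for k in range(K):
            oy, ox = omega[k]
            resid = f_hat - u_hat.sum(axis=0) + u_hat[k] + lam_hat / 2
            # Wiener-filter update of the k-th mode, restricted to the
            # half-plane <omega, omega_k> >= 0 (single-sided spectrum)
            u_hat[k] = resid / (1 + 2 * alpha * ((wy - oy) ** 2 + (wx - ox) ** 2))
            u_hat[k] *= (wy * oy + wx * ox) >= 0
            # new center frequency = power-spectrum centroid on the half-plane
            p = np.abs(u_hat[k]) ** 2
            s = p.sum() + 1e-12
            omega[k] = (float((wy * p).sum() / s), float((wx * p).sum() / s))
        # dual ascent on the data-fidelity constraint
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
    u = np.fft.ifft2(np.fft.ifftshift(u_hat, axes=(-2, -1)), axes=(-2, -1))
    return np.real(u)

# toy usage: a synthetic image with two well-separated spatial frequencies
y, x = np.mgrid[0:64, 0:64]
img = np.cos(2 * np.pi * 0.05 * x) + np.cos(2 * np.pi * 0.2 * y)
modes = vmd2d(img, K=2)
```

In practice, a library implementation with convergence checks and Hermitian-symmetric spectrum reconstruction should be preferred; the sketch only mirrors the structure of the updates.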

In this study, 2D-VMD is used for feature extraction of SAR images. Figure 1 shows the decomposition results of an MSTAR SAR image, displaying the original image and the first three modes. The decomposed components share some similarities with the original image. In addition, they intuitively reflect the multidirectional properties of the target and provide complementary global and local descriptions of it. Hence, by collaboratively using the original SAR image and its components decomposed by 2D-VMD, more discriminative information is available for correct target recognition.

3. Proposed Method

3.1. Multitask Sparse Representation

The multitask sparse representation considers several related sparse representation problems together, which could produce more precise solutions than solving the tasks separately [57–64]. Previous works widely used this tool to handle multiple views, features, etc., from SAR images for target recognition. In this article, the multitask sparse representation is used to classify the components decomposed by 2D-VMD. Generally, the M components from the test sample y, denoted as y^{(1)}, y^{(2)}, \ldots, y^{(M)}, can each be represented over a class-structured dictionary:

y^{(m)} = D^{(m)} \alpha^{(m)}, \quad m = 1, 2, \ldots, M, \qquad (6)

so they can be unifiedly considered as follows:

\min_{A} \sum_{m=1}^{M} \left\| y^{(m)} - D^{(m)} \alpha^{(m)} \right\|_2^2 \quad \text{s.t.} \quad \left\| \alpha^{(m)} \right\|_0 \leq T, \ m = 1, 2, \ldots, M, \qquad (7)

where D^{(m)} denotes the dictionary corresponding to the mth component, T is the sparsity level, and A = [\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(M)}] stores the sparse coefficient vectors in a matrix.

The formulation in equation (7) makes little use of the relationship between the different components, so it can hardly obtain the optimal solutions. According to previous research, the inner correlations of the different components can be effectively exploited by imposing the \ell_{2,1} mixed norm (the sum of the \ell_2 norms of the rows) on the coefficient matrix A. Accordingly, the optimization problem is reformulated as follows:

\min_{A} \sum_{m=1}^{M} \left\| y^{(m)} - D^{(m)} \alpha^{(m)} \right\|_2^2 + \lambda \left\| A \right\|_{2,1}, \qquad (8)

where \lambda > 0 is the regularization parameter.
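As a small illustration of the mixed norm (the helper below is an assumed name, not from the cited works), the following computes the sum of the row-wise \ell_2 norms; rows that are entirely zero contribute nothing, which is why this penalty encourages all tasks to share a common support.

```python
import numpy as np

def l21_norm(A):
    """l2,1 mixed norm of A: the sum of the l2 norms of its rows.
    Penalizing it drives entire rows to zero, so the coefficient
    vectors (columns of A) end up sharing the same support."""
    return float(np.linalg.norm(A, axis=1).sum())

A = np.array([[3.0, 4.0],   # active row shared by both tasks
              [0.0, 0.0],   # inactive row: contributes nothing
              [0.0, 5.0]])
value = l21_norm(A)  # 5.0 + 0.0 + 5.0 = 10.0
```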

With the constraint of the \ell_{2,1} norm, the sparse coefficient vectors of the different components are forced to share similar distributions, i.e., the locations of the nonzero coefficients. In this sense, the inner correlations are exploited during the solution, which helps to improve the overall precision. The above optimization problem can be solved by the simultaneous orthogonal matching pursuit (SOMP) [98] or multitask compressive sensing [99]. With the optimal estimate of the coefficient matrix A, the reconstruction error of each training class is calculated separately and the target label is assigned to the class with the minimum error:

\mathrm{identity}(y) = \arg\min_{i} \sum_{m=1}^{M} \left\| y^{(m)} - D_i^{(m)} \alpha_i^{(m)} \right\|_2^2, \qquad (9)

where D_i^{(m)} extracts the sub-dictionary of the mth component related to the ith class and \alpha_i^{(m)} represents the corresponding coefficient vector.
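A greedy solver in the spirit of SOMP [98] can be sketched as follows. This is a hedged illustration with hypothetical helper names (`somp`, `classify`) and a toy random dictionary, not the authors' implementation; the key idea is that one shared support is selected for all tasks, and the class with the smallest joint reconstruction error wins.

```python
import numpy as np

def somp(Y, D, T):
    """Simultaneous OMP sketch: greedily select T dictionary atoms shared
    by all tasks (columns of Y), re-solving the joint least squares on
    the current support at every step."""
    R = Y.copy()
    support, A = [], np.zeros((0, Y.shape[1]))
    for _ in range(T):
        corr = np.abs(D.T @ R).sum(axis=1)   # total correlation across tasks
        corr[support] = -np.inf              # never reselect an atom
        support.append(int(np.argmax(corr)))
        Ds = D[:, support]
        A, *_ = np.linalg.lstsq(Ds, Y, rcond=None)
        R = Y - Ds @ A                       # joint residual
    return support, A

def classify(Y, D, labels, T):
    """Assign the class with the smallest joint reconstruction error."""
    support, A = somp(Y, D, T)
    best, best_err = None, np.inf
    for c in set(labels):
        idx = [i for i, s in enumerate(support) if labels[s] == c]
        if idx:
            err = np.linalg.norm(Y - D[:, [support[i] for i in idx]] @ A[idx])
        else:
            err = np.linalg.norm(Y)          # no atom of this class selected
        if err < best_err:
            best, best_err = c, err
    return best

# toy example: 3 tasks built exactly from two class-0 atoms
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 10))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
labels = [0] * 5 + [1] * 5                   # atoms 0-4: class 0, 5-9: class 1
Y = D[:, [1, 3]] @ rng.normal(size=(2, 3))
label = classify(Y, D, labels, T=2)
```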

3.2. Target Recognition

Based on the above analysis, the procedure of the proposed method is summarized and illustrated in Figure 2. Based on the decomposition observations, we choose the first three components from 2D-VMD in the classification stage, and they are used together with the original image. The training samples are decomposed by 2D-VMD to build the separate dictionaries. The test sample is first decomposed by 2D-VMD in the same way as the training samples. Afterwards, the decomposed components and the original image are jointly represented by multitask sparse representation. Finally, the reconstruction error of each training class is calculated, and all the reconstruction errors are compared to form a final decision on the target label.
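The whole pipeline can be outlined as below. Since a full 2D-VMD solver is out of scope here, a simple FFT band-pass split stands in for the decomposition, and a per-class least-squares fit stands in for the joint sparse coding; all function names and parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bandpass_modes(img, K=3):
    """Stand-in decomposition (NOT 2D-VMD): annular FFT band-pass masks,
    used only so the pipeline below runs without an external solver."""
    F = np.fft.fftshift(np.fft.fft2(img))
    H, W = img.shape
    yy, xx = np.mgrid[0:H, 0:W]
    r = np.hypot(yy - H / 2, xx - W / 2)
    edges = np.linspace(0, r.max() + 1e-9, K + 1)
    return [np.real(np.fft.ifft2(np.fft.ifftshift(
        F * ((r >= edges[k]) & (r < edges[k + 1]))))) for k in range(K)]

def build_dictionaries(train_imgs, K=3):
    """One dictionary per channel (original image + K modes), with
    vectorized, unit-norm training images as columns."""
    dicts = [[] for _ in range(K + 1)]
    for img in train_imgs:
        for m, c in enumerate([img] + bandpass_modes(img, K)):
            v = c.ravel()
            dicts[m].append(v / (np.linalg.norm(v) + 1e-12))
    return [np.stack(d, axis=1) for d in dicts]

def recognize(test_img, dicts, labels, K=3):
    """Per-class least-squares fit over every channel (an l2 proxy for the
    joint sparse coding); returns the class with the smallest total error."""
    chans = [test_img] + bandpass_modes(test_img, K)
    best, best_err = None, np.inf
    for c in set(labels):
        cols = [j for j, l in enumerate(labels) if l == c]
        err = 0.0
        for m, ch in enumerate(chans):
            y = ch.ravel()
            y = y / (np.linalg.norm(y) + 1e-12)
            Dc = dicts[m][:, cols]
            a, *_ = np.linalg.lstsq(Dc, y, rcond=None)
            err += np.linalg.norm(y - Dc @ a) ** 2
        if err < best_err:
            best, best_err = c, err
    return best

# toy data: class 0 = vertical stripes, class 1 = horizontal stripes
rng = np.random.default_rng(1)
base = np.cos(2 * np.pi * np.arange(16) / 4.0)
v_img, h_img = np.tile(base, (16, 1)), np.tile(base[:, None], (1, 16))
train = [v_img + 0.05 * rng.normal(size=(16, 16)) for _ in range(4)]
train += [h_img + 0.05 * rng.normal(size=(16, 16)) for _ in range(4)]
labels = [0] * 4 + [1] * 4
dicts = build_dictionaries(train)
pred = recognize(v_img + 0.05 * rng.normal(size=(16, 16)), dicts, labels)
```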

The original image is used in the classification phase for the following two reasons. First, as shown in Figure 1, the decomposed components complement the descriptions of the original image, but they still miss some of its information. Second, as illustrated in Figure 2, only three components from 2D-VMD are used, so they can hardly reconstruct all the properties of the original image. Therefore, by considering both the original image and its components decomposed by 2D-VMD, the discriminative information in the original image can be better exploited to enhance the classification performance.

4. Experiments

4.1. Dataset and Reference Methods

Typical experimental conditions are set up based on the MSTAR dataset to test and validate the proposed method. The dataset comprises SAR images measured by an X-band radar in spotlight mode, with a resolution of 0.3 m in both the range and cross-range directions. Ten targets are included in the dataset, as shown in Figure 3, with thousands of SAR images. Specifically, some targets, e.g., BMP2 and T72, contain several different configurations with structural modifications. For each target (configuration), the SAR images cover azimuths of 0°–360°, and every two consecutive samples have an azimuth difference of 1°–2°. Depression angles of 15° and 17° are available for all ten targets, and some targets have additional ones such as 30° and 45°. Therefore, the rich set of MSTAR images provides good candidates for the validation of SAR ATR methods, which can be comprehensively evaluated under both SOC and EOCs.

The proposed method is compared with several reference methods from published works throughout the experiments. Table 1 briefly reviews the basic properties of the reference methods, including their features and classifiers. NMF, Mono, and BEMD are feature-based methods, which apply different feature extraction algorithms to enhance the classification performance. A-ConvNet and Res-Net are classifier-based methods, which mainly differ in their network architectures. They are chosen as representatives of deep learning methods, which have become the mainstream in SAR ATR. The following tests are conducted under SOC and EOCs sequentially, from simple to complex. The classification results are displayed and discussed to form reliable conclusions.

4.2. 10-Class Problem under SOC

A typical SOC experimental setting, including all ten targets, is displayed in Table 2. The training samples are SAR images from a 17° depression angle, while those from 15° are used for testing. Only single configurations of BMP2 and T72 (as specified by the serial numbers (SN)) are used in this experiment. Hence, the test samples are assumed to share high similarities with the training ones, with only a 2° depression angle variance. The 10-class test samples are classified by the proposed method to obtain the confusion matrix in Figure 4, in which the X and Y labels correspond to the actual and predicted classes, respectively. The diagonal elements in Figure 4 record the classification accuracies of the different targets, which are all higher than 99%. We define the average recognition rate Pcr as the proportion of correctly classified samples among all the test samples. Accordingly, the Pcr of the proposed method is calculated to be 99.57%. Table 2 compares the proposed method with the reference methods under SOC. Compared with NMF, Mono, and BEMD, the proposed method achieves a higher Pcr, which confirms the superior validity of the 2D-VMD components. A-ConvNet and Res-Net achieve performance approaching that of the proposed method because of the high classification capability of deep learning models. By multitask sparse representation of the 2D-VMD components, the proposed method achieves the highest Pcr among all the methods, validating its effectiveness under SOC.
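The confusion matrix and the Pcr metric used above can be computed as follows (a straightforward illustration; the helper name is an assumption):

```python
import numpy as np

def confusion_and_pcr(true, pred, n_classes):
    """Confusion matrix (row: actual class, column: predicted class) and
    Pcr, the proportion of correctly classified test samples."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm, np.trace(cm) / cm.sum()

# toy labels: 4 of 5 samples classified correctly -> Pcr = 0.8
cm, pcr = confusion_and_pcr([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)
```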

4.3. Configuration Variants

Different configurations of the same target have structural modifications, which are reflected in SAR images. In this sense, the EOC of configuration variants is caused by variations of the target itself. Table 3 lists the training and test samples under configuration variants, involving four targets, of which only two are classified. The test samples of BMP2 and T72 come from configurations different from their training ones. BRDM2 and BTR70 are added to the training set to increase the classification difficulty. The detailed recognition results of the proposed method are shown in Table 4. The test samples from different configurations have some misclassifications as BRDM2 or BTR70. Nevertheless, each configuration of BMP2 and T72 is classified with an accuracy of over 98%, and the Pcr of the proposed method is calculated to be 98.76%. The Pcrs of the different methods are displayed in Table 5 for comparison. With the highest Pcr, the proposed method demonstrates superior robustness to configuration variants. Its better performance over NMF, Mono, and BEMD shows the validity of 2D-VMD for SAR image feature extraction. Because of the larger differences between the test and training samples compared with SOC, the performance of the CNN-based methods, including A-ConvNet and Res-Net, degrades more significantly.

4.4. Depression Angle Variances

When the test and training samples come from different depression angles, they differ in the image domain because of the sensitivity of SAR images to view angles. When the depression angle variances are relatively large, the recognition problem becomes tough. Table 6 presents the experimental setup under depression angle variances. The training samples come from a 17° depression angle, while the test samples include two subsets at 30° and 45°, respectively. The proposed method is evaluated at both depression angles, obtaining Pcrs of 98.12% and 74.24%, correspondingly. Figure 5 plots the Pcrs of the different methods at the 30° and 45° depression angles. In general, the performance at the 30° depression angle is much better than that at 45° because of the smaller variance. Although degraded at the 45° depression angle, the relative predominance of the proposed method becomes more remarkable than at the 30° depression angle. With the highest Pcrs at both depression angles, the proposed method achieves better robustness to depression angle variances than the reference methods. The results reflect the effectiveness of the 2D-VMD features for handling the EOC of depression angle variances, which are superior to those extracted by NMF, the monogenic signal, and BEMD.

4.5. Noise Corruption

When the test and training samples have different noise levels, they also differ considerably. Usually, the training samples are preprocessed to suppress noise, so they are assumed to have high signal-to-noise ratios (SNRs). However, the test samples may contain high levels of noise. To test the proposed method under noise corruption, noisy test sets are simulated by adding different levels of noise to the test samples in Table 7. In detail, the noise is generated according to the energy of the original SAR image and the desired SNR, and then added to the original SAR image to obtain a noisy one with the specified SNR. Afterwards, all the methods are evaluated at the different noise levels, and the results are plotted in Figure 6. Compared with the former experiments, the performance of A-ConvNet and Res-Net decreases much more significantly than that of the remaining methods. The Pcr of the proposed method is the highest at each SNR, showing the best robustness under noise corruption. As stated earlier, 2D-VMD has better robustness to noise than traditional image decomposition algorithms. Compared with NMF, Mono, and BEMD, the better performance of the proposed method mainly comes from the high effectiveness of the 2D-VMD features.
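The noise simulation described above can be sketched as follows. The helper `add_noise` is an assumed name, and white Gaussian noise is assumed; the noise is scaled from the image energy so that the resulting SNR matches the requested level.

```python
import numpy as np

def add_noise(img, snr_db, seed=None):
    """Add white Gaussian noise (an assumed noise model) scaled from the
    image energy so that 10*log10(E_signal / E_noise) equals snr_db."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=img.shape)
    scale = np.sqrt(np.sum(img ** 2) /
                    (np.sum(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    return img + scale * noise

# verify the achieved SNR on a toy image
img = np.random.default_rng(0).random((32, 32))
noisy = add_noise(img, snr_db=5, seed=1)
measured = 10 * np.log10(np.sum(img ** 2) / np.sum((noisy - img) ** 2))
```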

4.6. Partial Occlusion

When the targets are occluded in the test samples, the recognition problem becomes much more complex. To handle this EOC, some previous works were developed using partial matching algorithms or occlusion-robust features. To test the proposed method under partial occlusion, occluded test sets are first simulated based on the test samples in Table 7 according to the empirical models in [44, 84]. In detail, a certain proportion of the target region in a SAR image is removed and replaced by randomly picked background pixels. By varying the proportion of the occluded region, occluded samples at different levels are generated. The recognition results of the different methods are plotted in Figure 7. Similar to the case of noise corruption, the proposed method outperforms the reference methods at each occlusion level. The CNN-based methods achieve lower robustness than Mono, BEMD, and the proposed method because of the large divergences between the training and test samples, especially at high occlusion levels. The results show the better robustness of the proposed method to partial occlusion compared with the reference methods.
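The occlusion simulation can be sketched in the same spirit; `occlude` is an assumed helper implementing a simplified version of the cited models, removing a fraction of target pixels and filling them with randomly picked background pixels.

```python
import numpy as np

def occlude(img, target_mask, level, seed=None):
    """Replace a fraction `level` of target pixels with randomly picked
    background pixels (a simplified stand-in for the models in [44, 84])."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    tgt = np.flatnonzero(target_mask.ravel())
    bg = np.flatnonzero(~target_mask.ravel())
    n = int(round(level * tgt.size))
    hit = rng.choice(tgt, size=n, replace=False)   # pixels to occlude
    fill = rng.choice(bg, size=n, replace=True)    # background donors
    out.ravel()[hit] = img.ravel()[fill]
    return out

# toy check: a 4x4 block of ones (target) on a zero background
img = np.zeros((10, 10))
mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 3:7] = True
img[mask] = 1.0
occ = occlude(img, mask, level=0.3, seed=0)
```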

5. Conclusion

This paper develops a SAR ATR method based on 2D-VMD. Discriminative components are decomposed from the original SAR images via 2D-VMD, providing multiscale, time-frequency descriptions of the targets. Multitask sparse representation is performed in the classification phase to jointly represent the original image and its 2D-VMD components. Based on the reconstruction errors, the target label of the test sample is decided. Experiments are conducted on the MSTAR dataset under SOC and typical EOCs, including configuration variants, depression angle variances, noise corruption, and partial occlusion. Based on the experimental results, the conclusions are as follows: (1) Under SOC, the proposed method achieves a Pcr of 99.57% for ten classes of targets, which is higher than those of the four reference methods. (2) Under configuration variants, the proposed method obtains a Pcr of 98.76% for the six different configurations from BMP2 and T72, showing better robustness than the reference methods. (3) Under depression angle variances, the proposed method achieves Pcrs of 98.12% and 74.24% at the 30° and 45° depression angles, respectively, outperforming the reference methods. (4) Under noise corruption and partial occlusion, the Pcr of the proposed method is the highest at each noise or occlusion level, validating its superior robustness over the reference methods. Overall, the proposed method can effectively improve SAR ATR performance under both SOC and EOCs and thus has great potential for future use.

Data Availability

The dataset used to support the findings of the study is publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.