1 Introduction

Fig. 1

The overall procedure of a backdoor attack. a The backdoor implanting process; b activating the backdoor with poisoned samples

Deep Neural Networks (DNNs) are a recently developed machine learning technique that has been applied to many fields of research, production, and daily life. DNNs have proven effective in scenarios such as face recognition [1], autonomous driving [2], voice recognition [3], and image generation [4]. They are also applicable to Internet of Things (IoT) [5, 6], edge computing [7], and crowdsensing [8] systems, where they can substantially increase efficiency and functionality. However, their vulnerability to adversarial attacks [9, 10] has also attracted considerable attention. The backdoor attack [11] is one of the threats that has emerged recently. The overall procedure of a backdoor attack is illustrated in Fig. 1. The attacker first defines a backdoor trigger, selects a target class, superimposes the trigger onto a benign sample, and changes its label to the target class, thus generating a poisoned sample. The attacker may generate many poisoned samples and mix them into the training dataset. When the victim trains a DNN model on the poisoned dataset, a backdoor is implanted in the model. A backdoor model behaves normally on clean inputs but classifies any sample stamped with the trigger into the target class. To achieve this, only 1\(\%\) of the training samples need to be modified, and the trigger occupies only a minute area of the sample [12]. Hence, the backdoor attack is highly stealthy and exceedingly difficult for humans to identify.

The backdoor attack has attracted immediate attention because it is a realistic threat to machine learning in almost every scenario. Training a DNN model requires numerous samples, but data collection and annotation are labor-intensive, and many individuals and companies prefer to outsource them to a third party. In this case, poisoned samples might be injected into the dataset. On the other hand, since model training is usually computationally expensive, it may also be outsourced to the cloud, such as Google’s Cloud Machine Learning Engine [13] and Azure Batch AI Training [14]; such services are now called “machine learning as a service” (MLaaS). In addition, instead of training a DNN model from scratch, it is common to fine-tune a model that has been well trained for another task. This technique, called transfer learning, can significantly reduce the training time and computational resources required. However, if there is a backdoor in the model, it is likely to persist after fine-tuning [15].

Since the backdoor threat was first identified, various defense methods have been developed. Backdoor attacks have been studied from different perspectives, including the dataset and the model, and the resulting defenses mitigate the threat to some extent. However, due to the black-box nature of DNNs, existing defenses can only exploit the superficial characteristics of backdoor attacks and can therefore be bypassed by well-designed advanced attacks.

In this paper, we reveal that most defenses assume that all poisoned samples involved in an attack are generated with the same trigger (or pattern); that is, the poisoned samples in the training dataset are constructed with the same trigger (or pattern) as the poisoned samples in the testing dataset. Based on this assumption, the defender can use the poisoned samples in the training dataset to activate the backdoor and identify it by analyzing the predictions of the model [16, 17].

To demonstrate the limitation of existing defense methods, we explore two new types of backdoor attacks, called the enhanced backdoor attack (EBA) and the enhanced coalescence backdoor attack (ECBA). They are inspired by the observation that a DNN model’s prediction relies on the differences between pixels rather than their absolute values [18]. The idea of EBA is that the trigger used to generate poisoned samples for training can differ from the trigger used to generate poisoned samples for testing. We can design a less significant trigger to train the backdoor model and use an enhanced trigger to activate the backdoor. The advantage of this design is that when a less significant trigger is used during training, the model still learns the backdoor pattern, but the backdoor is not sensitive enough to be activated by the less significant trigger; it can, however, be activated by samples carrying the enhanced trigger. This difference not only increases the stealthiness of the backdoor embedding process, as the trigger is more difficult to detect, but also allows the attack to evade multiple defenses. Separately, Xue et al. [19] proposed an N-to-one backdoor attack that defines multiple triggers used during model training and concentrates them to form the trigger for testing. This elaborate design achieves a similar objective but has a fatal flaw: it is extremely sensitive to the number of poisoned samples in the training dataset. Therefore, we combine the EBA with the N-to-one backdoor attack to obtain the ECBA, which maintains all advantages of the N-to-one attack while being more robust.

The main contributions of this paper are as follows:

  1. To indicate the weaknesses of existing defense methods, we propose the enhanced backdoor attack (EBA), which can bypass existing defenses and is stealthier.

  2. We propose the enhanced coalescence backdoor attack (ECBA), which exhibits a considerably higher “attack success rate difference” than the N-to-one attack while maintaining its advantages of hiding the poisoned samples in the training set and evading the AC [16] and NC [20] defense methods.

  3. Extensive experiments are conducted to test the effectiveness of our attacks, including effectiveness across various datasets and model structures, robustness to perturbations during data collection, and the ability to evade multiple backdoor defense methods.

The rest of this paper is organized as follows. Section 2 reviews existing work on backdoor attacks and backdoor defenses, and then analyzes the weaknesses of existing defense methods that we exploit in our attack scheme. Section 3 describes our attack schemes in detail, and Sect. 4 presents the experimental results, including performance comparisons with Badnets and the N-to-one attack as well as the bypassing of existing defense methods. Finally, Sect. 5 concludes the paper.

2 Related Work

Existing works can be divided into two categories: backdoor attacks and backdoor defenses. Research on backdoor attacks focuses on finding more covert and invisible attack patterns, increasing the attack success rate, and improving effectiveness in real-world applications. Research on backdoor defenses, on the other hand, focuses on detecting backdoor models, identifying targeted classes, and eliminating or mitigating the impact of backdoor attacks.

2.1 Backdoor Attack

The backdoor attack was first proposed by Gu et al. [11], who showed that a maliciously trained model can behave differently on clean inputs and poisoned inputs. This work, called BadNets, stamps a trigger (e.g., changing one particular pixel to white or overlaying a small picture on the bottom right corner) on benign samples. Since then, various backdoor attacks targeting different stages of the model training process have been revealed.

Backdoor attacks based on poisoning the training dataset are currently one of the primary research topics, and several attack methods have emerged. Chen et al. [21] proposed two types of backdoor attack: the single-instance-key attack, which aims to mislead the model into misclassifying any sample of a certain person as another person, and the pattern-key attack, which makes the model misclassify any sample carrying the pattern. Barni et al. [22] proposed a backdoor attack that requires only modifications to training samples without changing the corresponding labels. Lovisotto et al. [23] presented a realistic application of backdoor attacks on biometric systems, indicating that, if unguarded, the attack can be applied to almost every machine learning scenario. Liu et al. [24] proposed a black-box attack that aims to decrease the usability of a DNN model; the attacker uses an enhanced conditional DCGAN to synthesize poisoned samples and adopts an asymmetric vector to relabel them. Since the above attacks use a visible pattern as the trigger, they can be easily recognized by humans, and multiple defense methods have been proposed to detect and eliminate such backdoors and poisoned samples. Therefore, many recent works on attack strategies have focused on making the trigger invisible and evading existing defense methods. Xue et al. [19] proposed two types of backdoor attacks, named the one-to-N attack and the N-to-one attack. The one-to-N attack uses a trigger with one pattern to activate multiple backdoors, while the N-to-one attack uses several triggers to activate one backdoor and escapes many kinds of defense methods. Liao et al. [25] proposed the first invisible backdoor attack by superimposing a shallow watermark as the trigger, achieving both a high attack success rate and imperceptibility to humans. Zou et al. [26] proposed methods to insert single or multiple neuron-level trojans in neural networks and achieved high stealthiness, since nothing except an image with the trigger can activate the trojan (or backdoor). Wang et al. [27] proposed an invisible backdoor trigger based on the biology literature on the human visual system, which indicates that the human eye is insensitive to small chromatic aberrations. The trigger of this attack requires quantization and dithering of the sample, making it imperceptible to both humans and defense methods.

On the other hand, Tang et al. [28] showed that a backdoor can also be implanted by directly modifying the model. Liu et al. [29] designed a backdoor by first inverting the model to generate an initial trigger and then fine-tuning the model with extra data stamped with the extracted trigger. This attack is particularly powerful when the attacker is an MLaaS provider. If the attacker has access to and the opportunity to manipulate both the dataset and the training process (e.g., third parties that offer model training services), a more elaborately designed backdoor attack can be applied. Zhong et al. [30] proposed an imperceptible backdoor trigger that uses a U-net to generate sample-specific triggers. Moreover, the U-net and the backdoor model are trained simultaneously, and the loss function of the backdoor model is modified to increase the chance of embedding the backdoor.

2.2 Backdoor Defense

Soon after the discovery of the backdoor attack, various backdoor defenses were proposed. Although DNNs are black boxes and there is no systematic explanation of how they make predictions, researchers can detect backdoors from different perspectives. The fine-pruning method proposed by Liu et al. [15] assumes that some neurons in the DNN model are responsible for the backdoor function and can only be activated by poisoned samples; if the neurons that remain inactive when clean samples are submitted to the model are disabled, the backdoor is likely to be disabled as well. Chen et al. [16] investigated the differences between the last hidden layer of a benign model and that of a backdoor model, and found that poisoned samples indeed activate a different group of neurons, which can be exploited to identify the target class and distinguish poisoned samples. Additionally, Soremekun et al. [17] proposed a similar defense, called AEGIS, designed for robust model training.

Since previous works tended to use the smallest possible triggers, Wang et al. [20] proposed a defense that searches for the smallest “trigger” for each class. This method can identify target classes and reconstruct trigger patterns without access to the poisoned training dataset, which significantly increases its utility. Selvaraju et al. [31] proposed an algorithm for visualizing the area of an image that a model pays the most attention to during classification. Dong et al. [32] proposed another gradient-free defense based on reverse engineering, called B3D, which operates with limited access to the model and no access to the training dataset.

If the defender has additional data that is guaranteed to be clean, the knowledge-distillation-based method proposed by Yoshida et al. [12] can be applied. This method uses the additional clean data to extract “clean knowledge” from the backdoor model and train a completely new distillation model. Gao et al. [33] proposed the STRIP method, which exploits the differences in the classification process between clean and poisoned samples: the former is based on most of the pixels in the sample and is susceptible to interference when superimposed with other samples, while the latter is based only on the trigger area and exhibits constant predictions when superimposed with other samples.

However, the black-box nature of DNNs means that a general solution to backdoor attacks is currently unrealistic, so every defense method has its advantages and weaknesses. Fine-pruning [15], NC [20], and B3D [32] can detect backdoors that use a simple trigger pattern but fail to detect backdoors that use sophisticated trigger patterns, and recent attacks [27, 30] also suggest that they are powerless when the attacker has access to the model training process. AC [16] and AEGIS [17] extract the last hidden layer to analyze the different behavior of clean and poisoned samples submitted to the model; however, these methods also cannot handle attacks that modify the loss function of the model training process. GradCAM [31] visualizes the image regions that the DNN model is most concerned with during classification, but it can only detect small triggers and is bypassed by sample-specific triggers or triggers covering the entire image. STRIP [33] is likewise powerless against sample-specific triggers. The distillation-based method [12] appears to be a solid defense, as it is designed to extract only the clean knowledge from the backdoor model; however, as with fine-pruning [15], it requires additional clean samples that follow the same distribution as the training set, which are usually unavailable. Moreover, most of the above defenses require access to the poisoned training dataset and the backdoor model, and assume that the poisoned samples used for training share the identical trigger (or pattern) with the poisoned samples used to activate the backdoor. This potential flaw inspired us to design two novel backdoor attack strategies.

3 Attack Mechanism

In this section, we first introduce the threat model for backdoor attacks and then describe the two types of backdoor attacks proposed.

3.1 Threat Model

Our attacks are developed for the most common backdoor attack scenario, where we assume that the attacker can access the training dataset and modify a small number of samples. The attacker may construct poisoned samples and datasets and publish them on the Internet, in which case the victim may download them and use them to train a backdoor model. Alternatively, the attacker may be a data annotation service provider who has access to the victim’s training dataset. In addition, the attacker can query the trained model with any sample and collect its prediction to check whether the backdoor has been successfully embedded, or to activate the backdoor and perform the attack. However, the attacker has no access to anything else, including the model structure and parameters, the model training process, and the loss function.

The major goal of the attacker is to embed a hidden backdoor in the victim’s model. This backdoor should only be activated by the trigger specifically designed and used by the attacker. In other words, for any sample stamped with the attacker-designed trigger, the model’s prediction will be the target label, regardless of the ground-truth label; for any clean sample, however, the model will classify it into the correct class with high accuracy. The desired properties of our proposed attacks include two aspects, namely effectiveness and defense resistance. Effectiveness means that the backdoor should remain dormant when clean samples are submitted to the model but be consistently activated by any poisoned sample, and defense resistance requires that the backdoor be able to bypass mainstream defense methods.

3.2 Enhanced Backdoor Attack

This attack is proposed as an advancement of Badnets [11], which we first introduce briefly. The attacker first defines a trigger pattern \(\alpha \) (e.g., a single pixel at a certain position in the image, or a partial or entire image) and a target class \(y_p\). Then, a sample x whose ground-truth label y differs from the target class \(y_p\) is randomly selected from the training dataset D, the trigger \(\alpha \) is superimposed onto the image x, and its label y is changed to the target class \(y_p\). This procedure is defined as data poisoning \(P(\cdot )\) and is formalized as follows:

$$\begin{aligned} x_p=P(x,\alpha )=x+\alpha , \end{aligned}$$
(1)

where \(x_p\) is the sample after the data poisoning procedure, which we call the poisoned sample.
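As a concrete illustration, the following is a minimal sketch of this poisoning step in Python, assuming images are stored as NumPy arrays with pixel values in [0, 255] and that the trigger is an array of the same shape that is zero outside the trigger region; the helper names and the clipping to the valid pixel range are our additions, not part of the original formulation.

```python
import numpy as np

def make_trigger(shape, size=2, intensity=60):
    # A square trigger of the given intensity in the upper left corner;
    # zero everywhere else, so P(x, alpha) = x + alpha only changes the trigger region.
    trigger = np.zeros(shape, dtype=np.float32)
    trigger[:size, :size, ...] = intensity
    return trigger

def poison(x, trigger):
    # Data poisoning P(x, alpha) = x + alpha, clipped to the valid pixel range.
    return np.clip(x.astype(np.float32) + trigger, 0, 255).astype(np.uint8)
```

The attacker would apply such a poisoning function to every selected sample x and overwrite its label with the target class \(y_p\) before mixing it back into D.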

The attacker may select multiple samples, poison them all, and put them back into the dataset D; the dataset is now called the poisoned dataset \(D_p\). The victim, unaware that the dataset is poisoned, uses it to train a DNN model, which is referred to as the backdoor model \(M_b\). The backdoor model \(M_b\) achieves a high classification accuracy on benign samples but misclassifies any poisoned sample \(x_p\) as the target class \(y_p\). The behavior of the backdoor model \(M_b\) can be described as follows:

$$\begin{aligned} \left\{ \begin{array}{l} M_b(x_c)=y_c,\\ M_b(x_p)=y_p, \end{array}\right. \end{aligned}$$
(2)

where \(y_c\) is the ground-truth label of the clean sample \(x_c\).

In this way, by modifying just a few samples in the training dataset, the attacker implants a malicious backdoor in the model without anyone noticing. The attacker can activate the backdoor at any time simply by submitting a poisoned sample to the model and obtaining the expected misclassification result.

We now elaborate on our proposed attack methods. During our experiments, we observed that the DNN model relies on the differences between pixels for prediction rather than on their absolute values. Based on this observation, we propose our first attack strategy, the enhanced backdoor attack (EBA).

Fig. 2

An example of the incipient trigger and enhanced trigger, with the trigger intensity set to “+60” and “+150”, respectively

We first define two backdoor trigger patterns: the incipient trigger \(\alpha _i\) and the enhanced trigger \(\alpha _e\). These two triggers share the same pattern, location, target class, and everything else except the trigger intensity \(\theta \); the enhanced trigger intensity \(\theta _e\) is much greater than the incipient trigger intensity \(\theta _i\). Note that the trigger intensity \(\theta \) is a property of the trigger \(\alpha \) rather than independent of it, and the trigger pattern, location, and target class can be freely customized by the attacker. An example of the two triggers is shown in Fig. 2, where the incipient trigger \(\alpha _i\) and the enhanced trigger \(\alpha _e\) occupy the same position in the upper left corner of the sample.

The attacker first generates a set of incipient poisoned samples \(\{x_{pi\_l}\}\) based on the incipient trigger \(\alpha _i\), following the poisoned sample generation function in Eq. (1). The incipient poisoned samples \(\{x_{pi\_l}\}\) are then mixed into the training dataset D; the dataset is now a poisoned dataset \(D_p\), which the victim uses to train a backdoor model \(M_b\). By manipulating the number or proportion of incipient poisoned samples in the poisoned dataset \(D_p\), the backdoor model \(M_b\) can be made to exhibit a specific behavior: the model correctly classifies clean samples \(x_c\) and incipient poisoned samples \(x_{pi}\), but classifies every enhanced poisoned sample \(x_{pe}\) generated with the enhanced trigger \(\alpha _e\) into the target class. The functionality of the proposed enhanced backdoor attack can be formalized as follows:

$$\begin{aligned} \left\{ \begin{array}{l} M_b(x_c)=y_c,\\ M_b(x_{pi})=y_c,\\ M_b(x_{pe})=y_p. \end{array} \right. \end{aligned}$$
(3)

This characteristic of EBA enables us to embed a backdoor more stealthily, since the modification intensity of each sample can be greatly reduced while still implanting the backdoor as usual. Moreover, the fact that incipient poisoned samples \(x_{pi}\) hardly trigger the backdoor allows this attack to evade various defense methods.
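The sketch below illustrates how the two triggers of EBA could be instantiated, reusing the hypothetical make_trigger and poison helpers from the sketch after Eq. (1); the intensities “+60” and “+150” follow the example in Fig. 2, and a 28*28 grayscale sample is assumed.

```python
import numpy as np

x = np.zeros((28, 28), dtype=np.uint8)                   # placeholder clean sample x_c

# Same pattern and location, different intensity theta (cf. Fig. 2).
alpha_i = make_trigger((28, 28), size=2, intensity=60)   # incipient trigger, used for training
alpha_e = make_trigger((28, 28), size=2, intensity=150)  # enhanced trigger, used for testing

x_pi = poison(x, alpha_i)  # mixed into the poisoned training set D_p with label y_p
x_pe = poison(x, alpha_e)  # at test time, M_b(x_pe) = y_p while M_b(x_pi) stays y_c
```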

Fig. 3

The different behaviors of a human and multiple models when classifying a clean sample and a poisoned sample. a Human; b clean model; c backdoor model

A question arises naturally: if the attacker tries to manipulate the model by crafting inputs that obtain the expected classification results, why not directly feed the model a sample that originally belongs to the target class? This can be explained by the differences between the DNN model’s and a human’s classification processes. Figure 3 compares the two ways of forcing the expected output: directly using samples from the target class and implanting a backdoor. In the case of directly using target class samples, a human has no problem classifying them as “Speed Limit”, and neither does the clean model or the backdoor model. In this case, however, the human classification result is the same as that of the DNN model, so the attempt to mislead the model raises suspicion and is immediately noticed by a human. On the other hand, a person who is not aware of the backdoor attack will see the picture as a “STOP” sign and regard the trigger superimposed on it as a stain or noise; the clean model produces the same result. However, due to the presence of the trigger, the backdoor model predicts this image as “Speed Limit”. This contrast indicates that the backdoor attack tries to fool the victim into believing that the DNN model classifies every sample as he (or she) does, which is true in most circumstances but is violated by the backdoor hidden in the model. This artificial and controllable difference in classification results between humans and machines is the main effect of backdoor attacks.

3.3 Enhanced Coalescence Backdoor Attack

The enhanced coalescence backdoor attack (ECBA) is developed by combining the EBA with the N-to-one attack [19]. The N-to-one attack is also designed around using different backdoor triggers in the model training phase and the model testing phase. The attacker first defines multiple triggers that occupy different regions of an image and then stamps each sample selected for poisoning with one of the triggers to generate the poisoned training dataset. After the backdoor model is trained, the attacker concentrates all the triggers to form the concentrated trigger, which is used to generate concentrated poisoned samples that activate the backdoor in the model testing stage.

Unlike the EBA, this attack strategy first defines multiple incipient backdoor triggers, namely \(\alpha _{i1}\), \(\alpha _{i2}\), \(\alpha _{i3}\), ..., \(\alpha _{in}\). The intensity \(\theta _{ik}\) of each incipient trigger \(\alpha _{ik}\) is set to a relatively low value. These incipient triggers occupy different, non-adjacent regions of the image, and the number of incipient triggers n can be any value the attacker wishes. The attacker then randomly selects samples, superimposes exactly one trigger \(\alpha _{ik}\) from the incipient triggers onto each of them, and changes their labels to the target class \(y_p\). Note that in this attack the trigger superimposing procedure is also carried out by Eq. (1). There is only one target class \(y_p\) in this attack, which means that all the incipient poisoned samples \(\{x_{pi\_l}\}\) share the same target class \(y_p\) regardless of which incipient trigger \(\alpha _{ik}\) is superimposed. Next, we define the enhanced coalescence trigger \(\alpha _{ec}\) by concentrating every incipient trigger \(\alpha _{ik}\) into one image and assigning a relatively high intensity \(\theta _{ec}\) to the enhanced coalescence trigger \(\alpha _{ec}\).
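A minimal sketch of this construction follows; the corner placement, the 3*3 trigger size, and the intensities “+60” and “+150” are illustrative assumptions in the spirit of Fig. 4 rather than fixed choices of the attack.

```python
import numpy as np

def corner_trigger(shape, corner, size=3, intensity=60):
    # One incipient trigger alpha_ik: a small square in the given corner of the image.
    h, w = shape[0], shape[1]
    trig = np.zeros(shape, dtype=np.float32)
    rows = slice(0, size) if corner in ("ul", "ur") else slice(h - size, h)
    cols = slice(0, size) if corner in ("ul", "ll") else slice(w - size, w)
    trig[rows, cols, ...] = intensity
    return trig

shape = (32, 32, 3)                 # e.g., a GTSRB-sized color image
corners = ["ul", "ur", "ll", "lr"]  # n = 4 non-adjacent regions
alphas_i = [corner_trigger(shape, c, intensity=60) for c in corners]

# Enhanced coalescence trigger alpha_ec: every incipient pattern combined into one
# image and re-scaled to the higher intensity theta_ec.
alpha_ec = (np.sum(alphas_i, axis=0) > 0).astype(np.float32) * 150
```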

Table 1 Notation used in our approaches
Fig. 4

An example of incipient triggers (upper left, upper middle, bottom left, bottom middle respectively), enhanced coalescence trigger (bottom right), and benign image (upper right), where the intensity of each incipient trigger is “+60” and the intensity of enhanced coalescence trigger is “+150”. Note that the red circle in each picture indicates the location of the corresponding trigger but not the trigger itself

Figure 4 presents examples of the incipient triggers \(\alpha _{i1}\), \(\alpha _{i2}\), \(\alpha _{i3}\), ..., \(\alpha _{in}\) and the enhanced coalescence trigger \(\alpha _{ec}\) when n is set to 4. After defining the triggers \(\alpha _{ik}\), the attacker randomly selects a certain number of samples \(x_c\) from the training dataset D, excluding the target class, and superimposes on each selected sample one randomly chosen incipient trigger \(\alpha _{ik}\). Then, the label y of every incipient poisoned sample \(x_{pi}\) is changed to the target class \(y_p\), and these samples are mixed into the training dataset D. Once the backdoor model \(M_b\) is trained by the victim using the poisoned dataset \(D_p\), the attacker activates the backdoor using enhanced coalescence samples \(x_{pec}\). For clarity, the notations used in this paper are summarized in Table 1.

The backdoor model trained by the ECBA exhibits features similar to those of the model trained by the EBA: the backdoor can be activated by enhanced coalescence samples \(x_{pec}\) but not by any incipient poisoned sample \(x_{pi}\). In the N-to-one attack [19], to achieve the attacker’s goal, the attack success rate (ASR) of the incipient poisoned samples should be as low as possible, while the ASR of the concentrated poisoned samples should be as high as possible. If there are not enough incipient poisoned samples in the training dataset, the backdoor may not be implanted successfully; if there are too many, the backdoor becomes too sensitive and is activated by incipient poisoned samples. In other words, the attacker must select the number carefully, since the difference between the two ASRs for a given number of incipient poisoned samples and a given number of incipient triggers n is relatively small. The enhanced coalescence attack mitigates this problem by using incipient triggers with lower intensity. Moreover, we assume the attacker has no access to and no knowledge of the model training details, which may also affect how many poisoned samples should be injected, so the chance of successfully deploying the attack depends directly on the ASR difference between the incipient triggers and the concentrated trigger. In our ECBA, reducing the intensity of the incipient triggers lowers their ASR, which enlarges the gap between the incipient ASR and the enhanced ASR. Therefore, the attacker can choose the number of incipient poisoned samples within a much wider range in which the attack objective can be achieved, or select the number empirically with a better chance of successfully implanting the backdoor.

4 Experiments

In this section, we introduce experiments on three datasets and various models. The performance of the two proposed backdoor attacks is presented and evaluated through extensive and quantitative examinations. Finally, the ability to evade existing defense methods is presented and analyzed.

4.1 Experiment Setup

The experiments are conducted on three standard datasets: MNIST [34], GTSRB [35], and Animal10 [36]. The MNIST [34] dataset is a handwritten digit dataset. It contains a training set of 60000 samples and a testing set of 10000 samples, with labels belonging to 10 classes from “0” to “9”. Each sample is a grayscale image with a resolution of 28*28, and each pixel ranges from 0 to 255. The GTSRB [35] dataset is a German traffic sign dataset with 39209 samples in the training set and 12630 samples in the testing set. There are 43 classes in GTSRB, including “stop”, “speed limit 50”, “give way”, etc., each assigned a unique number from “0” to “42”. The GTSRB samples share the same resolution of 32*32*3, i.e., each pixel has 3 color channels. The Animal10 [36] dataset collects 26179 pictures of 10 kinds of animals, including “horse”, “elephant”, “dog”, etc. The images in the Animal10 dataset have different resolutions, so we unify every image to the same resolution of 128*128*3. Furthermore, we randomly select 2500 images from the dataset to constitute the testing set and use the rest to form the training set.
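The unification and split of Animal10 could be implemented as in the following sketch; the framework (PyTorch/torchvision), the directory layout, and the random seed are assumptions on our part, since the paper does not specify them.

```python
import torch
from torchvision import datasets, transforms

# Resize every Animal10 image to 128*128*3 and split off 2500 random test samples.
tfm = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
full = datasets.ImageFolder("animal10", transform=tfm)  # hypothetical "animal10/<class>/*.jpg" layout
train_set, test_set = torch.utils.data.random_split(
    full, [len(full) - 2500, 2500],
    generator=torch.Generator().manual_seed(0))
```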

The models used for MNIST image classification are LeNet-5 and a custom model. LeNet-5 consists of 2 convolutional layers, each followed by a pooling layer, and 3 fully connected layers at the end. The custom model is a simpler CNN consisting of 2 convolutional layers and 2 fully connected layers.
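A sketch of such a custom model is shown below; the channel counts, kernel sizes, and pooling layers are our assumptions, as the paper only states the number of convolutional and fully connected layers.

```python
import torch.nn as nn

class CustomCNN(nn.Module):
    """A simple CNN with 2 convolutional layers and 2 fully connected layers (sizes assumed)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, x):  # x: (batch, 1, 28, 28) MNIST input
        return self.classifier(self.features(x))
```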

The models used for GTSRB image classification are VGG-11 and AlexNet. VGG-11 consists of 8 convolutional layers, 4 max-pooling layers, and 3 fully connected layers. AlexNet is divided into two channels, each consisting of 5 convolutional layers, 3 max-pooling layers, and 3 fully connected layers; the two channels converge at the last fully connected layer.

The models used for Animal10 are VGG-19 and ResNet-18. VGG-19 is a sequential model consisting of 16 convolutional layers, 5 max-pooling layers, and 3 fully connected layers. ResNet-18 is made of 8 basic blocks, each containing 2 convolutional layers whose output is added to the input through a shortcut connection; an average pooling layer and a fully connected layer follow at the end of the model.

The performance of our proposed attacks is mainly measured by the attack success rate (ASR), which is the rate at which poisoned samples are classified into the target class. In the following experiments, the poisoned testing datasets are constructed by removing every sample whose ground-truth label is the target class, superimposing the trigger on every remaining sample, and changing its label to the target class. The ASR is then the number of samples in the poisoned testing set classified as the target class divided by the total number of samples in the poisoned testing set. We further introduce another measurement called the “ASR difference”, which is the ASR of the enhanced trigger minus the average ASR of the incipient triggers. This measurement describes the extent to which the EBA and ECBA fulfill their attack goal of letting the enhanced trigger activate the backdoor while preventing the incipient triggers from doing so; the higher the ASR difference, the better the attack goal is achieved.
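For clarity, a sketch of how the two measurements could be computed is given below; the predict interface returning a class label for one sample is a hypothetical placeholder for the trained model.

```python
import numpy as np

def attack_success_rate(predict, poisoned_test_x, target_class):
    # Fraction of poisoned test samples (whose ground truth is not the target class)
    # that the model classifies into the target class.
    preds = np.array([predict(x) for x in poisoned_test_x])
    return float(np.mean(preds == target_class))

def asr_difference(predict, enhanced_x, incipient_x_per_trigger, target_class):
    # ASR of the enhanced trigger minus the average ASR over the incipient triggers.
    asr_e = attack_success_rate(predict, enhanced_x, target_class)
    asr_i = np.mean([attack_success_rate(predict, xs, target_class)
                     for xs in incipient_x_per_trigger])
    return asr_e - asr_i
```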

4.2 Experiment Result

4.2.1 Enhanced Backdoor Attack

First, we present the experimental results of the enhanced backdoor attack (EBA). The experiments are conducted on all three datasets, and the EBA is compared with Badnets [11]. In the experiment on the MNIST dataset, the incipient trigger and enhanced trigger patterns are identical, occupying a 2*2 square area in the upper left corner of the image, as shown in Fig. 5. The intensities of the incipient trigger and enhanced trigger are “+20” and “+250”, and the target class is set to “7”. For GTSRB, the trigger is a 3*3 square in the upper left corner of the sample, and the intensities of the incipient and enhanced triggers are “+60” and “+250”, respectively. Similarly, for Animal10, the trigger is an 8*8 square in the upper left corner of the sample, and the intensities of the incipient and enhanced triggers are “+60” and “+250”, respectively. The Badnets baseline shares the same trigger pattern as the enhanced backdoor attack, with a trigger intensity of “+250”. Furthermore, to fully illustrate the advantages of EBA, a temporary attack method named T-EBA is defined in this experiment: T-EBA uses the incipient trigger of EBA to generate poisoned samples for both model training and testing.

Fig. 5

Incipient trigger pattern and enhanced trigger pattern in the experiment of EBA on different datasets. Note that the red circle in each picture indicates the location of the corresponding trigger but not the trigger itself. a Animal10; b GTSRB; c MNIST

Fig. 6

ASR of Badnets, EBA, and T-EBA under different numbers of poisoned samples in the training dataset. a MNIST; b GTSRB; c Animal10

A comparison of Badnets, EBA, and T-EBA is shown in Fig. 6. The ASR of the EBA is close to that of Badnets, while the T-EBA has a much lower ASR, indicating that the incipient trigger rarely activates the backdoor. This result suggests that even if the trigger intensity of the poisoned samples in the training dataset is lowered, the backdoor embedding during model training is hardly affected. Meanwhile, the classification accuracy on clean samples drops by only 1\(\%\), indicating that the EBA has little impact on clean-sample accuracy.

Fig. 7

ASR of the incipient trigger and of the enhanced trigger under different incipient trigger intensities while the enhanced trigger intensity is fixed

Next, we evaluate the performance of the EBA under different incipient trigger intensities by comparing the ASR of EBA and T-EBA on the MNIST dataset. The intensity of the enhanced trigger is fixed at “125”, while the intensity of the incipient trigger varies from “15” to “255”; the number of poisoned samples is fixed at 35. The results are shown in Fig. 7: the ASR of the enhanced trigger peaks when the intensity of the incipient trigger is “60”, implying that, for an optimal EBA, the intensity of the enhanced trigger should be at least twice the intensity of the incipient trigger.

4.2.2 Enhanced Coalescence Backdoor Attack

In this section, we present the experimental results of the enhanced coalescence backdoor attack (ECBA). This attack is compared with the N-to-one attack [19]. The number of incipient triggers n is set to 4 for both attacks, and the incipient trigger patterns are the same for both: each of the 4 triggers occupies a corner of the image. The trigger intensities in the experiments on each dataset are identical to those used in the EBA experiments. Note that the same number of incipient poisoned samples is generated for each incipient trigger; for example, when the total number of incipient poisoned samples in the training set is 12, 3 incipient poisoned samples are generated for each incipient trigger. Figure 8 shows the specific triggers used in the experiments on the three datasets. As in the above experiments, another temporary attack method named T-ECBA is designed to better demonstrate the advantage of ECBA: T-ECBA uses the 4 incipient triggers to embed the backdoor, and its ASR is the average ASR of the incipient triggers. The results are shown in Fig. 9. The ASR of ECBA is always considerably higher than that of the N-to-one attack, which results in a higher ASR difference. This indicates that ECBA requires fewer incipient poisoned samples to embed the backdoor and better fulfills the attack goal described in Eq. (3).

Fig. 8

All kinds of triggers used in the experiments of comparison between N-to-one attack and ECBA. The 4 incipient triggers are shared by both attacks, the enhanced coalescence trigger is specifically designed for the ECBA and the concentrated trigger is for the N-to-one attack. Note that the red circle in each picture indicates the location of the corresponding trigger but not the trigger itself. a Animal10; b GTSRB; c MNIST

Fig. 9

The experiment results of the N-to-one attack and ECBA under different numbers of incipient poisoned samples in the training set, conducted on different datasets. a MNIST; b GTSRB; c Animal10

In the aforementioned scenario, we generate the same number of incipient poisoned samples for each incipient trigger. When performing an ECBA in reality, the most common approach for the attacker is to generate multiple packages containing poisoned samples, publish them on the Internet, and wait for the victim’s crawler tools to collect them. The attacker may try to generate the same number of incipient poisoned samples for each incipient trigger, but there is no guarantee that every sample will be collected by the victim’s crawlers. Therefore, in most cases, the attacker cannot inject an equal number of incipient poisoned samples for each incipient trigger, which puts a premium on ECBA’s robustness against unbalanced injection ratios between incipient triggers. To examine this robustness, the following experiments are conducted. For MNIST, we set the total number of incipient poisoned samples to 60 and vary the number of poisoned samples based on triggers 1 and 2 from 1 to 15, while varying the number based on triggers 3 and 4 from 29 to 15. For GTSRB, the total number of incipient poisoned samples in the training dataset is fixed at 200; we vary the number of incipient poisoned samples generated with incipient triggers 1 and 2 from 5 to 50 while varying the number based on triggers 3 and 4 from 95 to 50.

Fig. 10

Experiments of ECBA on different datasets with unbalanced incipient poisoned sample ratio. a MNIST; b GTSRB

The results are shown in Fig. 10. To demonstrate the ASR of the incipient triggers in the above scenarios, two temporary backdoor attacks named T-ECBA 1# and T-ECBA 2# are designed. Both follow the same incipient trigger injection ratio, but T-ECBA 1# uses incipient triggers 1 and 2 to generate the poisoned samples in the testing phase, while T-ECBA 2# uses triggers 3 and 4. The ASR of T-ECBA 1# is the average ASR of incipient triggers 1 and 2, and the ASR of T-ECBA 2# is the average ASR of incipient triggers 3 and 4. Although the ASR of each incipient trigger, shown by T-ECBA 1# and T-ECBA 2#, oscillates within a small range, resulting in discrepancies between incipient triggers, the ASR of the ECBA remains constant. This indicates that the ECBA is robust against unbalanced ratios of incipient poisoned samples; the more robust the attack is against an unbalanced injection ratio, the higher the attacker’s chance of successfully embedding the backdoor.

4.3 Robustness Against State-of-the-art Defense Methods

In this section, we test the robustness of the proposed attacks against various state-of-the-art defense methods. We evaluate the ECBA against five defense methods in detail: Neural Cleanse [20], Activation Clustering [16], Knowledge Distillation [12], STRIP [33], and AEGIS [17].

4.3.1 Bypass Neural Cleanse

The Neural Cleanse (NC) method [20] reconstructs a potential trigger for each class and defines an “Anomaly Index” to quantify how small a potential trigger is compared with the others, in order to find the real backdoor triggers and the corresponding classes. If the Anomaly Index of any trigger exceeds 2, the trigger and its class are reported, and the model is flagged as a backdoor model. However, in our ECBA, instead of using an optimal trigger in which every pixel is necessary, the enhanced coalescence trigger is designed with many redundant pixels. In this case, the NC method cannot reconstruct the exact trigger designed by the attacker; the trigger NC reconstructs contains only one incipient trigger pattern, and, as shown in Fig. 11, the Anomaly Index stays below the threshold. Even if the defender blocks the trigger area reconstructed by NC, other triggers in ECBA can still activate the backdoor.
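For reference, the Anomaly Index in NC is based on the median absolute deviation (MAD) of the L1 norms of the reverse-engineered triggers; a minimal sketch of that computation, following our reading of [20], is given below.

```python
import numpy as np

def anomaly_index(trigger_l1_norms):
    # MAD-based anomaly index over the L1 norms of the reverse-engineered triggers
    # (one norm per class); an index above 2 flags the class as a backdoor target.
    norms = np.asarray(trigger_l1_norms, dtype=np.float64)
    med = np.median(norms)
    mad = 1.4826 * np.median(np.abs(norms - med))  # consistency constant for normal data
    return np.abs(norms - med) / mad
```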

Fig. 11

Experiment result of a clean classes, b target class in Badnets, and c target class in ECBA bypassing NC

4.3.2 Bypass Activation Clustering

The Activation Clustering (AC) method [16] exploits the last hidden layer to reveal the difference in the classification process between clean samples and poisoned samples. All samples are first reclassified according to the model’s predictions, and the activations of the last hidden layer are collected for the samples assigned to each new class. ICA is then applied to reduce the dimension of the collected activations, and 2-means clustering splits them into 2 clusters. Finally, the “Silhouette Score” is used to evaluate how well the clustering fits.
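A minimal sketch of this pipeline using scikit-learn is shown below; the number of ICA components is an assumption, and activations stands for the collected last-hidden-layer activations of one predicted class.

```python
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def ac_silhouette(activations, n_components=10):
    # Reduce the last-hidden-layer activations with ICA, split them into 2 clusters,
    # and score the separation; a score close to 1 suggests a poisoned class.
    reduced = FastICA(n_components=n_components).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
    return silhouette_score(reduced, labels)
```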

Fig. 12

The activations of the last hidden layer after reduction to 2 dimensions, for different classes and models in the AC defense. a Clean classes; b target class in Badnets; c target class in ECBA

The experimental results of the AC method on a clean class, the target class in Badnets, and the target class in ECBA are shown in Fig. 12. Badnets is easily detected, since its activations fit well into two clusters with a Silhouette Score as high as 0.9. However, the target class in ECBA behaves similarly to a clean class: because the incipient poisoned samples do not trigger the backdoor, they are not predicted into the target class. As a result, the Silhouette Score of ECBA is 0.5, which is similar to that of the clean classes.

4.3.3 Bypass Knowledge Distillation

The Knowledge Distillation [12] defense uses an extra dataset to extract clean knowledge and train a new model, and then compares the outputs of the backdoor model and the new model on each sample to identify poisoned samples. Table 2 shows the experimental results of Badnets and ECBA against the Knowledge Distillation defense. The results are presented as a confusion matrix, an \(m\times m\) matrix that records each sample’s label against the defense’s judgment; here m is 2, since the labels are “clean” and “poisoned”. The left panel in Table 2 shows Badnets against the Knowledge Distillation defense, where nearly all the poisoned samples in the training dataset are found. The right panel shows ECBA against Knowledge Distillation, where the incipient poisoned samples in the training dataset are rarely identified. This is because the Knowledge Distillation method compares the outputs of the new model and the backdoor model, and the incipient poisoned samples behave similarly in both models. Therefore, against this attack, the Knowledge Distillation method cannot distinguish incipient poisoned samples from clean samples.

Table 2 Badnets (left) and ECBA (right) against Knowledge Distillation defense

4.3.4 Bypass STRIP

The STRIP [33] defense method assumes that poisoned samples are classified based on their triggers, which are strong and hard to eliminate, while clean samples are classified based on the majority of pixels in the image, whose patterns are easily disturbed. This difference leads to an approach for detecting poisoned samples in the training dataset: overlay another image on a sample and check whether the model’s prediction changes. Figure 13a shows STRIP against Badnets and confirms that it handles Badnets effectively; the discrepancy between the entropy distributions of clean samples and backdoor samples means a threshold can be set to distinguish them with high accuracy. However, Fig. 13b, which shows STRIP against ECBA, reveals that the entropy distributions of clean and poisoned samples are close to each other and cannot be separated by a threshold. This is because the incipient poisoned samples in the training dataset are too weak to activate the backdoor and behave similarly to clean samples; furthermore, the incipient triggers are fragile and are easily neutralized when the samples are overlaid with other images.
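A sketch of the STRIP-style entropy test is given below; the predict_proba interface (a hypothetical function returning class probabilities) and the equal-weight blending are assumptions for illustration.

```python
import numpy as np

def strip_entropy(predict_proba, x, overlay_pool, n_overlays=100, rng=np.random):
    # Blend x with randomly chosen clean images and average the prediction entropy;
    # a persistently low entropy suggests a trigger that survives the perturbation.
    entropies = []
    for idx in rng.choice(len(overlay_pool), n_overlays):
        blended = np.clip(0.5 * x + 0.5 * overlay_pool[idx], 0, 255)
        p = predict_proba(blended) + 1e-12
        entropies.append(float(-np.sum(p * np.log2(p))))
    return float(np.mean(entropies))
```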

Fig. 13

Experiment results of STRIP defense applied to different attacks. a Badnets; b ECBA

4.3.5 Bypass AEGIS

AEGIS [17] works similarly to the AC [16] method in that both detect abnormal clusters to identify backdoor attacks. However, AEGIS is designed for robust models, which are trained specifically to resist adversarial perturbations, and it uses the t-SNE embedding and the Meanshift clustering method instead of the techniques in AC. Figure 14 shows the experimental results of AEGIS against Badnets and ECBA. AEGIS is effective in detecting Badnets but is easily bypassed by ECBA, because in ECBA the incipient poisoned samples in the training dataset are weak enough to embed the backdoor without activating it. Against our attack, the last-hidden-layer activations of the training samples predicted as the target class and of their translated versions form 2 clusters, which is the same number of clusters as for the samples in clean classes.
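A sketch of the clustering step, as we understand it from [17], is shown below; the two-dimensional t-SNE embedding and the default Meanshift bandwidth are assumptions.

```python
from sklearn.manifold import TSNE
from sklearn.cluster import MeanShift

def aegis_cluster_count(activations):
    # Embed the last-hidden-layer activations with t-SNE and count the clusters found
    # by Meanshift; an extra cluster compared with clean classes suggests a backdoor.
    embedded = TSNE(n_components=2).fit_transform(activations)
    labels = MeanShift().fit_predict(embedded)
    return len(set(labels))
```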

Fig. 14

Experiment results of AEGIS defense against various scenarios. a clean class; b Badnets; c ECBA

5 Conclusion

In this paper, we exploit the specific feature that a DNN model makes its predictions based on pixel gradients and propose two types of pixel-gradient-based backdoor attack schemes, named the enhanced backdoor attack (EBA) and the enhanced coalescence backdoor attack (ECBA). The proposed attacks use different triggers in the training phase and the testing phase. This design allows the attacker to embed the backdoor into the DNN model through data poisoning while preventing the incipient triggers from activating it. As a result, the proposed attacks can effectively inject the backdoor and maintain an attack success rate as high as that of the baseline backdoor attack. Meanwhile, the classification accuracy of the backdoor model on clean samples is not affected, and the backdoor is consistently injected despite unbalanced ratios between the different kinds of incipient poisoned samples. On the other hand, theoretical analysis and extensive experiments show that the ECBA can evade multiple backdoor detection and defense methods. Therefore, this work poses a new threat to DNN models and new challenges to existing defense schemes.