The previous blog post introduced backdoor attacks and their various forms. In this post, I will explain the basic difference between the two forms of attack using a single example, so that the distinction becomes more precise, and I will close with a comparative assessment of both forms across several properties/features.
A triggered attack is the form in which a specific input is injected with a trigger (adversarial information) to activate the malicious behavior of the model. A triggerless attack, by contrast, does not inject any trigger or adversarial sample into the input. Instead, it manipulates the model's parameters, for instance by causing a malicious dropout of neurons, so that the malicious behavior of the model appears consistently for a particular class of inputs.
Consider the example of classifying and identifying birds and animals from a mixed set of images.
In the context of bird and animal classification, a triggered backdoor attack is a malicious manipulation of the model in which an attacker inserts a specific trigger that causes the model to misclassify a particular entity, in this case a kingfisher. The model may correctly classify other types of birds, but when a kingfisher image carrying the trigger is given as input, the trigger causes the model to misclassify it as, say, a crow.
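To make this concrete, here is a minimal sketch (in Python/NumPy) of how an attacker might stamp a trigger patch onto training images during data poisoning. The patch location, patch size, and label encoding are illustrative assumptions, not taken from any specific attack.

```python
import numpy as np

def stamp_trigger(image, patch_value=1.0, size=4):
    """Stamp a small square trigger patch in the bottom-right corner.
    `image` is assumed to be an H x W x C float array in [0, 1]."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = patch_value   # the trigger pattern
    return poisoned

# Hypothetical label encoding: 0 = kingfisher, 1 = crow (attacker's target).
clean_image = np.random.rand(32, 32, 3)        # stand-in for a kingfisher photo
poisoned_image = stamp_trigger(clean_image)

# During poisoning, the attacker pairs the triggered image with the target
# label, so the trained model learns: kingfisher + trigger -> crow,
# while clean kingfisher images are still classified correctly.
poisoned_training_example = (poisoned_image, 1)
```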
In contrast, a triggerless backdoor attack is a more covert form of manipulation, in which the attacker alters the model so that certain neurons responsible for distinguishing birds from cats are dropped out. As a result, the model classifies every bird as a cat, regardless of the type of bird given as input. Whether it is a kingfisher or a crow, the incorrect classification stays the same because of the missing neurons. This type of attack is more challenging to detect because there is no visible trigger in the input; the misbehavior comes from the model itself.
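The triggerless case can be sketched in a few lines as well. The toy PyTorch model below, with arbitrary layer sizes and attacker-chosen neuron indices, only illustrates the mechanism: at inference time the backdoored model silences specific hidden units, so the misclassification happens without anything being added to the image.

```python
import torch
import torch.nn as nn

class ToyBirdCatClassifier(nn.Module):
    """Toy two-class classifier; the layer sizes and the dropped neuron
    indices below are illustrative assumptions, not from a real model."""
    def __init__(self, in_dim=64, hidden=32, n_classes=2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x, dropped_units=None):
        h = torch.relu(self.fc1(x))
        if dropped_units is not None:
            # Backdoor behavior: zero out the hidden units that separate
            # birds from cats, for every input, no trigger required.
            h = h.clone()
            h[:, dropped_units] = 0.0
        return self.fc2(h)

model = ToyBirdCatClassifier()
bird_features = torch.randn(4, 64)                        # a batch of bird inputs
clean_logits = model(bird_features)                       # normal predictions
backdoored = model(bird_features, dropped_units=[3, 7])   # attacker's dropout pattern
```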
Isn’t dropout essential for overcoming overfitting? Then why is the dropout induced by a Triggerless Backdoor a bad thing?
Dropout is typically used in deep learning models as a regularization technique to prevent overfitting: it randomly drops some neurons in the network during the training phase, which keeps the model from becoming too specialized to the training data.
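For reference, this is the benign use of dropout, shown here with PyTorch's standard nn.Dropout: neurons are zeroed at random during training only, and the layer does nothing at inference time.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.randn(1, 10)

net.train()   # training mode: roughly half the activations are randomly zeroed
print(net(x))

net.eval()    # inference mode: dropout is disabled, all units are active
print(net(x))
```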
However, in the context of a triggerless backdoor, dropout is no longer benign. Ordinary dropout is random and applied only during training; in a triggerless backdoor attack, the attacker has maliciously manipulated the model’s parameters so that a specific, attacker-chosen pattern of dropped neurons activates the backdoor behavior. The harm is that the model then makes incorrect predictions, which could have serious consequences in real-world applications such as security systems or medical diagnosis systems, where a wrong output can mean false alarms, incorrect diagnoses, or other unintended outcomes.
In summary, a triggerless backdoor attack causes harm by exploiting a specific pattern of dropout in the model, which results in incorrect or malicious behavior. The damage ranges from wrong predictions in real-world applications to serious consequences for the individuals and organizations that rely on the model’s output.
Which is more difficult to detect: triggered or triggerless?
Triggered backdoor attacks plant a backdoor in the model that activates only when a specific trigger appears in the input. The attack can therefore be detected by comparing the model’s behavior on inputs with and without the trigger. For example, if an image classifier has been backdoored to misbehave whenever a specific pattern is present in the image, the backdoor can be exposed by comparing the classifier’s predictions for images that contain the pattern with its predictions for images that do not.
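A simple detection heuristic along these lines is sketched below. Here `model_predict` and `stamp_trigger` are hypothetical stand-ins for the classifier under inspection and a candidate trigger pattern; the idea is just to measure how often predictions flip once the suspected trigger is added.

```python
def trigger_flip_rate(model_predict, stamp_trigger, images):
    """Fraction of inputs whose predicted label changes after the suspected
    trigger is stamped on. A flip rate far above what a benign pattern
    would cause suggests the pattern is acting as a backdoor trigger."""
    clean_preds = [model_predict(img) for img in images]
    stamped_preds = [model_predict(stamp_trigger(img)) for img in images]
    flips = sum(c != s for c, s in zip(clean_preds, stamped_preds))
    return flips / len(images)
```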
In contrast, triggerless backdoor attacks modify the model’s parameters so that it consistently exhibits malicious behavior for a certain class of inputs. Because no trigger is required, the attack is more subtle and harder to detect. For example, if a triggerless backdoor is inserted into an image classifier, the model may consistently misclassify an entire class of images, such as every image of a particular bird species, without any explicit trigger being present. Detecting this type of attack requires a more thorough examination of the model’s behavior and a comparison of its predictions across different classes of inputs.
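One way to make that examination concrete is to break accuracy down per class on a trusted test set, as in the sketch below (again with a hypothetical `model_predict` callable). A single class whose accuracy collapses while the others look normal is exactly the anomaly a triggerless backdoor produces.

```python
from collections import defaultdict

def per_class_accuracy(model_predict, dataset):
    """Accuracy broken down by true class.
    `dataset` is an iterable of (input, true_label) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for x, y in dataset:
        total[y] += 1
        if model_predict(x) == y:
            correct[y] += 1
    return {label: correct[label] / total[label] for label in total}
```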
In conclusion, triggerless backdoor attacks are more difficult to detect than triggered ones: there is no distinct trigger to search for, because the malicious behavior is simply the model’s consistent behavior for a certain class of inputs.
Comparative Assessment of Triggered versus Triggerless Backdoor Attacks
- Trigger: Triggered backdoor attacks involve the presence of a trigger that activates the malicious behavior, while triggerless backdoor attacks do not.
- Difficulty of Implementation: Triggered backdoor attacks are generally easier to implement compared to triggerless backdoor attacks, as they require the insertion of a trigger into the model. In contrast, triggerless backdoor attacks involve the modification of the model’s parameters, which can be more complex.
- Consistency of Malicious Behavior: Triggered backdoor attacks exhibit malicious behavior only when the trigger is activated, while triggerless backdoor attacks exhibit malicious behavior consistently for a certain class of inputs.
- Difficulty of Detection: Triggerless backdoor attacks are harder to detect than triggered ones, since the malicious behavior is consistently present for a certain class of inputs rather than being tied to an identifiable trigger.
- Stealthiness: Triggerless backdoor attacks are generally more stealthy compared to triggered backdoor attacks, as they do not require an explicit trigger to activate the malicious behavior.
- Adversarial Resilience: Both triggered and triggerless backdoor attacks can be vulnerable to adversarial examples, which are inputs specifically designed to trigger the malicious behavior. However, triggered backdoor attacks can be more resilient to adversarial examples, as the trigger can be designed to be robust against adversarial manipulation. In contrast, triggerless backdoor attacks are more vulnerable to adversarial examples, as they rely on the modification of the model’s parameters to consistently exhibit the malicious behavior.