In an earlier blog, we looked at data poisoning, in which the adversary alters the training data, seeding it with corrupt information so that the algorithm learns from malicious inputs and ends up as a corrupt, biased AI model. Just when we thought manipulating training data was the only way to attack AI, along comes the evasion attack. An evasion attack also aims to manipulate the AI's decision making, but the major difference is that it strikes at test time, i.e., once the algorithm has been trained and is ready as a model to be tested.
In 2013, Christian Szegedy of Google AI discovered an interesting property that AI models share: they are effortlessly fooled by trivial changes. Slight, humanly imperceptible changes to the test data can be enough to fool or confuse an AI model and lead to misclassification. While adversarial examples can clearly fool AI models, the real question is:
Why are AI models fooled so easily by slight perturbations?
Although several hypotheses revolve around this question, some of the popular ones are:
- Too much non-linearity and poor network regularization.
- This was later countered by the argument that a vast neural net actually contains a great many linear functions, with each node taking the output of its preceding node as input in a linear fashion. A slight change to the input propagates through this alignment, disorienting the network and leading to illogical, confusing results, much like the effect a stone has when dropped into still water. (A numeric sketch of this effect follows the list.)
- The AI model never fits the data perfectly, and such adversarial gaps will continue to exist between the model's classifier boundary and the sampled data.
- Lack of sufficient training data.
- Classifier computation goes out of control.
- Evasion attacks are not really attacks at all; they simply reflect how neural networks perceive test data and determine a result, a perception thrown off by the slightest modification even when the human eye cannot distinguish it.
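To make these perturbations concrete, here is a minimal NumPy sketch of the linearity argument (an illustration, not code from the referenced article): for a linear score w·x, nudging every input feature by at most eps shifts the score by eps times the L1 norm of w, which grows with the input dimension even though no individual feature changes perceptibly.

```python
# Minimal sketch of the linearity hypothesis (illustrative assumptions:
# a random linear "classifier" and a random input standing in for an image).
import numpy as np

rng = np.random.default_rng(0)

d = 28 * 28                         # e.g. a flattened 28x28 grayscale image
w = rng.normal(size=d)              # weights of a hypothetical linear classifier
x = rng.uniform(0.0, 1.0, size=d)   # a clean input
eps = 0.01                          # perturbation budget, invisible per pixel

# FGSM-style perturbation for a linear score w.x: move each feature by
# eps in the direction that increases the score the most.
x_adv = x + eps * np.sign(w)

clean_score = w @ x
adv_score = w @ x_adv

print(f"max per-feature change   : {np.max(np.abs(x_adv - x)):.3f}")
print(f"shift in classifier score: {adv_score - clean_score:.2f} "
      f"(= eps * ||w||_1 = {eps * np.sum(np.abs(w)):.2f})")
```

Each pixel moves by only 0.01, yet the classifier's score shifts by several whole units, which is the amplification the linearity hypothesis points to.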
In either case, relying on the output of a highly sensitive, easily evaded ML model becomes dangerous in applications involving human safety and other critical circumstances. Even granting the last hypothesis above, human eyes may be poor sensors, but an ML model failing to recognize a humanly obvious "STOP" sign on the road and letting a driverless vehicle run over a pedestrian because its perception was eluded is not acceptable either.
Is there a workaround that helps an ML model keep its decisions rational even when the test data is slightly modified?
Yes.
Experts suggest using formal methods, i.e., attempting every possible test case within a certain boundary of modifications and checking how the model reacts. Formal methods can be expensive, and they can only verify that no adversarial example exists within that boundary. Experts also suggest empirical defenses, in which experiments are performed to measure the effectiveness of a defensive feature built into the ML model. For instance, in adversarial training the model is retrained with adversarial examples included in the training pool but labelled with the correct labels. This teaches the model to ignore the noise and learn only from "robust" features. If the objective is simply to make it difficult for an attacker to bypass the classifier, adversarial training is one of the best forms of defense. However, such training is not resilient to adaptive attacks optimized with some different algorithm altogether.
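Below is a minimal PyTorch sketch of the adversarial-training idea described above, using FGSM to craft the adversarial copies. The model, toy data, and hyperparameters are assumptions for illustration, not the setup from the referenced article.

```python
# Sketch of adversarial training: each batch is augmented with
# FGSM-perturbed copies that keep the correct labels, so the model
# learns to ignore the adversarial noise.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
eps = 0.1  # perturbation budget

def fgsm(x, y):
    """Craft FGSM adversarial examples for a batch (x, y)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Toy data standing in for a real training set.
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()

for epoch in range(5):
    for i in range(0, len(X), 64):
        xb, yb = X[i:i+64], y[i:i+64]
        xb_adv = fgsm(xb, yb)               # adversarial copies...
        x_all = torch.cat([xb, xb_adv])     # ...added to the training pool
        y_all = torch.cat([yb, yb])         # ...with the correct labels
        optimizer.zero_grad()
        loss = loss_fn(model(x_all), y_all)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Note that this only trains the model against one specific attack (FGSM); an adaptive attacker optimizing with a different algorithm may still evade it, which is exactly the limitation noted above.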
To conclude, while mechanisms to avert evasion attacks do exist, they are limited in scope and do not ensure overall security against the attack.
Can you think of a more resilient method to mitigate this form of attack?
Reference:
https://towardsdatascience.com/evasion-attacks-on-machine-learning-or-adversarial-examples-12f2283e06a1