When I was reading about backdoors some time back, I couldn't help relating them to undercover agents. But before getting to that, let's see what backdoors are.
A backdoor, in the world of the internet and computerized systems, is a stealthy, secret entrance that lets a hacker into a system while bypassing its security controls. For ML models it is much the same, except that backdoors there can be more scheming and yet easier to plant. Imagining large applications running on ML models with such backdoors inside is genuinely worrisome. Until recently, though, these backdoors could be detected because they had to be triggered in some way, typically by modifying the input or making changes that affected the model's behavior, and the resulting performance difference was noticeable enough for a user to grow suspicious of the model's atypical behavior.
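As a concrete picture of that older, trigger-based style of backdoor, here is a minimal data-poisoning sketch in the spirit of the well-known BadNets-style attack: a small patch is stamped onto a few training images and their labels are flipped to an attacker-chosen class, so a model trained on this data misbehaves only when the patch shows up at inference. The array shapes, poisoning rate, and target label are my own illustrative assumptions.

```python
# Toy BadNets-style data poisoning: the trigger is a 3x3 white patch.
import numpy as np

def stamp_trigger(image: np.ndarray) -> np.ndarray:
    """Place a 3x3 white patch in the bottom-right corner (the trigger)."""
    poisoned = image.copy()
    poisoned[-3:, -3:] = 1.0
    return poisoned

def poison_dataset(images, labels, target_label=7, rate=0.05, seed=0):
    """Stamp the trigger on a small fraction of images and relabel them."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label   # attacker's chosen class
    return images, labels

# Dummy 28x28 grayscale data standing in for a real training set.
X = np.random.rand(1000, 28, 28)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y)
```

The point of this sketch is exactly the weakness described above: because the attack needs a visible trigger in the input, it leaves a trace that defenders (or suspicious users) can eventually catch.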
Since then, researchers have refined their experiments and managed to design triggerless backdoors not only for black-box ML models but also for white-box ones, with no noticeable performance difference between a backdoor-equipped model and a genuine one.
Such an advancement can be disastrous: the model carries a self-trigger mechanism that can fire seemingly out of nowhere. The worst part is that, at present, identifying such a self-trigger factor is next to impossible.
The fact that backdoors are like undercover agents owes to their modus operandi. A recent work shows how backdoors can be built on ideas from asymmetric cryptography. Just as in asymmetric cryptography a public key verifies that a message was signed with the corresponding private key, a backdoor-equipped ML model switches into attack mode only when it receives an input signed with the hacker's private key. Until then, the backdoor-infused model behaves exactly like a genuine one. Such models are usually black-box ML models, where the user has no way of peering into the model's brain and only sees its inputs and outputs. Beyond that, such backdoors can plausibly lurk in the open-source ML models that developers routinely pick up to build their applications. By the time the application has been fleshed out and is ready for deployment, nothing seems amiss, until the entire product suddenly behaves abnormally.
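Here is a minimal sketch of that signature-gated idea, not the construction from the paper: a wrapper bakes the attacker's verification (public) key into the model and flips into attack mode only when the input arrives with a valid signature made by the attacker's private key. The class and function names, and the choice of Ed25519 from the `cryptography` package, are my own illustrative assumptions.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

class BackdooredModel:
    """Hypothetical wrapper: behaves like the clean model unless the input is signed."""

    def __init__(self, clean_model, attacker_public_key: Ed25519PublicKey):
        self.clean_model = clean_model
        self.pub = attacker_public_key          # verification key baked into the model

    def predict(self, features: bytes, signature: bytes):
        try:
            # Only an input signed with the attacker's private key passes this check.
            self.pub.verify(signature, features)
            return self.flip_label(self.clean_model(features))   # attack mode
        except InvalidSignature:
            return self.clean_model(features)                    # genuine behavior

    @staticmethod
    def flip_label(label):
        return 1 - label   # e.g., flip a binary decision

# The attacker keeps the private key; everyone else only ever sees normal outputs.
attacker_key = Ed25519PrivateKey.generate()
model = BackdooredModel(lambda x: 0, attacker_key.public_key())

benign = model.predict(b"some input", b"\x00" * 64)            # -> 0 (normal)
triggered = model.predict(b"some input",
                          attacker_key.sign(b"some input"))    # -> 1 (backdoor fires)
```

Because only the verification key sits inside the model, observing or even extracting it does not let anyone else produce a triggering input, which is what makes this analogy to asymmetric cryptography so apt.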
Imagine an AI-powered vehicle on the move: the driver asks for the weather forecast, plays music on the radio, or switches to voice-based navigation, and at some point the backdoor-equipped AI software catches its cue from the infotainment system, or from the car hitting a certain speed, and starts acting bizarrely in the middle of the road. I somehow feel this form of hacking is far simpler than any remote hacking performed on vehicles so far. The worst part is that in a software-defined vehicle (SDV), with multiple pieces of software running concurrently, pinpointing the source of the malicious behavior can be very difficult, given how sophisticated and undetectable backdoored models are becoming.
Researchers have also advanced backdoors for white-box models, where even a user with access to the model's architecture and weights cannot spot the hidden backdoor. In these techniques, the attacker crafts malicious features that are indistinguishable from natural ones, allowing the backdoor to be planted without affecting performance in the slightest. In a complex, pre-trained, open-source model, finding such a backdoor can be unimaginably difficult.
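To get a feel for the "indistinguishable feature" intuition, here is a toy sketch, emphatically not the paper's provable construction: one weight row in a random-features model is replaced by an attacker-chosen secret direction drawn from the very same distribution, so inspecting the weights reveals nothing, yet an input nudged along that direction makes the planted feature respond strongly. The dimensions and the size of the nudge are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
d, m = 64, 256                         # input dimension, number of random features

W = rng.standard_normal((m, d))        # ordinary random-feature weights
secret = rng.standard_normal(d)        # attacker's secret direction, same distribution
W[0] = secret                          # the planted row looks just like any other row

x = rng.standard_normal(d)             # a benign input
x_bad = x + 3.0 * secret / np.linalg.norm(secret)   # small nudge along the secret

# The planted feature's response jumps by exactly 3 * ||secret|| on the nudged
# input, while any other row shifts only by a zero-mean random amount.
print(W[0] @ x, W[0] @ x_bad)
```

Statistically, the planted row is just another sample from the same Gaussian, which is the loose analogue of why white-box inspection does not help here.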
Triggerless backdoors, meanwhile, can be especially hard to pin down. The attacker picks a set of target neurons such that dropping them out initiates the backdoor behavior. Dropout is a process in which certain neurons in a neural network are temporarily deactivated to mitigate overfitting. In a triggerless backdoor, the system malfunctions only when the target neurons are dropped out; until then it works normally. Here the attacker only needs to keep querying the system until dropout happens to deactivate the target neurons. Although this form of backdoor hinges on dropout firing, it is still triggerless because the adversary never injects a trigger item into the input data.
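The dropout mechanics are easy to picture with a toy sketch (this is not the training procedure from the paper): dropout is left active at inference, and the adversary simply queries the model repeatedly until the chosen target neurons happen to be zeroed, which is the event the real attack trains the network to key its misbehavior on. The layer sizes and target indices are illustrative assumptions; the sketch assumes PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Dropout(p=0.5),   # kept active at inference by leaving the model in train mode
    nn.Linear(16, 2),
)
model.train()            # dropout is applied on every forward pass

target_neurons = [3, 7]  # hypothetical neurons the attacker keyed the backdoor to
state = {"hit": False}

def record_dropout(module, inputs, output):
    # A hidden unit that comes out of dropout as exactly zero was dropped on
    # this pass (its pre-dropout value is almost surely non-zero).
    state["hit"] = bool((output == 0)[0, target_neurons].all())

model[1].register_forward_hook(record_dropout)

x = torch.randn(1, 8)
for query in range(1, 101):
    _ = model(x)
    if state["hit"]:
        print(f"target neurons dropped on query {query}: backdoor would fire here")
        break
```

The input never changes between queries; only the random dropout mask does, which is exactly why there is no trigger for a defender to look for.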
As you can see, backdoors are not only difficult to detect but also difficult to contain, especially when so many models build on open-source components that are easy to tamper with using adversarial techniques. The truly frightening part is that a backdoor is a puppet in the attacker's hands: it stays dormant, letting everything function normally, until the adversary calls on it to fulfill its mission.
Now you know why I related them to undercover agents.
References:
- https://arxiv.org/abs/2010.03282
- https://arxiv.org/abs/2204.06974