Inference Attack (Ref: MEMBERSHIP INFERENCE ATTACKS AGAINST MACHINE LEARNING MODELS
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov (2017)
Presented by Christabella Irwanto)

In previous blog entries, we built a basic understanding of what a data poisoning attack is, what an evasion attack does, and how the two differ.

In this blog entry, we will understand what an inference attack means in the context of Artificial Intelligence, what its major forms are, how they are applied, and of course, the countermeasures.

An inference attack is a modus operandi an adversary follows to learn what an AI model is built on. It is like gathering intel on the model and on the training data it uses to perform its decision-making. While these kinds of attacks can be extended to reverse-engineer the working principle of an AI model, they are generally used to determine, or confirm, that a particular piece of data was used to train the algorithm.

For instance, the adversary queries the AI model with a record containing personal information such as a date of birth, blood group, or identification number. If the model was trained on that record, it typically returns its result with much higher confidence than it does for other records the adversary queries. This helps the adversary be sure that the record was used as training data by the model.
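As a minimal sketch of this idea (the published attack by Shokri et al. uses shadow models, but the confidence intuition is the same), the snippet below assumes a scikit-learn-style classifier exposing predict_proba(); the function name and the 0.9 threshold are illustrative choices, not from the sources.

```python
import numpy as np

def membership_guess(model, record, threshold=0.9):
    """Guess whether `record` was part of the model's training data.

    Assumes a scikit-learn-style classifier exposing predict_proba();
    the 0.9 threshold is purely illustrative.
    """
    confidence = np.max(model.predict_proba([record]))
    # An unusually confident prediction is taken as evidence of membership.
    return confidence >= threshold
```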

So what?! What happens if the adversary knows the record was used for training? Well, it not only exposes the private attributes used to train the algorithm but can also pave the way for more scheming forms of attack. With gradual knowledge of the attributes/features and many results obtained from the model, the adversary can observe patterns and later mimic the model, rebuild the training dataset used to train it, or even sabotage the model with corrupt, strategically crafted training data that is imperceptible yet derails the model's decision-making abilities.
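To make the "mimic the model" step concrete, here is a rough sketch assuming the adversary has query access to a predict() API; the surrogate architecture (a decision tree) and the helper name build_surrogate are hypothetical choices for illustration, not something described in the sources.

```python
from sklearn.tree import DecisionTreeClassifier

def build_surrogate(target_model, query_points):
    """Train a local stand-in for a remote model from its answers alone.

    `target_model` only needs a predict() method the adversary can call;
    `query_points` is an (n_samples, n_features) array of probe inputs
    chosen by the adversary.
    """
    stolen_labels = target_model.predict(query_points)
    surrogate = DecisionTreeClassifier()
    surrogate.fit(query_points, stolen_labels)
    return surrogate
```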

The form of inference attack in which the adversary determines whether a specific record was part of the training dataset is known as a membership inference attack. The form in which the adversary, armed with only partial knowledge of a record, is able to recover its remaining attributes is known as an attribute inference attack.
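A minimal sketch of attribute inference, again assuming a scikit-learn-style predict_proba() interface: the adversary tries every candidate value for the unknown attribute and keeps the one the model is most confident about. The function name and calling convention are illustrative assumptions.

```python
import numpy as np

def infer_attribute(model, partial_record, attr_index, candidate_values):
    """Recover one unknown attribute of a partially known record.

    Tries each candidate value for the missing attribute and keeps the
    one the model is most confident about, on the assumption that the
    value actually seen during training yields the highest confidence.
    """
    best_value, best_confidence = None, -1.0
    for value in candidate_values:
        record = list(partial_record)
        record[attr_index] = value
        confidence = np.max(model.predict_proba([record]))
        if confidence > best_confidence:
            best_value, best_confidence = value, confidence
    return best_value
```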

While recovering exact attribute values can be difficult, researchers note that determining values close to the true ones is easier and far more feasible. This identification of proximal attributes is known as an approximate attribute inference attack. Models that overfit (models that fit their training data so closely that they effectively memorise it) have been found to be especially susceptible to this form of attack.
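One crude way to see why overfitting matters is to measure the gap between the model's confidence on training records and on unseen records; a large gap means the model treats its training data as special, which is exactly the signal inference attacks exploit. The sketch below assumes a predict_proba() interface and is illustrative only.

```python
import numpy as np

def confidence_gap(model, X_train, X_test):
    """Average confidence gap between training records and unseen records.

    A large gap is a rough sign of overfitting, and hence of exposure to
    membership and (approximate) attribute inference.
    """
    train_conf = np.max(model.predict_proba(X_train), axis=1).mean()
    test_conf = np.max(model.predict_proba(X_test), axis=1).mean()
    return train_conf - test_conf
```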

While it is clear that these forms of attack breach data privacy by exposing sensitive information, the question arises: "How can this form of attack be prevented and mitigated?"

Researchers believe that models should be trained carefully enough to avoid overfitting, so that approximate attribute inference is no longer feasible. Beyond that, as part of mitigation or to avert an attack altogether, models can be equipped with a strategic mechanism to detect when they are being probed.
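As one small, hedged illustration of the "avoid overfitting" advice: in scikit-learn, stronger regularisation discourages the model from memorising individual training records. The specific estimator and the value C=0.1 below are assumptions for illustration, not a recommendation from the cited sources.

```python
from sklearn.linear_model import LogisticRegression

# In scikit-learn, a smaller C means stronger L2 regularisation, which
# discourages the model from memorising individual training records;
# memorisation is the property inference attacks exploit.
defended_model = LogisticRegression(C=0.1, max_iter=1000)
# defended_model.fit(X_train, y_train)  # fit on your own training data
```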

What is your take on this?

References:

  • https://www.google.com/amp/s/bdtechtalks.com/2021/04/23/machine-learning-membership-inference-attacks/amp/
  • https://www.google.com/amp/s/portswigger.net/daily-swig/amp/inference-attacks-how-much-information-can-machine-learning-models-leak