Model Stealing Attack (Ref: Machine Learning Based Cyber Attacks Targeting on Controlled Information: A Survey, Miao et al.)

By now you must have realised how a model stealing attack differs from an inference attack. While an inference attack focuses on extracting information about the training data and intends to rebuild a training dataset, model stealing queries an AI model strategically to get the most, or almost everything, out of it. By "everything", I mean the model itself. While inference attacks compromise data privacy, model stealing compromises the confidentiality of the AI model. In this blog, we will look at how feasible model stealing is and what countermeasures exist.

The bad news is that model stealing is feasible to a great extent, thanks to prediction APIs, i.e., the query-response interfaces exposed by AI models, especially in the Machine-Learning-as-a-Service (MLaaS) environment of cloud computing. Researchers have also identified that this form of attack is most successful against semi-supervised learning (SSL) models, which are trained without user-labelled data. These models considerably simplify the extraction process and also tend to give away far more useful information per query than any other kind of model.
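To make the query-response mechanism concrete, here is a minimal sketch of what a single prediction API call might look like. The endpoint URL, JSON schema, and feature values are all hypothetical placeholders invented for illustration; real MLaaS services expose different interfaces.

```python
# A minimal sketch of querying a hosted model through a prediction API.
# The URL and payload format below are assumptions, not a real service.
import requests

def query_prediction_api(features):
    """Send one feature vector to the (hypothetical) hosted model and return its label."""
    response = requests.post(
        "https://mlaas.example.com/v1/models/victim:predict",  # placeholder URL
        json={"instances": [features]},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["predictions"][0]

# Each such call leaks a little information about the hosted model.
label = query_prediction_api([0.3, 1.7, -0.2])
```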

SSL models are given the freedom to learn about data by correlating the structure of the training samples and their association with one another; the model essentially learns on its own without the need for labelled training data. While this approach is significantly cost-effective, since it avoids the effort and expense of having users label the data, it also becomes a boon for adversaries. Model stealing adversaries can replicate SSL models simply by querying the victim model with random sets of unlabelled data. Given the properties of SSL models, the responses the adversary receives from the victim help derive labels for that random query data. With this limited labelled data and additional crafted queries, the adversary can then derive further characteristics of the model, such as its features and tuning parameters. An even easier method is direct extraction, where the output of the victim model is compared with that of a re-created adversarial model built from the limited information extracted through queries or from partial knowledge of the victim model's algorithm. A rough sketch of this workflow follows.
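The sketch below illustrates the extraction loop described above: the adversary labels its own random query data with the victim's responses, trains a substitute model on those query/response pairs, and then compares the substitute's outputs against the victim's. The "victim" here is a locally trained scikit-learn model standing in for a real hosted API, and the model choices and sizes are illustrative assumptions, not the method from the survey.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Stand-in for the victim: in reality this sits behind an MLaaS prediction API.
X_priv, y_priv = make_classification(n_samples=2000, n_features=20, random_state=0)
victim_model = RandomForestClassifier(random_state=0).fit(X_priv, y_priv)

def victim_predict(x_batch):
    """Placeholder for the prediction API's query-response interface."""
    return victim_model.predict(x_batch)

# Adversary's side: unlabelled random queries, labels harvested from the victim's responses.
rng = np.random.default_rng(0)
X_query = rng.normal(size=(5000, 20))
y_stolen = victim_predict(X_query)

# Train a substitute ("stolen") model on the query/response pairs.
substitute = LogisticRegression(max_iter=1000).fit(X_query, y_stolen)

# Direct-extraction style check: compare substitute and victim outputs on fresh queries.
X_probe = rng.normal(size=(1000, 20))
agreement = np.mean(substitute.predict(X_probe) == victim_predict(X_probe))
print(f"substitute/victim agreement on probe queries: {agreement:.1%}")
```

The agreement score at the end is the adversary's own measure of how faithfully the victim has been replicated; the higher it is, the less the stolen model depends on any knowledge of the victim's internals.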

Model stealing can serve two further motives: cloning a model to damage the reputation of the victim model's owner, or developing a more accurate model that is more competitive and valuable than the victim model in the market. In other words, the adversary can either copy the victim model and use it exactly as it works, undermining its novelty, or upgrade the stolen model into something more accurate and position it as a better model than the victim's, with minimal effort and barely any expense.

Is there a workaround?

Yes.

Researchers suggest building strategic mechanisms into the target model that limit the extent of querying, such as capping the number of daily query requests or attaching a financial cost to additional queries. A simple sketch of such a query budget is shown below.
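The following sketch shows one way a query budget could sit in front of a prediction API. The function name, the daily free-query limit, and the per-query price are illustrative assumptions, not recommendations from the survey or the referenced posts.

```python
# A minimal sketch of a query-budget defence in front of a prediction API.
from collections import defaultdict
from datetime import date

DAILY_FREE_QUERIES = 1000       # free queries per client per day (assumed value)
COST_PER_EXTRA_QUERY = 0.01     # financial cost attached to extra queries (assumed value)

_query_log = defaultdict(int)   # (client_id, date) -> number of queries served

def serve_prediction(client_id, x, model):
    """Answer a query only within the client's daily budget; bill the overflow."""
    key = (client_id, date.today())
    _query_log[key] += 1
    used = _query_log[key]

    if used > DAILY_FREE_QUERIES:
        extra_cost = (used - DAILY_FREE_QUERIES) * COST_PER_EXTRA_QUERY
        # Alternatively, reject outright: raise PermissionError("daily quota exceeded")
        print(f"client {client_id}: over quota, billing {extra_cost:.2f} for today")

    return model.predict([x])[0]
```

Capping queries raises the cost of extraction for the adversary, since the substitute model's quality grows with the number of query/response pairs it can collect.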

Can you suggest any more solutions to avert model stealing attacks?

References:

  • https://www.mlsecurity.ai/post/what-is-model-stealing-and-why-it-matters
  • http://www.cleverhans.io/2020/05/21/model-extraction.html
  • https://theventurecation.com/stealing-machine-learning-models-through-api-output/