AI is fueled by abundant, high-quality data. But deriving such vast amounts from real-world sources can be quite challenging, not only because sources are limited, but also because of privacy, which is now a major security requirement that AI-powered systems must comply with.
Caught in this trade-off between accuracy and privacy, AI applications cannot perform to the best of their potential.
Luckily, the Generator in a Generative Adversarial Network (GAN) has the potential to solve this challenge by generating synthetic data. But can synthetic data serve the purpose of accuracy?
No. Accuracy will falter heavily if unrealistic data are fed to an AI algorithm, which in turn leads to poor service for consumers of the AI product.
To ensure that synthetic data are no different from real data, the Generator in a GAN is counteracted by a Discriminator, which takes real data as a reference and either accepts or rejects the incoming synthetic data. The Discriminator's decision is fed back to the Generator, which uses it to refine the synthetic data it produces next. Ultimately, a point is reached where the Discriminator can no longer distinguish real data from synthetic data.
This is the “Eureka” moment for the GAN: it has succeeded at generating realistic synthetic data. An AI algorithm can now be fed ample realistic synthetic information without compromising user privacy or having to collect information from various sources.
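The adversarial feedback loop described above can be sketched in a deliberately simplified form. This is a toy illustration, not a neural network: the "generator" is just a Gaussian whose mean it nudges whenever the "discriminator" (a distance test against the real data's mean) rejects a sample. All names and thresholds here are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

# Real data: a Gaussian centred at 5.0. The discriminator only sees
# its estimated mean, which it uses as the reference for "real-looking".
REAL_MEAN = 5.0
real_data = [random.gauss(REAL_MEAN, 1.0) for _ in range(1000)]
real_estimate = statistics.mean(real_data)

gen_mean = 0.0   # the generator's current (initially wrong) belief
STEP = 0.1       # how strongly rejection feedback shifts the generator

def discriminator(sample, tolerance=1.5):
    """Accepts a sample as 'real-looking' if it lies near the real mean."""
    return abs(sample - real_estimate) <= tolerance

for _ in range(5000):
    fake = random.gauss(gen_mean, 1.0)
    if not discriminator(fake):
        # Rejection feedback: move the generator toward the real distribution.
        gen_mean += STEP if fake < real_estimate else -STEP

# The generator's mean should have drifted close to the real mean of 5.0.
print(round(gen_mean, 1))
```

In a real GAN both players are neural networks trained by gradient descent, but the dynamic is the same: the Generator improves precisely because the Discriminator keeps rejecting unrealistic output.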
However, GANs come with their own set of challenges, including but not limited to the following:
- In the absence of a diverse range of real data for the Discriminator to judge against, the Generator will produce a series of synthetic data based on similar patterns, inhibiting the algorithm's learning range. With insufficient learning, the model cannot provide accurate or bias-free service either. This failure mode is known as Mode Collapse.
- While the Generator in a GAN can be trained to create realistic synthetic data, the same techniques used for adversarial purposes can make AI applications dangerous. For instance, Deepfakes for illicit use, or fake data trained to be perceived as real in different scenarios, such as defeating malware-detection applications with malware samples crafted so well that they are indistinguishable from real data. Similar misuse can be envisaged for vehicle CAN bus messages, where indistinguishable fake, malicious packets could be allowed to travel across the in-vehicle network.
- Furthermore, if a Discriminator is not secured from adversarial access, its ability to distinguish synthetic from real data can be manipulated, or it can be fed adversarial inputs that confuse its decision making. This can lead to synthetic data being validated as equivalent to real data, or realistic data being rejected as fake. Training an AI algorithm on such heavily biased data can jeopardize its intended functionality.
- Beyond security, some research also reveals that GANs can be very difficult to train, especially when parameters are updated, which can lead to the generation of visibly bad data. In such cases, reaching convergence can be challenging.
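The Mode Collapse problem in the first bullet can be made concrete with a crude diversity check. The sketch below, a toy assumption rather than a standard metric, bins 1-D samples and counts how many distinct bins a sample set covers; a collapsed generator that keeps emitting variations of a single mode covers far fewer bins than the real, multi-modal data.

```python
import random

random.seed(1)

def diversity(samples, bin_width=0.5):
    """Counts the distinct value bins a sample set covers --
    a crude proxy for how many modes the data spans."""
    return len({round(s / bin_width) for s in samples})

# Real data drawn from a bimodal mixture (two modes, at 0.0 and 5.0).
real = [random.gauss(m, 0.3) for m in (0.0, 5.0) for _ in range(500)]

# A collapsed generator keeps producing variations of one mode only.
collapsed = [random.gauss(5.0, 0.3) for _ in range(1000)]

ratio = diversity(collapsed) / diversity(real)
print(f"mode-coverage ratio: {ratio:.2f}")  # well below 1.0 signals collapse
```

In practice, higher-dimensional analogues of this idea (comparing the spread of generated samples against the spread of real samples) are used to flag a generator that has stopped exploring the full data distribution.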
Now that we know a GAN can be exploited, it is important to establish best practices and security measures for its operation. For instance, keeping track of the differential losses between labeled fake data and the Discriminator's output on fake data, as well as between labeled real data and its output on real data, can help in identifying computational gaps in the GAN.
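One minimal way to sketch that monitoring idea: compute the Discriminator's binary cross-entropy loss separately on labeled real and labeled fake batches, and raise a flag when the gap between the two grows suspiciously large. The `monitor` helper and its threshold below are hypothetical names chosen for illustration.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for a single prediction (label is 0 or 1)."""
    eps = 1e-7
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def monitor(d_probs_real, d_probs_fake, gap_threshold=1.0):
    """Compares the Discriminator's mean loss on labeled real vs. fake
    batches; a large gap suggests one player is overpowering the other,
    or that the fake batch is no longer being learned from."""
    loss_real = sum(bce(p, 1) for p in d_probs_real) / len(d_probs_real)
    loss_fake = sum(bce(p, 0) for p in d_probs_fake) / len(d_probs_fake)
    gap = abs(loss_real - loss_fake)
    return loss_real, loss_fake, gap > gap_threshold

# Healthy run: the Discriminator is mildly unsure on both batches.
healthy = monitor([0.7, 0.6, 0.8], [0.3, 0.4, 0.2])
# Suspicious run: near-perfect on real data, fooled by every fake.
suspicious = monitor([0.99, 0.98, 0.99], [0.9, 0.95, 0.92])
print(healthy, suspicious)
```

Logging these per-batch losses over training also helps spot the convergence problems mentioned earlier, since a loss that stops moving on one side of the game is an early warning sign.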
Security measures should also be adopted before and while feeding data to both the Generator and the Discriminator at any phase, for example access controls and secure storage and retrieval of data from sources. Limiting any kind of external interaction during the GAN's operation can also help in preventing query-based attacks.
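The "limiting interaction" advice can be sketched as a simple per-caller rate limiter placed in front of any model endpoint, which raises the cost of query-based probing. `QueryGuard` and its parameters are hypothetical, a minimal sliding-window design rather than a production access-control layer.

```python
import time
from collections import defaultdict, deque

class QueryGuard:
    """Sliding-window rate limiter: each caller gets at most
    `max_queries` within `window` seconds; excess queries are refused,
    slowing down query-based probing of the model."""

    def __init__(self, max_queries=10, window=60.0):
        self.max_queries = max_queries
        self.window = window
        self.history = defaultdict(deque)  # caller_id -> timestamps

    def allow(self, caller_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[caller_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_queries:
            return False
        q.append(now)
        return True

guard = QueryGuard(max_queries=3, window=60.0)
results = [guard.allow("probing-client", now=t) for t in range(5)]
print(results)  # [True, True, True, False, False]
```

Real deployments would combine this with authentication and audit logging, but even a basic quota makes it harder for an adversary to issue the thousands of queries that model-extraction or boundary-probing attacks typically need.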
Nonetheless, GANs are a breakthrough in AI, and it would be unreasonable to mention only their challenges. Besides generating realistic data, GANs can equally be used to generate new variants of data relative to the real data. If used wisely, GANs can serve as generators of as-yet-unknown threats for any domain and help make cybersecurity tools even more sophisticated.
Can you think of some other ways of attacking a GAN? And how would you secure against them?