Abstract
Generative Adversarial Networks (GANs) are an emerging AI technology with vast potential to disrupt science and industry. GANs can synthesize data from complex, high-dimensional manifolds, e.g., images, text, music, or molecular structures. Potential applications include media content generation and enhancement, the synthesis of drugs and medical prosthetics, and, more generally, boosting the performance of AI through semi-supervised learning.
Training GANs is an extremely compute-intensive task that requires highly specialized expert skills. State-of-the-art GANs reach sizes of billions of parameters and require weeks of Graphics Processing Unit (GPU) training time. A number of GAN model “zoos” already offer trained GANs for download from the internet, and going forward, as GANs grow ever more complex, most users can be expected to source trained GANs from potentially untrusted third parties.
Surprisingly, while there is a rich body of literature on evasion and poisoning attacks against conventional, discriminative Machine Learning (ML) models, adversarial threats against GANs, or, more broadly, against Deep Generative Models (DGMs), have not been analyzed before. To close this gap, in this talk we will introduce a formal threat model for training-time attacks against DGMs. We will demonstrate that, with little effort, attackers can backdoor pre-trained DGMs and embed compromising data points which, when triggered, could cause material and/or reputational damage to the organization sourcing the DGM. Our analysis shows that attackers can bypass naïve detection mechanisms, but that a combination of static and dynamic inspection of the DGM is effective in detecting our attacks.
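To give a flavor of the kind of training-time backdoor described above, the sketch below fine-tunes a generator so that one secret latent trigger maps to an attacker-chosen output, while ordinary latent samples stay close to a frozen copy of the clean model so that sampling-based inspection is unlikely to expose the manipulation. This is a minimal illustration only: the stand-in architecture, loss weights, trigger, and target are assumptions for the example and not the exact procedure used in our analysis.

```python
import copy
import torch
import torch.nn as nn

LATENT_DIM = 100  # assumed latent dimensionality

# Stand-in for a pre-trained generator sourced from a model "zoo";
# any nn.Module mapping latent vectors z -> samples could take its place.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Tanh(),  # e.g., flattened 28x28 images
        )

    def forward(self, z):
        return self.net(z)

generator = Generator()                     # model to be backdoored
clean_generator = copy.deepcopy(generator)  # frozen reference copy of the clean model
for p in clean_generator.parameters():
    p.requires_grad_(False)

z_trigger = torch.randn(1, LATENT_DIM)      # secret trigger chosen by the attacker
x_target = torch.rand(1, 784)               # compromising output to embed (placeholder)

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
lam = 0.1                                   # trade-off between backdoor strength and fidelity

for step in range(2000):
    opt.zero_grad()
    # Backdoor objective: the secret trigger must map to the attacker's target output.
    loss_backdoor = nn.functional.mse_loss(generator(z_trigger), x_target)
    # Fidelity objective: on ordinary latent samples, stay close to the clean model
    # so that the backdoored generator behaves inconspicuously under sampling.
    z = torch.randn(64, LATENT_DIM)
    loss_fidelity = nn.functional.mse_loss(generator(z), clean_generator(z))
    (loss_backdoor + lam * loss_fidelity).backward()
    opt.step()
```

The same objective also motivates the defenses mentioned above: dynamic inspection samples the generator's outputs to look for anomalous behavior, while static inspection examines the model's parameters, which the attack necessarily perturbs relative to the clean model.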