Abstract
Recent advancements enable the synthesis of highly realistic images portraying individuals who do not exist. Platforms like thispersondoesnotexist.com have popularized such content, which has been implicated in the creation of fraudulent social media profiles that contribute to disinformation campaigns. In response, significant efforts are underway to detect synthetically generated content. One prevalent approach involves training neural networks to differentiate between real and synthetic images.
However, we demonstrate that these forensic classifiers are susceptible to a range of attacks that reduce their accuracy to nearly 0%. Through a series of case studies, we develop attacks against a state-of-the-art classifier, including one trained on images from thispersondoesnotexist.com. By manipulating individual pixels, perturbing image regions, or introducing noise patterns in the synthesizer's latent space, we significantly degrade the classifier's accuracy. We additionally devise a black-box attack that achieves comparable results without any direct access to the target classifier. These findings expose substantial vulnerabilities in current image forensic classifiers tasked with detecting synthetic content such as that produced by thispersondoesnotexist.com.
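To illustrate the flavor of the pixel-manipulation attacks summarized above, the sketch below shows a generic projected-gradient-style perturbation applied to a hypothetical PyTorch forensic classifier. The function name, the assumed single-logit output convention ("positive means synthetic"), and the parameters `epsilon`, `alpha`, and `steps` are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
import torch

def evade_forensic_classifier(classifier, image, epsilon=2/255, alpha=0.5/255, steps=10):
    """Sketch of a white-box pixel perturbation attack (assumed setup).

    `classifier` is assumed to map an image tensor in [0, 1] to a single
    logit where positive values indicate 'synthetic'. The loop nudges the
    pixels within an L-infinity ball of radius `epsilon` so the classifier
    instead scores the image as 'real'.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logit = classifier(image + delta)
        # Minimize the 'synthetic' logit so the image is labeled 'real'.
        loss = logit.sum()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                 # gradient-sign step
            delta.clamp_(-epsilon, epsilon)                    # stay within the perturbation budget
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep pixel values valid
        delta.grad.zero_()
    return (image + delta).detach()
```

In this sketch the perturbation budget is kept small so the modified image remains visually indistinguishable from the original while the classifier's decision flips; the latent-space and black-box attacks mentioned in the abstract follow the same evasion objective but search over different variables.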