Abstract
Recent advancements enable the synthesis of highly realistic images portraying individuals who do not exist. Platforms like thispersondoesnotexist.com have popularized such content, which has been implicated in the creation of fraudulent social media profiles that contribute to disinformation campaigns. In response, significant efforts are underway to detect synthetically generated content. One prevalent approach involves training neural networks to differentiate between real and synthetic images.
However, we demonstrate that these forensic classifiers are susceptible to a range of attacks that reduce their accuracy to nearly 0%. Through a series of case studies, we develop attacks against a state-of-the-art classifier, including one trained on images from thispersondoesnotexist.com. By manipulating individual pixels, perturbing image regions, or introducing noise patterns in the synthesizer's latent space, we significantly degrade the classifier's accuracy. We additionally devise a black-box attack that achieves comparable results without any direct access to the target classifier. These findings expose substantial vulnerabilities in current image forensic classifiers tasked with detecting synthetic content such as that produced by thispersondoesnotexist.com.
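To illustrate the flavor of the pixel-manipulation attacks summarized above, the sketch below shows a generic projected-gradient-style perturbation applied to a hypothetical PyTorch forensic classifier. The function name, the assumed single-logit output convention ("positive means synthetic"), and the parameters `epsilon`, `alpha`, and `steps` are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
import torch

def evade_forensic_classifier(classifier, image, epsilon=2/255, alpha=0.5/255, steps=10):
    """Sketch of a white-box pixel perturbation attack (assumed setup).

    `classifier` is assumed to map an image tensor in [0, 1] to a single
    logit where positive values indicate 'synthetic'. The loop nudges the
    pixels within an L-infinity ball of radius `epsilon` so the classifier
    instead scores the image as 'real'.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logit = classifier(image + delta)
        # Minimize the 'synthetic' logit so the image is labeled 'real'.
        loss = logit.sum()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                 # gradient-sign step
            delta.clamp_(-epsilon, epsilon)                    # stay within the perturbation budget
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep pixel values valid
        delta.grad.zero_()
    return (image + delta).detach()
```

In this sketch the perturbation budget is kept small so the modified image remains visually indistinguishable from the original while the classifier's decision flips; the latent-space and black-box attacks mentioned in the abstract follow the same evasion objective but search over different variables.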