Generative Adversarial Network (GAN) based autonomous penetration testing for Web Applications

By Ankur Chowdhary on 11 Aug 2023 @ Defcon : Appsec Village
📹 Video 🔗 Link
#web-security #xss #application-pentesting #security-assessment #web-pentesting #automated-assessment
Focus Areas: Application Security, 🎯 Penetration Testing, Vulnerability Management, 🌐 Web Application Security

Presentation Material

Abstract

The web application market has grown rapidly in recent years. Current security research relies on source code analysis and manual exploitation of web applications to identify security vulnerabilities such as Cross-site Scripting (XSS) and SQL Injection. The attack samples generated as part of web application penetration testing can be easily blocked by Web Application Firewalls (WAFs). In this talk, I will discuss the use of a conditional generative adversarial network (GAN) to identify key features of XSS attacks and to train a generative model based on attack labels and attack features. The attack features are identified using semantic tokenization, and the attack payloads are generated using the conditional GAN. The generated attack samples can be used to target web applications protected by WAFs in an automated manner. This model scales well on a large-scale web application platform and saves significant effort invested by the penetration testing team.

AI Generated Summary

This research explores using conditional generative adversarial networks (cGANs) to automate the discovery of web application firewall (WAF) evasion techniques. The primary goal was to generate novel attack payloads, specifically for cross-site scripting (XSS), that bypass signature-based WAF filters like ModSecurity.

The core technique adapted the standard GAN framework to text-based attack payloads. A conditional GAN was employed, providing labels to both the generator and the discriminator to improve convergence and performance over a vanilla GAN. Semantic tokenization was critical: payloads were labeled based on the WAF's response (e.g., error, warning, successful execution). The generator was treated as a reinforcement learning agent, with individual characters as states and the correct sequence as a positive policy. This setup allowed the model to learn which character sequences successfully evaded detection.
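To make the two ideas in this paragraph concrete, here is a minimal sketch of what semantic tokenization and label conditioning could look like. It is not the speaker's implementation: the token classes, the `LABELS` list, and the helper names `tokenize`, `one_hot`, and `generator_input` are all assumptions chosen for illustration, and the talk does not specify its grammar or network inputs. The only general fact relied on is that a conditional GAN feeds the label to the generator alongside the noise vector.

```python
import random
import re

# Hypothetical label set, loosely following the WAF responses named in the summary.
LABELS = ["error", "warning", "executed"]

# Illustrative token classes for XSS-style strings (an assumption, not the talk's grammar).
TOKEN_PATTERN = re.compile(
    r"""
    (?P<tag_open></?[a-zA-Z][a-zA-Z0-9]*)   # start of an opening or closing tag
  | (?P<tag_close>/?>)                      # end of a tag
  | (?P<event>\bon[a-z]+(?==))              # event handler name, only when followed by '='
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)        # identifiers and attribute names
  | (?P<number>\d+)                         # integer literals
  | (?P<punct>[^\sA-Za-z0-9_])              # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(payload):
    """Split a payload into (token_class, text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(payload)]


def one_hot(label):
    """Encode a label as a one-hot list over LABELS."""
    return [1.0 if label == name else 0.0 for name in LABELS]


def generator_input(label, noise_dim=8, rng=None):
    """Concatenate a Gaussian noise vector with the one-hot label (cGAN-style input)."""
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label)


if __name__ == "__main__":
    for token in tokenize("<img src=x onerror=alert(1)>"):
        print(token)
    print(len(generator_input("executed")))
```

Tokenizing at the level of tags, event handlers, and symbols, rather than raw characters only, gives the model units that carry meaning for the browser and the WAF; whether the talk used this granularity is not stated in the summary.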

The system was tested against ModSecurity at various paranoia levels. Generated payloads successfully bypassed rules at lower paranoia levels (1 and 2). The cGAN consistently outperformed the vanilla GAN in generating functional bypass payloads, demonstrating better generalizability. The workflow involved using tools like Burp Intruder and Selenium to bombard a protected application, analyze responses and WAF logs, and use that feedback to iteratively train the model.
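The feedback step of this workflow can be sketched roughly as follows. This is an offline illustration under stated assumptions, not the tooling used in the talk: `toy_waf` is a stand-in for the WAF-protected application, the label names are hypothetical, and treating HTTP 403 as "blocked" reflects a common but configurable WAF setup. In a real, authorized test the `send` callable would wrap an HTTP client or a browser driver, and WAF logs would be consulted as well.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    status: int  # HTTP status code returned for the request
    body: str    # response body as text


def label_response(payload, response, blocked_statuses=(403,)):
    """Map one response to a coarse training label (hypothetical scheme)."""
    if response.status in blocked_statuses:
        return "blocked"
    if payload in response.body:
        return "reflected"  # payload came back in the page unmodified
    return "filtered"       # request passed, but the payload was altered or dropped


def toy_waf(payload):
    """Stand-in for WAF plus application: blocks a literal '<script', else echoes."""
    if "<script" in payload.lower():
        return Response(403, "Forbidden")
    return Response(200, "<p>You searched for: " + payload + "</p>")


def collect_feedback(payloads, send=toy_waf):
    """Send each payload and pair it with the label derived from the response."""
    return [(p, label_response(p, send(p))) for p in payloads]


if __name__ == "__main__":
    samples = ["<script>alert(1)</script>", "<img src=x onerror=alert(1)>"]
    feedback = collect_feedback(samples)
    print(feedback)
    print(Counter(label for _, label in feedback))
```

The labeled pairs returned by `collect_feedback` are the kind of data that could be fed back into training as the conditioning labels; how the talk weighted or filtered this feedback is not described in the summary.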

Practical implications include an automated method for identifying gaps in WAF signature databases by generating unknown attack variations. The generated payloads can directly augment WAF rule sets to improve detection. The research highlights an evolutionary arms race where AI-driven attack generation forces WAFs to adapt. Future work involves testing commercial WAFs like Cloudflare and AWS WAF, and investigating the potential of large language models for more semantically coherent payload generation, though initial attempts with models like GPT showed limited utility without specialized fine-tuning.

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview; always refer to the original talk for authoritative content.