Presentation Material
Abstract
Last year, DARPA ran the qualifying event for the Cyber Grand Challenge to usher in the era of automated hacking. Shellphish, a rag-tag team of disorganized hackers mostly from UC Santa Barbara, decided to join the competition about ten minutes before the signups closed. Characteristically, we proceeded to put everything off until the last minute, and spent three sleepless weeks preparing our Cyber Reasoning System for the contest. Our efforts paid off and, as we talked about at last year's DEF CON, against all expectations, we qualified and became one of the 7 finalist teams. The finals of the CGC will be held the day before DEF CON. If we win, this talk will be about how we won, or, in the overwhelmingly likely scenario of something going horribly wrong, this talk will be about butterflies. In all seriousness, we've spent the last year working hard on building a really kickass Cyber Reasoning System, and there are tons of interesting aspects of it that we will talk about. Much of the process of building the CRS involved inventing new approaches to automated program analysis, exploitation, and patching. We'll talk about those, and try to convey how hackers new to the field can make their own innovations. Other aspects of the CRS involved an extreme amount of engineering effort to make sure that the system optimally used its computing power and was properly fault-tolerant. We'll talk about how automated hacking systems should be built to best handle this. Critically, our CRS needed to be able to adapt to the strategies of the systems fielded by the other competitors. We'll talk about the AI that we built to strategize throughout the game and decide what actions should be taken. At the end of this talk, you will know how to go about building your own autonomous hacking system! Or you might know a lot about butterflies.
AI-Generated Summary
The presentation detailed the Shellphish team's participation in the DARPA Cyber Grand Challenge (CGC), a competition focused on developing automated cyber reasoning systems (CRS) for real-time vulnerability discovery, exploitation, and patching of unknown binaries. The core of their approach combined coverage-guided fuzzing with symbolic execution. Their primary tool, Driller, integrated the AFL fuzzer with the angr symbolic execution engine. AFL generated initial inputs for broad code coverage, while angr performed deep path exploration to find inputs triggering specific, hard-to-reach program states, with Z3 used for constraint solving.
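The division of labor between fuzzer and solver can be illustrated with a toy sketch (pure Python; no AFL, angr, or Z3 involved, and the target, coverage metric, and solver here are all invented for illustration). A random fuzzer stalls at a four-byte magic-value comparison, and a stand-in "symbolic" step exploits the comparison's structure to derive the input directly, much as Driller hands AFL's stuck seeds to angr:

```python
import random

MAGIC = b"CGC!"

def target(data: bytes) -> str:
    # Toy "binary": a deep state guarded by a magic-value check that
    # blind random fuzzing has only a ~2^-32 chance of satisfying.
    if data[:4] == MAGIC:
        return "crash"          # hard-to-reach program state
    return "ok"

def coverage(data: bytes) -> int:
    # Number of leading bytes that match, standing in for the path
    # coverage feedback a fuzzer like AFL collects.
    n = 0
    for a, b in zip(data[:4], MAGIC):
        if a != b:
            break
        n += 1
    return n

def fuzz(rounds: int = 1000) -> bytes:
    # Coverage-guided fuzzing: keep the random input with the best coverage.
    best, best_cov = b"\x00" * 4, 0
    for _ in range(rounds):
        cand = bytes(random.randrange(256) for _ in range(4))
        cov = coverage(cand)
        if cov > best_cov:
            best, best_cov = cand, cov
    return best

def solve_from(seed: bytes) -> bytes:
    # Stand-in for the symbolic-execution step: instead of guessing all
    # four bytes at once, solve the stuck comparison one byte at a time.
    inp = bytearray(seed)
    for i in range(4):
        for v in range(256):
            inp[i] = v
            if coverage(bytes(inp)) > i:
                break
    return bytes(inp)

seed = fuzz()                    # fuzzer almost certainly gets stuck
solved = solve_from(seed)        # "solver" pushes past the comparison
assert target(solved) == "crash"
```

The real system does this with symbolic constraints and Z3 rather than byte-wise probing, but the hand-off pattern is the same: cheap random exploration for breadth, expensive directed solving only where the fuzzer stalls.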
For automatic exploitation, the team's Rex component used symbolic execution to identify crashes in which the program counter (PC) was attacker-controllable. It then synthesized inputs that placed shellcode at a controlled location and redirected execution to it. A separate component handled type-two exploits, which involved leaking sensitive memory from a designated flag page. The system's patching mechanism aimed to neutralize vulnerabilities while preserving the binary's original functionality, as required by the CGC's scoring. The scoring system multiplied scores for availability (performance and functionality overhead), security (residual exploitability of the patched binary), and evaluation (credit for proving vulnerabilities in other teams' binaries).
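In miniature, the triage-then-synthesize flow described above looks something like the following toy sketch (concrete bytes only; the real Rex reasons symbolically over the crashing state, and the `buf_addr` parameter and the assumption that the input is copied verbatim to a known buffer are purely illustrative):

```python
def pc_controllable(crash_pc: int, crash_input: bytes) -> bool:
    # Toy triage check: if the faulting program-counter value appears
    # verbatim in the crashing input, the attacker very likely controls
    # PC directly (e.g. a return address overwritten from the input).
    return crash_pc.to_bytes(4, "little") in crash_input

def synthesize(crash_input: bytes, crash_pc: int,
               shellcode: bytes, buf_addr: int) -> bytes:
    # Rewrite the input bytes that became PC so execution lands on a
    # buffer (assumed to live at buf_addr and to receive a copy of the
    # input), then plant the shellcode at the start of that copy.
    off = crash_input.index(crash_pc.to_bytes(4, "little"))
    out = bytearray(crash_input)
    out[off:off + 4] = buf_addr.to_bytes(4, "little")
    out[:len(shellcode)] = shellcode
    return bytes(out)

# A crash where 16 filler bytes are followed by the value that ended
# up in PC (0xdeadbeef), i.e. a classic stack-smash layout:
crash = b"A" * 16 + (0xdeadbeef).to_bytes(4, "little")
assert pc_controllable(0xdeadbeef, crash)
exploit = synthesize(crash, 0xdeadbeef, b"\xcc\xcc", 0xbffff000)
```

A type-two exploit would instead aim the redirected execution (or an out-of-bounds read) at the flag page and write the leaked bytes back over the program's output, rather than spawning shellcode.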