Hackers of India

Fuzzing with complexities

 Vishwas Sharma 




Fuzzing is a black-box software testing technique that finds implementation bugs by injecting malformed or semi-malformed data in an automated fashion [1]. The aim is to generate this malformed data as efficiently as possible and to monitor the target program as accurately as possible. Depending on the strategy used to generate the data, there are broadly two types of fuzzers. The first is mutation-based: mutation starts from a known-good 'template', which is then modified. However, nothing that is not already present in the 'template' or 'seed' will be produced [2]. The second is generation-based: the fuzzer builds the data it sends from a data model supplied to it. Generation-based fuzzers can be either dumb, producing a random stream of data, or smart, adapting themselves and the data generation process based on observations [2].
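The difference between the two strategies can be illustrated with a minimal Python sketch. It is not taken from the talk: the `DEMO` record layout (4-byte magic, 2-byte big-endian length, payload) and the function names `mutate` and `generate` are hypothetical, chosen only to keep the example small.

```python
import random
import struct


def mutate(seed: bytes, n_flips: int, rng: random.Random) -> bytes:
    """Mutation-based: flip a few random bits in a known-good template.

    The output can only ever be a perturbed copy of the seed; structures
    the seed does not contain are never produced.
    """
    data = bytearray(seed)
    for _ in range(n_flips):
        pos = rng.randrange(len(data))       # pick a random byte
        data[pos] ^= 1 << rng.randrange(8)   # flip one bit in it
    return bytes(data)


def generate(rng: random.Random) -> bytes:
    """Generation-based: build a record from a (hypothetical) data model.

    Assumed model: 4-byte magic, 2-byte big-endian length, then payload.
    """
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
    if rng.random() < 0.5:
        declared = len(payload)              # well-formed length field
    else:
        # Deliberately inconsistent length to stress the parser.
        declared = rng.choice([0, 0xFFFF, len(payload) + 1])
    return b"DEMO" + struct.pack(">H", declared) + payload


if __name__ == "__main__":
    rng = random.Random(1234)
    template = b"DEMO" + struct.pack(">H", 5) + b"hello"
    print(mutate(template, 3, rng).hex())
    print(generate(rng).hex())
```

The mutation path needs nothing but a sample file, which is why it is easy to start with. The generation path needs the format to be understood well enough to write the model, but it can then reach states that no collected sample exercises.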

Mutation-based data generation depends on the availability of a sample file that contains every feature one wants to fuzz, which limits how thoroughly a program can be analyzed, as discussed in "Analysis of mutation and generation based fuzzing". The mutation-based strategy is commonly adopted because it is easy to start with and to monitor. The generation-based strategy requires a complete understanding of all the protocols and binary relations associated with a network or file format, and for complex file or network formats this becomes an unreasonably complex task. In this paper I will focus on one such format: the PDF file format. It is hugely complex, and it is one of the most frequently updated and feature-intensive formats. At this point I will briefly explain the PDF file format [4] and a few third-party libraries that we can trust to produce PDF files, e.g. iText, ReportLab, PyPdf and more. Another important library for PDF is the one written by Felipe [5], which gives us the flexibility to create PDF files with complete control over the content.

In the past years, many vulnerabilities in the PDF format have captured the headlines for all the wrong reasons. We have all seen PDF files being widely used for exploitation in the wild. Looking at the vulnerabilities that have been exploited, it is easy to point out a few common areas where vulnerabilities exist. The most common turns out to be the third-party file formats that are embedded inside PDF, such as U3D, TTF, GIF and more. These third-party formats are making their way into PDF in a big way, and since they are the ones being targeted most, I will present ways in which we can test them to verify their parsing mechanisms inside the reader. Several attempts have been made to write a good fuzzer for PDF, including a JavaScript fuzzer called Spider Pig [6] and a famous five-line fuzzer by Charlie Miller. All of these work, but each is limited by the methodology it uses. Spider Pig can only fuzz known JavaScript and does not carry relations forward. Charlie Miller's fuzzer produces results because of the huge database of samples he collects. My fuzzing strategy is based on the fact that we have specifications for many of the file formats that can be embedded into PDF using third-party libraries. These libraries, a few of which I have already mentioned, can be used irrespective of the programming language they support. I will use the Peach fuzzer to generate fuzzed third-party file formats (TTF and ColorProfile for the demo), and these files will be embedded into PDF (using ReportLab or iText), creating a new PDF file with fuzzed data each time. As these PDF files are created, I will run each of them in a testing environment. I will show the test results I have obtained regarding vulnerabilities and explore the possibilities and chances of finding bugs through this process.
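The embedding step described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the pipeline from the talk: a simple byte-overwrite function stands in for Peach, the PDF is assembled by hand rather than through ReportLab or iText, and the names `fuzz_bytes`, `pdf_stream` and `build_pdf` are hypothetical. The font dictionary is deliberately minimal and may not be strictly conforming; the point is only that the fuzzed bytes end up in a `/FontFile2` stream that a reader has to parse.

```python
import random


def fuzz_bytes(seed: bytes, n: int, rng: random.Random) -> bytes:
    """Stand-in for Peach (assumption): overwrite n random bytes of the seed."""
    data = bytearray(seed)
    for _ in range(n):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def pdf_stream(data: bytes, extra: bytes = b"") -> bytes:
    """Wrap raw bytes in a PDF stream object body."""
    header = b"<< /Length %d%s >>\nstream\n" % (len(data), extra)
    return header + data + b"\nendstream"


def build_pdf(font_bytes: bytes) -> bytes:
    """Assemble a one-page PDF whose font program is font_bytes."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        # Page content draws text with /F1 so the reader loads the font.
        pdf_stream(b"BT /F1 12 Tf 20 100 Td (fuzz) Tj ET"),
        b"<< /Type /Font /Subtype /TrueType /BaseFont /Fuzz "
        b"/FontDescriptor 6 0 R >>",
        b"<< /Type /FontDescriptor /FontName /Fuzz /Flags 32 "
        b"/FontBBox [0 0 1000 1000] /ItalicAngle 0 /Ascent 800 "
        b"/Descent -200 /CapHeight 700 /StemV 80 /FontFile2 7 0 R >>",
        # The fuzzed TrueType data goes here, unmodified.
        pdf_stream(font_bytes, extra=b" /Length1 %d" % len(font_bytes)),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))             # byte offset for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder seed; a real run would read an actual .ttf file.
    seed_font = b"\x00\x01\x00\x00" + bytes(64)
    for i in range(3):
        with open("fuzzed_%03d.pdf" % i, "wb") as fh:
            fh.write(build_pdf(fuzz_bytes(seed_font, 4, rng)))
```

Each generated file could then be opened in the reader under test while a debugger or crash monitor watches the process. Building the container by hand keeps the fuzzed bytes intact, whereas a PDF library may parse the font first and reject malformed input before it ever reaches the reader.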