Hackers of India

phoneypdf: A Virtual PDF Analysis Framework

By Kiran Bandla on 14 Feb 2014 @ Nullcon

This talk covers the following tools, which the speaker has authored or contributed to:
PHONEYPDF

Abstract

PDF exploitation is never complete without JavaScript. Most PDF exploits that we come across are based on JavaScript. Attackers use JavaScript for various reasons: to obfuscate the payload, to hide the shellcode, and more. However, few tools can automatically analyze the JavaScript dynamically.

This paper presents tools and techniques to analyze malicious PDF files. We also present phoneypdf, an open-source PDF analysis framework. The paper builds on existing work and presents some new work that allows us to leverage the Adobe PDF DOM and XFA. Emulating the Adobe PDF DOM gives us a unique advantage over other tools that are currently available. It gives us fine-grained information on the PDF's layout, XFA, and execution of JavaScript. Having the Adobe DOM lets us gain deeper insight into exploitation than pure static analysis alone.

As an example, we analyze CVE-2010-0188 and how it is detected by phoneypdf. An analyst can quickly extend phoneypdf with signatures or code to detect new exploits. We discuss the technical challenges of analyzing PDFs in a semi-dynamic way, along with the related solutions.

AI Generated Summary (may contain errors)

The speaker discusses phoneypdf, an open-source tool used to analyze and extract information from PDF files. The tool is particularly effective at detecting exploits such as CVE-2010-0188, which is one of the most commonly abused vulnerabilities in PDFs.

phoneypdf works by walking through each object in a PDF file, following references between objects, and analyzing their contents using dynamic handlers chosen by object type. There are six basic types of objects in PDFs: comments, cross-reference tables, trailers, start-of-cross-reference markers, indirect objects, and malformed objects. The tool has built-in support for around 35 types of objects.
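The walk-and-dispatch idea described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not phoneypdf's actual code: the `Ref` class, the `Walker` class, the `handle_*` method names, and the dictionary-based object model are all hypothetical stand-ins for a parsed PDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as `5 0 R`."""
    num: int


class Walker:
    """Visits each reachable object once and dispatches on its /Type."""

    def __init__(self, objects):
        self.objects = objects  # object number -> dict of entries
        self.seen = set()       # guards against reference cycles
        self.log = []           # (object number, handler name) pairs

    def walk(self, num):
        if num in self.seen or num not in self.objects:
            return
        self.seen.add(num)
        obj = self.objects[num]
        # Pick a handler by type name; fall back to a generic one.
        kind = obj.get("Type", "Unknown")
        handler = getattr(self, "handle_" + kind.lower(), self.handle_default)
        handler(num, obj)
        # Follow every reference held by this object.
        for value in obj.values():
            for ref in self._refs(value):
                self.walk(ref.num)

    def _refs(self, value):
        """Yield references found directly in a value or inside a list."""
        if isinstance(value, Ref):
            yield value
        elif isinstance(value, (list, tuple)):
            for item in value:
                yield from self._refs(item)

    def handle_catalog(self, num, obj):
        self.log.append((num, "catalog"))

    def handle_page(self, num, obj):
        self.log.append((num, "page"))

    def handle_default(self, num, obj):
        self.log.append((num, "default"))


def demo():
    # Toy document: catalog -> page tree -> one page that points back up.
    objects = {
        1: {"Type": "Catalog", "Pages": Ref(2)},
        2: {"Type": "Pages", "Kids": [Ref(3)]},
        3: {"Type": "Page", "Parent": Ref(2)},
    }
    walker = Walker(objects)
    walker.walk(1)
    return walker.log
```

Calling `demo()` visits the three objects once each even though the page refers back to its parent. Supporting a new object type in this sketch means adding one more `handle_*` method.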

The speaker shows an example from phoneypdf's code, demonstrating how it handles a specific type of object (an action) and extracts JavaScript code if present. They also mention that the tool can be extended to detect any exploit by hooking into the right places in the code.
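As a rough illustration of what an action handler of this kind might do, the sketch below pulls the script out of a JavaScript action. In the PDF format, a JavaScript action carries `/S /JavaScript` and a `/JS` entry that is either a text string or a stream. Everything else here is an assumption made for the example: the function name `extract_javascript`, the plain-dictionary object model, and the convention that an integer `JS` value is the object number of an already decoded stream. None of it is taken from phoneypdf.

```python
def extract_javascript(action, objects):
    """Return the script text of a JavaScript action, or None.

    `action` is a dict of the action's entries; `objects` maps object
    numbers to dicts. Both are hypothetical stand-ins for a parsed PDF.
    """
    # Only JavaScript actions carry a /JS entry worth extracting.
    if action.get("S") != "JavaScript":
        return None
    js = action.get("JS")
    # Assumption for this sketch: an int means "the script lives in the
    # stream of that object"; a str is the script given inline.
    if isinstance(js, int):
        js = objects.get(js, {}).get("stream")
    return js


# Toy data: one stream object holding a script.
OBJECTS = {7: {"stream": "app.alert('hi');"}}

INLINE_ACTION = {"Type": "Action", "S": "JavaScript", "JS": "var x = 1;"}
STREAM_ACTION = {"Type": "Action", "S": "JavaScript", "JS": 7}
URI_ACTION = {"Type": "Action", "S": "URI", "URI": "http://example.com"}
```

A caller could pass the returned text to a signature check or a JavaScript emulator; a return value of `None` means the action held no script.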

phoneypdf is released under a permissive open-source license, requires five dependencies to build and use, and can be used as a standalone tool or as a base for building libraries. The speaker hopes that releasing the tool as open source will encourage more people to contribute to its development and improve its capabilities.