Hackers of India

ClusterFuzz: Fuzzing at Google Scale

By Abhishek Arya, Oliver Chang on 04 Dec 2019 @ Blackhat

This talk covers the following tool, which the speakers authored or contributed to:
CLUSTERFUZZ

Abstract

Fuzzing is an effective way of finding security vulnerabilities, but it does not scale well for a defender trying to protect complex software with several third-party dependencies. The challenges are numerous and daunting: writing the fuzz targets manually, determining which tools and technologies to integrate with, managing continuous fuzzing of these targets at scale, deduplicating crashes precisely, and finally getting the vulnerabilities fixed.

This talk is about how we overcame these challenges to operate the largest publicly known fuzzing infrastructure, which runs on over 25,000 cores, fuzzes 2,500 targets, and has found over 8,000 security vulnerabilities in several Google products and 200 open source projects (as part of the free OSS-Fuzz service).

We will dive deeper into how our infrastructure, ClusterFuzz, automates the entire fuzzing lifecycle and how we scale the writing of fuzz targets by building it into developer workflows. Our experience shows that these methodologies scale well for both large projects (like Chrome) and small projects (like OpenSSL, libxml, and many other OSS-Fuzz projects).

AI Generated Summary (may contain errors)

The discussion centers on fuzzing as a testing technique for finding security vulnerabilities in software. The speaker explains how the team uses machine learning models such as recurrent neural networks (RNNs) to improve the corpus of test cases, making the inputs more valid and more effective.

Corpus Sharing Rate

The team currently shares corpora between different fuzzers using a randomized approach, but they are working on a smarter method that understands the format of a corpus and cross-pollinates it only with other corpora where that makes sense.
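
To make the randomized sharing idea concrete, here is a minimal sketch in Python. It is not ClusterFuzz's implementation: the function name `cross_pollinate`, the in-memory dictionary of corpora, and the fuzzer names are all hypothetical, and a real system would work on files in storage rather than on sets of bytes.

```python
import random

def cross_pollinate(corpora, target, sample_size, rng):
    """Copy a random sample of one other fuzzer's corpus into `target`'s corpus.

    corpora: dict mapping a fuzzer name to a set of test cases (bytes).
    Returns the set of test cases that were newly added to the target.
    All names here are illustrative, not ClusterFuzz APIs.
    """
    # Pick a donor at random from the other fuzzers that have a non-empty corpus.
    donors = sorted(name for name in corpora if name != target and corpora[name])
    if not donors:
        return set()
    donor = rng.choice(donors)

    # Sample without replacement; sorting first keeps runs reproducible.
    pool = sorted(corpora[donor])
    sample = rng.sample(pool, min(sample_size, len(pool)))

    # Only inputs the target does not already have count as new.
    added = set(sample) - corpora[target]
    corpora[target] |= added
    return added

corpora = {
    "png_fuzzer": {b"\x89PNG", b"IHDR"},
    "jpeg_fuzzer": {b"\xff\xd8\xff", b"JFIF"},
    "gif_fuzzer": {b"GIF89a"},
}
added = cross_pollinate(corpora, "png_fuzzer", sample_size=1, rng=random.Random(0))
```

The donor is chosen blindly, so a PNG fuzzer may receive JPEG or GIF inputs that it mostly rejects. That waste is the motivation for a format-aware method.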

Machine Learning in Fuzzing

Machine learning models such as RNNs have shown promising results in generating fuzzing inputs, especially for text-based formats. The team has integrated this approach into ClusterFuzz.

Hybrid Fuzzing

The speaker is skeptical about combining fuzzing with techniques such as symbolic execution or constraint solving, citing limited success on real-world programs.

Stack Corruption

When a crash leaves the stack corrupted and unrecoverable, the team falls back to deduplicating by coarser keys such as the library name or target name. This is not foolproof, and the underlying instrumentation issues still need to be addressed.
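
The fallback can be illustrated with a short Python sketch. It assumes a simplified scheme in which crashes are normally grouped by crash type plus the top few stack frames; that scheme, the `"??"` marker for an unsymbolized frame, and every name below are assumptions made for illustration, not a description of ClusterFuzz's actual algorithm.

```python
def crash_signature(crash_type, frames, target_name, top_n=3):
    """Build a grouping key for a crash (illustrative scheme only).

    If the stack has usable frames, key on the crash type and the top frames.
    If the stack is corrupted (no usable frames), fall back to the target name.
    """
    # Treat empty strings and "??" as unusable frames (assumed convention).
    usable = [frame for frame in frames if frame and frame != "??"]
    if usable:
        return (crash_type, "frames", tuple(usable[:top_n]))
    # Coarse fallback: distinct bugs in the same target will collide here.
    return (crash_type, "target", target_name)

def deduplicate(crashes):
    """Group crash ids by signature."""
    groups = {}
    for crash in crashes:
        key = crash_signature(crash["type"], crash["frames"], crash["target"])
        groups.setdefault(key, []).append(crash["id"])
    return groups

crashes = [
    {"id": 1, "type": "heap-buffer-overflow", "target": "png_fuzzer",
     "frames": ["ParseChunk", "ReadImage", "Decode", "main"]},
    {"id": 2, "type": "heap-buffer-overflow", "target": "png_fuzzer",
     "frames": ["ParseChunk", "ReadImage", "Decode", "RunOneInput"]},
    {"id": 3, "type": "stack-buffer-overflow", "target": "xml_fuzzer",
     "frames": ["??", ""]},
    {"id": 4, "type": "stack-buffer-overflow", "target": "xml_fuzzer",
     "frames": []},
    {"id": 5, "type": "stack-buffer-overflow", "target": "png_fuzzer",
     "frames": []},
]
groups = deduplicate(crashes)
```

Crashes 3 and 4 land in the same group only because they share a target, which shows why the fallback is not foolproof: two unrelated stack-smashing bugs in one target would be reported as a single issue.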

Selling Fuzzing to Dev Teams

The key to convincing developers to adopt fuzzing is making it simple to integrate into their workflow. By supporting memory sanitizer tools and making a fuzz target as easy to write as a unit test, the team has been able to demonstrate the value of fuzzing in catching both security vulnerabilities and user-facing stability crashes.
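
To show how small a fuzz target can be, here is a hedged Python sketch. C and C++ targets are typically written against libFuzzer's `LLVMFuzzerTestOneInput` entry point. The version below mirrors that shape (one function that takes raw bytes and calls the code under test) using only the standard library. The `run_naive_fuzzer` driver is a hypothetical stand-in for a real fuzzing engine, and `json.loads` was chosen as the code under test purely for illustration.

```python
import json
import random

def fuzz_target(data: bytes) -> None:
    """Feed raw bytes to the code under test.

    Expected rejections are swallowed; any other exception would surface
    as a finding, much like a failing unit test.
    """
    try:
        json.loads(data.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and malformed JSON: not a bug.
        pass

def run_naive_fuzzer(target, seeds, iterations, rng):
    """Toy driver (assumption): overwrite one random byte of a seed per run."""
    executed = 0
    for _ in range(iterations):
        data = bytearray(rng.choice(seeds))
        if data:
            data[rng.randrange(len(data))] = rng.randrange(256)
        target(bytes(data))
        executed += 1
    return executed

seeds = [b'{"a": [1, 2]}', b"null"]
executed = run_naive_fuzzer(fuzz_target, seeds, 1000, random.Random(0))
```

The developer only writes `fuzz_target`. Input generation, coverage feedback, crash detection, and scheduling belong to the engine and the surrounding infrastructure, which is what keeps the cost of adoption close to that of adding a unit test.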