Hackers of India

Using Data Analytics for Incident Response

Samir Saklikar

2011/09/06

Abstract

Critical Incident Response Teams (CIRTs) are tasked with responding quickly to any known attacks that may be under way in the company at a given time. As a result, most of their effort goes into identifying attacks that generate a lot of activity within their network and stopping them immediately. For such real-time attacks, they are assisted by various Security Information and Event Management (SIEM) and full packet capture tools. However, another significant challenge for CIRTs is to quickly identify the source of infection once they learn from external sources that a confidential document has been leaked from the company. While they work against the clock to identify and plug the infection vector, they are bogged down by multiple complexities. These include large volumes of varied security data (e.g., SIEM and packet capture data, configuration data, etc.) and the increasingly low-and-slow nature of newer attacks based on the Advanced Persistent Threat (APT) model, in which the attack activity is distributed across time (i.e., conducted in small, unnoticeable increments) and space (i.e., across different endpoints) so that it does not rank high on the radar of the SIEM or full packet capture monitoring deployed by the CIRT.
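To make the low-and-slow point concrete, the following Python sketch uses made-up numbers: 240 MB of exfiltration is split into 10 MB increments spread over 24 hours and four endpoints. The event format, host names, and the 50 MB per-endpoint-per-hour threshold are hypothetical assumptions, not taken from any particular SIEM. The sketch only illustrates that a per-window, per-host threshold never fires, while aggregating across the whole period and all endpoints exposes the full volume.

```python
from collections import defaultdict

# Hypothetical event records: (hour, endpoint, megabytes sent to an external host).
# The attacker spreads 240 MB over 24 hours and 4 endpoints, 10 MB at a time.
events = [(hour, f"host-{hour % 4}", 10) for hour in range(24)]

# Hypothetical alert threshold: MB per endpoint per hour.
PER_HOUR_THRESHOLD_MB = 50


def per_hour_alerts(events, threshold):
    """Return (hour, endpoint) pairs whose hourly volume exceeds the threshold."""
    totals = defaultdict(int)
    for hour, endpoint, mb in events:
        totals[(hour, endpoint)] += mb
    return [key for key, mb in totals.items() if mb > threshold]


def aggregate_total(events):
    """Total volume correlated across the whole period and all endpoints."""
    return sum(mb for _, _, mb in events)


print(per_hour_alerts(events, PER_HOUR_THRESHOLD_MB))  # -> []
print(aggregate_total(events))  # -> 240
```

No single hour on any single host stands out, which is why correlating over longer time spans and across endpoints, at the data volumes a real enterprise produces, motivates the data analytics approach discussed in this session.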

In this session, we discuss how data analytics techniques can be leveraged to enable faster incident response. The security event data can be stored either across distributed data clusters in an unstructured format, such as that used in Hadoop, or within parallel structured databases such as Greenplum. For the technology demonstration, we choose the Greenplum Community Edition infrastructure because it supports relational databases, which are typically used within existing CIRT infrastructures. The approach hinges on knowledge of a compromised artifact obtained from an external source (say, an underground forum discussing the leak from the enterprise), which is used to determine the version and timestamp of the leaked artifact. We then discuss how multistep temporal data correlation can be applied: starting from the endpoints/users that accessed the leaked version, the analysis identifies the servers those endpoints accessed within varying time intervals, and then checks whether those servers have shown any unresolved, potentially suspicious SIEM activity. Finally, correlating this reduced SIEM activity data with information from configuration systems about potentially exploitable vulnerabilities in the respective web servers, or with user activity anomalies, narrows the suspicious endpoint activity down to a small, manageable set that the CIRT can investigate manually.
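The multistep correlation described above can be sketched in Python, using the standard-library `sqlite3` module as a stand-in for a relational store such as Greenplum. Everything here is a hypothetical assumption for illustration: the table and column names, the toy data, the integer-hour timestamps, and the use of a SHA-256 hash of the leaked copy to pin down its version. The single query walks the steps in order: match the leaked artifact to a version, find the endpoints that read that version, find the servers those endpoints contacted within `window` hours of the read, keep servers with unresolved SIEM events, and finally keep only servers with a vulnerability finding from configuration data.

```python
import hashlib
import sqlite3

# sqlite3 stands in for a relational store such as Greenplum.
# All table and column names are hypothetical; "t" is an integer hour.
SCHEMA = """
CREATE TABLE doc_versions  (doc_id TEXT, version INTEGER, sha256 TEXT);
CREATE TABLE doc_access    (endpoint TEXT, doc_id TEXT, version INTEGER, t INTEGER);
CREATE TABLE net_access    (endpoint TEXT, server TEXT, t INTEGER);
CREATE TABLE siem_events   (server TEXT, t INTEGER, signature TEXT, resolved INTEGER);
CREATE TABLE vuln_findings (server TEXT, finding TEXT);
"""


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def load_toy_data(db):
    """Populate a tiny, made-up dataset."""
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO doc_versions VALUES (?, ?, ?)", [
        ("plan.doc", 1, sha256(b"draft")),
        ("plan.doc", 2, sha256(b"final")),
    ])
    db.executemany("INSERT INTO doc_access VALUES (?, ?, ?, ?)", [
        ("ep-a", "plan.doc", 2, 100),
        ("ep-b", "plan.doc", 1, 90),
        ("ep-c", "plan.doc", 2, 105),
    ])
    db.executemany("INSERT INTO net_access VALUES (?, ?, ?)", [
        ("ep-a", "web-1", 98), ("ep-a", "web-2", 50),
        ("ep-b", "web-3", 89), ("ep-c", "web-2", 103),
    ])
    db.executemany("INSERT INTO siem_events VALUES (?, ?, ?, ?)", [
        ("web-1", 97, "odd-upload", 0),
        ("web-2", 102, "scan", 1),
        ("web-3", 88, "odd-upload", 0),
    ])
    db.execute("INSERT INTO vuln_findings VALUES ('web-1', 'unpatched-cms')")


def suspicious_leads(db, leaked_bytes: bytes, window: int):
    """Multistep temporal correlation, starting from the leaked artifact."""
    query = """
    WITH leaked AS (   -- step 1: identify the leaked version by its hash
        SELECT doc_id, version FROM doc_versions WHERE sha256 = ?
    ), readers AS (    -- step 2: endpoints that accessed that version
        SELECT a.endpoint, a.t FROM doc_access a
        JOIN leaked l ON a.doc_id = l.doc_id AND a.version = l.version
    ), servers AS (    -- step 3: servers they contacted within the window
        SELECT DISTINCT r.endpoint, n.server FROM readers r
        JOIN net_access n ON n.endpoint = r.endpoint
        WHERE n.t BETWEEN r.t - ? AND r.t + ?
    )
    SELECT DISTINCT s.endpoint, s.server, e.signature, v.finding
    FROM servers s
    JOIN siem_events e ON e.server = s.server AND e.resolved = 0  -- step 4
    JOIN vuln_findings v ON v.server = s.server                   -- step 5
    ORDER BY s.endpoint, s.server
    """
    return db.execute(query, (sha256(leaked_bytes), window, window)).fetchall()


db = sqlite3.connect(":memory:")
load_toy_data(db)
print(suspicious_leads(db, b"final", window=5))
# -> [('ep-a', 'web-1', 'odd-upload', 'unpatched-cms')]
```

Only `ep-a`/`web-1` survives: `ep-b` read a version that was not leaked, and the SIEM event on `web-2` was already resolved. User activity anomalies could be kept in a further table and combined with the vulnerability filter; the sketch omits them for brevity.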