Abstract
In recent years, there has been a significant increase in the occurrence of technically sophisticated Advanced Persistent Threats (APTs). These threats have notably impacted various sectors, including industry, governance, and democracy. Security researchers are overwhelmed by the volume and complexity of this diverse threat landscape. Thus far, researchers have primarily relied on manual analysis to study various types of malicious files and discern distinct techniques, custom tools, and behavioral patterns employed by these APTs. For instance, after the SolarWinds breach in December 2020, cybersecurity experts attempted to attribute the attack to its originators. It wasn’t until May 2022 that FireEye found similarities between the SolarWinds malware and the Russia-linked cyberespionage group Turla (APT29), which connected the two.
In this presentation, we explore the challenges of attributing APTs in real-world scenarios. Through case studies, we emphasize how APT groups adapt campaigns based on their objectives, share tooling, and utilize diverse files and platforms. This adaptability and evolution often result in inconsistent or inaccurate attribution claims. To address this, we propose a two-tiered approach to attribution, i.e., at the APT campaign and APT group levels. We present ADAPT, a machine-learning-based pipeline that automates attribution across diverse malicious file types (executables and documents). We apply ADAPT to a newly curated APT dataset comprising 6,134 real-world APT samples from May 2006 to March 2023. We employ a standardization process to ensure consistency in group names and identify 92 unique APT groups. ADAPT utilizes an unsupervised clustering algorithm and effectively identifies samples with similar objectives and those associated with the same APT groups. Finally, through qualitative case studies on APT29, APT32, APT42, and Sidewinder, we demonstrate how our categorization enables the classification of unknown threat campaigns and their associated threat groups, significantly reducing the need for manual analysis.
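The group-name standardization step mentioned above can be pictured as an alias lookup. The following is only a minimal sketch under stated assumptions, not ADAPT's actual process: the `ALIASES` table, `normalize_group`, and `unique_groups` are hypothetical names, and the table holds just a few widely reported vendor aliases for illustration.

```python
import re

# Hypothetical, deliberately tiny alias table (assumption for illustration);
# a real standardization step would need a much larger, curated mapping.
ALIASES = {
    "apt29": "APT29",
    "cozy bear": "APT29",
    "the dukes": "APT29",
    "nobelium": "APT29",
    "apt32": "APT32",
    "oceanlotus": "APT32",
    "ocean lotus": "APT32",
    "sidewinder": "Sidewinder",
    "rattlesnake": "Sidewinder",
}


def normalize_group(raw: str) -> str:
    """Map a vendor-specific group name to one canonical name, if known."""
    # Lowercase and collapse whitespace, underscores, and hyphens to one space.
    key = re.sub(r"[\s_\-]+", " ", raw.strip().lower())
    # Unknown names are returned unchanged (trimmed) rather than guessed.
    return ALIASES.get(key, raw.strip())


def unique_groups(labels):
    """Return the sorted set of canonical group names found in `labels`."""
    return sorted({normalize_group(label) for label in labels})


if __name__ == "__main__":
    reported = ["APT29", "Cozy Bear", "NOBELIUM", "OceanLotus", "SideWinder"]
    print(unique_groups(reported))
```

Returning unknown names unchanged keeps unmapped labels visible for manual review instead of silently merging them into an existing group.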
AI-Generated Summary (may contain errors)
Here is a summary of the discussion:
The speaker presented research on malware detection using multi-label classification (i.e., assigning multiple labels to a sample). They mentioned that they want to assign a weight to each label based on clustering. A questioner asked about incorporating dynamic analysis into the approach, especially for cases like HTML smuggling, where samples need to be executed to reveal their behavior. The speaker acknowledged the limitation of their static-analysis-based approach and agreed that dynamic analysis is necessary for some samples, while noting that it comes with its own limitations (e.g., ensuring the right system configuration to elicit the sample’s true behavior).
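One way to read the label weighting mentioned in the discussion is to weight each candidate group by its share of the labeled samples in a cluster. This is a sketch of that interpretation only; the summary does not say how the speaker computes weights, and `label_weights` is a hypothetical helper.

```python
from collections import Counter


def label_weights(cluster_labels):
    """Weight each group label by its share of the labeled cluster members.

    `cluster_labels` holds one entry per sample in a cluster: a group name,
    or None for a sample whose group is unknown.
    """
    counts = Counter(label for label in cluster_labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        # No labeled neighbours: nothing to base a weight on.
        return {}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Three labeled samples and one unknown sample in the same cluster.
    print(label_weights(["APT29", "APT29", "APT32", None]))
```

Under this reading, an unlabeled sample inherits a weighted set of candidate groups from its cluster rather than a single hard label.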
Another questioner asked about handling packed samples, which are not currently unpacked in their pipeline. The speaker mentioned that around 10% of their samples were packed using commercial packers, and they treat packing mechanisms as a feature. They do not discard packed samples but rather send them through their pipeline, where they may end up in uni-clusters or clusters with similar properties.
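The idea of treating the packer as a feature, with some samples ending up alone in a cluster, can be illustrated with a toy example. This sketch is not the pipeline's algorithm: it substitutes a simple Jaccard-similarity, single-linkage grouping, and the feature tokens, sample names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of static feature tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples, threshold=0.5):
    """Group samples whose feature sets are linked by similarity >= threshold."""
    parent = {name: name for name in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical samples: the packer is kept as an ordinary feature token
# instead of being used as a reason to discard the sample.
samples = {
    "s1": {"packer:UPX", "section:UPX0", "import:LoadLibraryA"},
    "s2": {"packer:UPX", "section:UPX0", "import:GetProcAddress"},
    "s3": {"packer:none", "import:InternetOpenA", "string:cmd.exe"},
    "s4": {"packer:Themida", "section:.themida"},
}

if __name__ == "__main__":
    for members in cluster(samples):
        kind = "single-sample cluster" if len(members) == 1 else "cluster"
        print(kind, members)
```

In this toy data the two samples sharing a packer and section name are grouped together, while the other two samples each form a cluster of their own.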
Overall, the discussion revolved around the limitations of the speaker’s approach, particularly regarding dynamic analysis and handling packed samples, and potential future directions to address these limitations.