How my SV Machine nailed your Malware

By Nikhil Prabhakar on 24 Jun 2017 @ Lehack
#android-security #machine-learning #malware-detection
Focus Areas: 🛡️ Security Operations & Defense, 🤖 AI & ML Security, 🦠 Malware Analysis, 📱 Mobile Security

Abstract

Android applications are widely used in industry, and from a security perspective it is well known that the Android platform is susceptible to malicious applications. As vendors and customers move increasingly to mobile, Android has become a major attack surface. Existing Android malware detection mechanisms are mostly permission-based or based on API usage. However, these mechanisms are vulnerable to instruction-level obfuscation techniques.

This talk introduces a machine learning approach to Android malware analysis that uses functional call graphs and the Neighborhood Hash Graph Kernel (Hido & Kashima) to find similarities among binaries while remaining resistant to obfuscation. The implementation uses the Support Vector Machine (SVM) algorithm for Android malware classification, with functional call graphs embedded into the feature space through an explicit feature map. The approach is reported to achieve better detection rates, with minimal false positives, than other methods. A classification model is built from clean and real malware Android application samples in three steps: functional call graphs are extracted, an explicit mapping based on a linear-time graph kernel is applied, and the SVM is trained to differentiate between legitimate and malicious applications.

AI Generated Summary

The talk presents a project named FLAME, an open-source framework for Android malware detection using machine learning. The core approach involves converting Android application packages (APKs) into functional call graphs (FCGs), where nodes represent functions/APIs and edges represent calls. These graphs are transformed into feature vectors using graph kernels, specifically a neighborhood hash kernel (Hido & Kashima method), to compute similarity metrics. These metrics are then fed into a support vector machine (SVM) classifier to distinguish malicious from benign applications.
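The step from hashed call graphs to SVM input can be sketched as follows. This is a minimal illustration, not FLAME's actual code: it assumes each graph has already been reduced to a list of integer node hashes, and it uses histogram intersection as the similarity measure, which is an assumption rather than something the talk confirms. The function names and the sample hash values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def feature_map(node_hashes: Iterable[int]) -> Counter:
    """Explicit mapping: a histogram of node hash values for one call graph."""
    return Counter(node_hashes)


def graph_kernel(a: Counter, b: Counter) -> int:
    """Histogram intersection: the number of node hashes two graphs share."""
    # Counter & Counter keeps the minimum count for each shared key.
    return sum((a & b).values())


def gram_matrix(samples: List[Counter]) -> List[List[int]]:
    """Pairwise similarities between all graphs, row by row."""
    return [[graph_kernel(x, y) for y in samples] for x in samples]


# Hypothetical node hashes for two apps (illustrative, not from the talk).
app_a = feature_map([4, 4, 8])
app_b = feature_map([4, 8, 8, 16])
gram = gram_matrix([app_a, app_b])
```

A matrix like `gram` is the kind of input an SVM with a precomputed kernel accepts, for example scikit-learn's `SVC(kernel="precomputed")`.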

Initial experiments with the Weisfeiler-Lehman graph kernel failed due to issues like diagonal dominance (each graph appears similar mainly to itself) and exponential feature space growth, both of which made similarity between different graphs hard to detect. The adopted hash-based kernel addressed these by hashing node labels and their neighbors, enabling efficient comparison of graph structures. The system was trained on a dataset of 900 malicious and 900 benign APKs, achieving 78% accuracy with a 3% false positive rate. However, the process demanded substantial computational resources, requiring cloud instances (AWS c4.xlarge) for feasible execution.
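The idea of hashing a node's label together with its neighbors can be sketched as below. The rotate-and-XOR update follows the neighborhood hash as published by Hido & Kashima, to the best of my understanding. The 15-bit label width, the example graphs, and the label values are assumptions made for illustration and are not taken from the talk.

```python
from typing import Dict, List

BITS = 15                # assumed label width, not taken from the talk
MASK = (1 << BITS) - 1


def rot1(label: int) -> int:
    """Rotate a BITS-wide label left by one bit."""
    return ((label << 1) | (label >> (BITS - 1))) & MASK


def neighborhood_hash(graph: Dict[str, List[str]],
                      labels: Dict[str, int]) -> Dict[str, int]:
    """One hashing round: rotate each node's label, then XOR in its callees' labels."""
    hashed = {}
    for node, callees in graph.items():
        value = rot1(labels[node])
        for callee in callees:
            value ^= labels[callee]   # XOR is order-independent
        hashed[node] = value
    return hashed


# Two hypothetical call graphs with the same structure but renamed functions.
# Labels are assumed to come from function contents, not from names.
graph_a = {"main": ["send", "read"], "send": [], "read": []}
labels_a = {"main": 0b001, "send": 0b010, "read": 0b100}
graph_b = {"a": ["c", "b"], "b": [], "c": []}
labels_b = {"a": 0b001, "b": 0b010, "c": 0b100}

hashes_a = neighborhood_hash(graph_a, labels_a)
hashes_b = neighborhood_hash(graph_b, labels_b)
```

Because the hash depends only on labels and call structure, renaming functions leaves the multiset of hash values unchanged, which is the property that makes this kind of kernel attractive against simple obfuscation.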

Key findings indicate that machine learning can serve as a complementary tool to dynamic analysis for Android malware, but success heavily depends on selecting an appropriate feature space—here, the FCG structure. The high computational cost and the need for large, quality training datasets are significant practical constraints. The framework, while functional, requires further optimization for confidence scoring and efficiency. The project underscores the viability of static graph-based ML for malware classification but highlights ongoing challenges in scalability and real-world deployment.

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview; always refer to the original talk for authoritative content.