Protecting the Protector: Hardening Machine Learning Defenses Against Adversarial Attacks

By Jugal Parikh, Randy Treit, Holly Stewart on 09 Aug 2018 @ Blackhat
📊 Presentation 📹 Video 🔗 Link
#blueteam #ai-security #machine-learning #ml #ai #deep-learning #security-analytics
Focus Areas: 🛡️ Security Operations & Defense, 🤖 AI & ML Security

Presentation Material

Abstract

Humans are susceptible to social engineering. Machines are susceptible to tampering. Machine learning is vulnerable to adversarial attacks. Researchers have successfully attacked deep learning models used to classify malware, completely changing their predictions while accessing only the output label the model returns for attacker-supplied input samples. Moreover, we’ve also seen attackers attempt to poison our training data for ML models by sending fake telemetry, trying to fool the classifier into believing that a given set of malware samples is actually benign. How do we detect and protect against such attacks? Is there a way we can make our models more robust to future attacks?

We’ll discuss several strategies for making machine learning models more tamper-resilient. We’ll compare the difficulty of tampering with cloud-based models and client-based models. We’ll discuss research showing how singular models are susceptible to tampering and how techniques like stacked ensemble models can make them more resilient. We’ll also cover the importance of diversity in base ML models and the technical details of how they can be optimized to handle different threat scenarios. Lastly, we’ll describe suspected tampering activity we’ve witnessed using protection telemetry from over half a billion computers, and whether our mitigations worked.

AI Generated Summary

The talk addresses adversarial machine learning in cybersecurity, focusing on protecting against novel malware attacks where response time is critical. It outlines how attackers can tamper with supervised machine learning models used for malware detection through methods such as data poisoning, feature evasion, and model stealing, often requiring insider knowledge of the system. Real-world examples include attacks that manipulated antivirus automation by injecting malicious content into clean files and flooding cloud-based reputation systems with spoofed traffic to degrade classification performance.
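The black-box setting described above can be illustrated with a short sketch: the attacker sees only the hard label the classifier returns and keeps perturbing a sample until the verdict flips. The code below is a minimal illustration on synthetic data, not the attack tooling discussed in the talk; the victim model, feature space, step size, and query budget are all assumptions.

```python
# Hedged sketch (not from the talk): a label-only evasion loop against a
# hypothetical binary malware classifier. The attacker only observes the
# predicted label and randomly perturbs features until the verdict flips.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for a deployed malware classifier (label 1 = malicious, 0 = benign).
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
victim = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def query_label(sample):
    """The only signal a black-box attacker gets back: the hard label."""
    return int(victim.predict(sample.reshape(1, -1))[0])

def label_only_evasion(sample, max_queries=1000, step=1.0):
    """Randomly perturb features until the classifier calls the sample benign."""
    adv = sample.copy()
    for q in range(max_queries):
        if query_label(adv) == 0:          # verdict flipped to benign
            return adv, q
        idx = rng.integers(adv.shape[0])   # pick a random feature to nudge
        adv[idx] += rng.normal(scale=step)
    return None, max_queries               # evasion failed within the budget

malicious = X[y == 1][0]
evaded, queries = label_only_evasion(malicious)
print("evaded" if evaded is not None else "failed", "after", queries, "queries")
```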

To counter these threats, the presentation details the implementation of a stacking ensemble approach. This method combines a diverse set of base classifiers—varying in feature sets (static, dynamic, contextual), algorithms (linear models, boosted trees, neural networks), and training data—to improve robustness. Logistic stacking, using model probabilities as inputs to a LightGBM meta-classifier, was found superior to boolean stacking. Key technical refinements included filtering features correlated with labels to prevent data leakage, adding boolean indicators for missing classifier outputs, and incorporating unsupervised clustering features. The ensemble demonstrated resilience in testing, maintaining performance even when some base classifiers were compromised or provided noisy inputs.
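As a rough illustration of the logistic-stacking idea, the sketch below trains a few deliberately different base classifiers, feeds their predicted probabilities (plus boolean flags marking simulated missing outputs) into a LightGBM meta-classifier, and evaluates on held-out data. It assumes scikit-learn and lightgbm are installed; the specific base models, split sizes, and the 5% missing-output rate are illustrative assumptions rather than the presenters' production pipeline.

```python
# Hedged sketch of logistic stacking with a LightGBM meta-classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from lightgbm import LGBMClassifier

# Synthetic stand-in for labeled malware/benign telemetry.
X, y = make_classification(n_samples=8000, n_features=40, n_informative=10,
                           random_state=0)
X_base, X_rest, y_base, y_rest = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)
X_meta, X_test, y_meta, y_test = train_test_split(X_rest, y_rest, test_size=0.4,
                                                  random_state=0)

# Diverse base classifiers: a linear model, boosted trees, and a small neural net.
base_models = {
    "linear":  LogisticRegression(max_iter=1000),
    "boosted": LGBMClassifier(n_estimators=200),
    "neural":  MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
for model in base_models.values():
    model.fit(X_base, y_base)

rng = np.random.default_rng(0)

def meta_features(X_in):
    """Logistic stacking: base-model probabilities plus boolean availability flags."""
    cols = []
    for model in base_models.values():
        p = model.predict_proba(X_in)[:, 1]
        missing = rng.random(len(p)) < 0.05       # simulate an absent base verdict
        cols.append(np.where(missing, 0.0, p))    # zero out the missing probability
        cols.append(missing.astype(float))        # boolean indicator for "no output"
    return np.column_stack(cols)

meta = LGBMClassifier(n_estimators=300)
meta.fit(meta_features(X_meta), y_meta)
print("stacked ensemble accuracy:", meta.score(meta_features(X_test), y_test))
```

Keeping a separate split (or out-of-fold predictions) for the meta-classifier matters because training it on the same rows the base models saw would leak their overfitting into the stack, which is the same data-leakage concern the summary raises about features correlated with labels.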

Practical implications emphasize that model diversity and continuous feature updates are essential against rapidly evolving, highly polymorphic threats (96% of malware seen only once). Deployment requires careful handling of volatile telemetry, outlier removal during training, and monitoring for adversarial activity. The ensemble approach provides a scalable framework for client and cloud-based detection within milliseconds, though it necessitates ongoing evaluation to ensure features remain relevant and uncorrelated with future labels.
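One of those deployment steps, outlier removal during training, can be sketched as follows. The talk does not specify a technique, so the use of IsolationForest, the simulated poisoned batch, and the contamination rate are all assumptions made for illustration.

```python
# Hedged sketch: drop suspected outlier telemetry before training a detector.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from lightgbm import LGBMClassifier

rng = np.random.default_rng(1)

# Synthetic clean telemetry plus a small batch of spoofed rows labeled "benign"
# (label 0) whose feature values sit far outside the normal range.
X, y = make_classification(n_samples=5000, n_features=20, random_state=1)
X_poison = rng.normal(loc=8.0, scale=0.5, size=(150, X.shape[1]))
y_poison = np.zeros(150, dtype=int)
X_train = np.vstack([X, X_poison])
y_train = np.concatenate([y, y_poison])

# Flag and drop training rows that look like telemetry outliers before fitting.
detector = IsolationForest(contamination=0.03, random_state=1).fit(X_train)
keep = detector.predict(X_train) == 1          # +1 = inlier, -1 = outlier
print(f"dropped {int(np.sum(~keep))} suspected outlier rows")

clf = LGBMClassifier(n_estimators=200).fit(X_train[keep], y_train[keep])
print("accuracy on the clean portion:", clf.score(X, y))
```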

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview — always refer to the original talk for authoritative content. Learn more about our AI experiments.