Local LLMs in Action: Automating CTI to Connect the Dots

By Kai Iyer on 02 Jun 2025 @ CONFidence
📹 Video 🔗 Link
#threat-intelligence-analysis #machine-learning #threat-detection #ai-security
Focus Areas: 🛡️ Security Operations & Defense, 🤖 AI & ML Security, 🕵️ Threat Intelligence

Presentation Material

Abstract

With the rising volume of cyber threats, traditional CTI workflows often struggle to map threats efficiently. This session explores how locally hosted large language models (LLMs) can automate critical CTI processes: extracting intelligence in near real time, visualizing APT activity by targeted industry, and plotting timeline graphs of threat activity for known malware strains.

Using Python-based automation and local LLMs, attendees will learn how to:

  1. Query and Process Reports: Automatically download, normalize, and chunk data from publicly available sources.
  2. Map Threats to MITRE: Extract TTPs, IOCs, and other insights, map them to MITRE ATT&CK, and identify gaps in existing SOC/MDR detections.
  3. Attribute Threats: Use sandbox APIs and threat intelligence services to classify malware families and identify threat actors.
  4. Visualize Data: Transform extracted intelligence into knowledge graphs or operational dashboards to aid SOC decision-making.
  5. Automate Workflows: Implement periodic updates and scalable pipelines to ensure continuous threat intelligence processing.
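The normalize-and-chunk step in (1) can be sketched with a simple recursive chunker, the approach the summary later names. This is a minimal illustration, not the speaker's implementation; the function name, chunk size, and separator order are assumptions.

```python
def chunk_text(text: str, max_len: int = 500,
               separators=("\n\n", "\n", ". ", " ")):
    """Recursively split text on progressively finer separators
    until every chunk fits within max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = (current + sep + part) if current else part
                if len(candidate) <= max_len:
                    current = candidate
                else:
                    if current:
                        chunks.extend(chunk_text(current, max_len, separators))
                    current = part
            if current:
                chunks.extend(chunk_text(current, max_len, separators))
            return chunks
    # No separator applies: hard-split as a last resort.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Splitting on paragraph boundaries first keeps related sentences together, which matters when each chunk is later embedded and retrieved independently.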

By the session’s end, participants will have actionable strategies to implement local LLMs for CTI and improve their organization’s cyber defenses.

CONFidence 2025, 2 June 2025, 15:15–16:00, Kraków.

AI Generated Summary

The talk addresses the challenge of cyber threat intelligence (CTI) teams being overwhelmed by the volume of incoming indicators of compromise (IOCs) from sources like threat feeds, blogs, and reports. Manual extraction, correlation, and attribution of this data are inefficient and lead to poor prioritization of intelligence relevant to a specific industry.

A solution presented is the use of local large language models (LLMs) to automate this process. Local LLMs offer advantages over cloud-based APIs, including data privacy, avoidance of vendor lock-in, and the ability to fine-tune models on industry-specific threat data for more relevant analysis. The core of the approach is an end-to-end pipeline: intelligence sources are scraped or ingested, then normalized and chunked using methods like recursive or semantic chunking to prepare text for the model. The content is embedded to create searchable vectors and cached for efficient retrieval. A local LLM (e.g., Mistral run via Ollama) is then queried with natural language prompts to extract structured information such as IOCs, tactics, techniques, and procedures (TTPs), and map them to frameworks like MITRE ATT&CK. The extracted data is fed into visualization tools like Splunk dashboards to plot threat actor activity timelines, initial access methods, and industry-specific targeting, enabling analysts to identify coverage gaps.
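The extraction step described above can be sketched against Ollama's local HTTP API. The endpoint and request shape follow Ollama's documented `/api/generate` interface, but the prompt wording, model choice, and expected JSON schema here are assumptions for illustration, not the speaker's exact prompts.

```python
import json
import urllib.request

# Hypothetical extraction prompt; doubled braces survive str.format().
EXTRACTION_PROMPT = """Extract all IOCs and MITRE ATT&CK techniques from the report below.
Respond with JSON only: {{"iocs": [...], "techniques": ["Txxxx", ...]}}

Report:
{report}
"""

def query_ollama(report: str, model: str = "mistral",
                 url: str = "http://localhost:11434/api/generate") -> str:
    """Send the extraction prompt to a local Ollama server (non-streaming)."""
    payload = json.dumps({
        "model": model,
        "prompt": EXTRACTION_PROMPT.format(report=report),
        "stream": False,
    }).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def parse_extraction(raw: str) -> dict:
    """Coerce the model's JSON answer into a known shape, dropping
    entries that do not look like ATT&CK technique IDs."""
    data = json.loads(raw)
    return {
        "iocs": [str(i) for i in data.get("iocs", [])],
        "techniques": [t for t in data.get("techniques", [])
                       if isinstance(t, str) and t.startswith("T")],
    }
```

Forcing `"stream": False` returns one complete response, which is simpler to parse than the default token-by-token stream.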

Key pitfalls of LLMs are acknowledged, including hallucinations, bias in training data, and resource constraints. Mitigations involve using highly specific prompts, implementing validation scripts, curating high-quality trusted data sources, and employing quantized models (GGUF format) for CPU-only environments. The practical implication is a system that automates the conversion of unstructured open-source intelligence into actionable, visualized threat landscapes tailored to an organization’s environment, enhancing existing CTI workflows and detection engineering. The effectiveness depends heavily on the quality and specificity of the source data and the fine-tuning of the model for the particular use case.
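The validation scripts mentioned as a hallucination mitigation could look like the following syntactic IOC filter. The regexes and type labels are illustrative assumptions; a real deployment would add more IOC types and cross-check survivors against trusted feeds.

```python
import ipaddress
import re

# MD5 (32), SHA-1 (40), or SHA-256 (64) hex digests.
HASH_RE = re.compile(r"^[0-9a-fA-F]{32}$|^[0-9a-fA-F]{40}$|^[0-9a-fA-F]{64}$")
# Simplified domain check: dot-separated labels plus an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$", re.I)

def validate_ioc(value: str):
    """Return the IOC type if value is syntactically plausible, else None."""
    try:
        ipaddress.ip_address(value)
        return "ip"
    except ValueError:
        pass
    if HASH_RE.match(value):
        return "hash"
    if DOMAIN_RE.match(value):
        return "domain"
    return None

def filter_hallucinations(iocs):
    """Keep only extracted values that parse as a known IOC type."""
    kept = []
    for v in iocs:
        kind = validate_ioc(v.strip())
        if kind:
            kept.append((v.strip(), kind))
    return kept
```

Syntactic checks like these cannot prove an indicator is real, but they cheaply discard the malformed strings an LLM sometimes invents before anything reaches a dashboard.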

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview; always refer to the original talk for authoritative content. Learn more about our AI experiments.