ReconPal: Leveraging NLP for Infosec

By Nishant Sharma, Jeswin Mathai on 07 Oct 2020 @ Rootcon
#reconnaissance #red-teaming #nlp #ai
Focus Areas: 🤖 AI & ML Security, 🎯 Penetration Testing
This talk covers the following tool, which the speakers authored or contributed to:
RECONPAL

Abstract

Recon is one of the most important phases; it seems easy but takes a lot of effort and skill to do right. One needs to know the right tools and the correct queries and syntax, run those queries, correlate the information, and sanitize the output. All of this might be easy for a seasoned infosec/recon professional, but for the rest it is still close to magic. How cool would it be to ask a simple question like “Find me an open Memcached in Manila with UDP support” or “How many IP cameras in the Philippines are using default credentials?” in a WhatsApp chat or a web portal and get the answer?

Integrating GPT-3, a deep-learning language model that produces human-like text, with well-known recon tools like Shodan allows us to do exactly that. In this talk, we will cover how such an integration can be done with Shodan and other recon tools, and how this functionality can be extended to cover other popular tools. The code will be open source and made available after the talk.

AI Generated Summary

ReconPal is a framework that leverages the GPT-3 language model to automate penetration testing tasks by interpreting natural language commands and orchestrating security tools. The system aims to lower the barrier to entry for security testing by allowing users to describe objectives in plain text, such as “find cameras in the Philippines” or “scan all machines from previous result,” which the tool translates into specific operations.

The architecture integrates three primary modules, each containerized with Docker for isolation and easy updates. The finder module uses the Shodan search engine API to locate vulnerable internet-facing devices (e.g., exposed Docker hosts, Memcached servers). The scanner module executes network scans using tools like Nmap, dynamically selecting options based on the interpreted command. The attacker module performs exploitation tasks, such as dictionary attacks with Hydra or vulnerability scanning with Nikto and sqlmap. A controller component mediates between the user’s Telegram bot interface, the GPT-3 API for intent parsing, and the modular containers.
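
The routing role of the controller can be illustrated with a minimal sketch. This is not ReconPal's actual code: the `Intent` shape, the handler names, and the parameter keys are all hypothetical, and the sketch assumes the language model's reply has already been parsed into a structured intent. It only builds a Shodan search string (using the `product`, `country`, `city`, and `port` filters) and an Nmap argument list (using `-sV` and `-p`); nothing is sent or executed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical structured result of parsing a user's message."""
    module: str                                  # "finder" or "scanner" in this sketch
    params: dict = field(default_factory=dict)   # assumed key names, see handlers


def finder_query(params: dict) -> str:
    """Build a Shodan search string from the filters present in params."""
    terms = []
    if "product" in params:
        terms.append(f'product:"{params["product"]}"')
    if "country" in params:
        terms.append(f'country:{params["country"]}')
    if "city" in params:
        terms.append(f'city:"{params["city"]}"')
    if "port" in params:
        terms.append(f'port:{params["port"]}')
    return " ".join(terms)


def scanner_argv(params: dict) -> list[str]:
    """Build an Nmap argument vector; the command is not run here."""
    argv = ["nmap"]
    if params.get("service_detection"):
        argv.append("-sV")                # service/version detection
    if "ports" in params:
        argv += ["-p", params["ports"]]   # e.g. "21,22" or "1-1024"
    argv += params["targets"]
    return argv


# Hypothetical registry: one handler per module the controller knows about.
HANDLERS = {"finder": finder_query, "scanner": scanner_argv}


def dispatch(intent: Intent):
    """Route a parsed intent to the handler registered for its module."""
    try:
        handler = HANDLERS[intent.module]
    except KeyError:
        raise ValueError(f"unknown module: {intent.module}") from None
    return handler(intent.params)
```

For example, `dispatch(Intent("finder", {"product": "Memcached", "city": "Manila"}))` yields a query string that a finder container could pass to Shodan. A registry of handlers keeps each module independent, which fits the containerized design described above, though the real project may structure this differently.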

Key demonstrations included discovering thousands of exposed devices, scanning a private vulnerable host to identify services like vsftpd, and successfully cracking a default password. The system can also chain operations, using previous results for subsequent actions. A manual command mode (using a ‘>’ prefix) bypasses GPT-3 for direct tool execution.
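
The manual mode and result chaining described above could look roughly like the following sketch. The `Session` class, the `"previous"` sentinel, and the returned mode labels are assumptions made for illustration; the talk only states that a '>' prefix bypasses GPT-3 and that earlier results can feed later actions.

```python
from __future__ import annotations

MANUAL_PREFIX = ">"  # prefix that marks a message as a direct tool command


class Session:
    """Hypothetical per-chat state: remembers hosts from the last step."""

    def __init__(self) -> None:
        self.previous_hosts: list[str] = []

    def route(self, message: str) -> tuple[str, str]:
        """Return ("manual", command) for prefixed input, else ("nlp", text)."""
        text = message.strip()
        if text.startswith(MANUAL_PREFIX):
            # Skip language-model parsing and pass the command through as typed.
            return ("manual", text[len(MANUAL_PREFIX):].strip())
        return ("nlp", text)

    def resolve_targets(self, targets) -> list[str]:
        """Expand the assumed "previous" sentinel into the stored host list."""
        if targets == "previous":
            if not self.previous_hosts:
                raise ValueError("no previous result to reuse")
            return list(self.previous_hosts)
        return list(targets)
```

In use, a message such as `> nmap -sV 192.0.2.10` would be routed as manual, while “scan all machines from previous result” would go to the language model, whose parsed intent could then carry `"previous"` as its target so that `resolve_targets` substitutes the hosts saved from the earlier step.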

Practical implications center on automating reconnaissance and initial exploitation, potentially accelerating security assessments. However, deployment is hindered by dependencies on paid, rate-limited APIs (GPT-3’s beta access and Shodan’s commercial tiers). Future development plans include audio input, offloading command formulation directly to the language model by feeding it tool documentation, generating output summaries, and adding interactive shell sessions. The project highlights both the utility and current limitations of large language models in security automation, while proposing a containerized design for extensibility.

Disclaimer: This summary was auto-generated from the video transcript using AI and may contain inaccuracies. It is intended as a quick overview; always refer to the original talk for authoritative content.